Beware! Beware of Unintended Changes When Altering Columns!

belalugosi_550x415

(last updated: 2019-01-15 @ 19:50 EST / 2019-01-16 @ 00:50 UTC )

I haven’t seen a lot written about these particular behaviors so I thought I would mention them. And, while they are documented, that doesn’t mean that most people are aware of them.

What I am referring to are the silent changes that can occur when changing the datatype of a column via ALTER TABLE ... ALTER COLUMN .... There are two properties that can change if a) the current setting is not the default, and b) they are not specified when making the change. Those properties are:

  1. NULLability (NULL vs NOT NULL)
  2. Collation

NULLability

If the column is NOT NULL, then not specifying NOT NULL will cause it to become NULLable. The documentation for ALTER TABLE even states:

ANSI_NULL defaults are always on for ALTER COLUMN; if not specified, the column is nullable.

Let’s see for ourselves. We will create a simple table with two columns, both being defined as NOT NULL. We can then verify that property for the columns using a system catalog view, and then we will attempt to insert a NULL.

USE [tempdb];

-- DROP TABLE dbo.AlterColumnInt;
CREATE TABLE dbo.AlterColumnInt
(
  [Col1] INT NOT NULL,
  [Col2] DATETIME NOT NULL
);


SELECT [name], system_type_id, max_length, [is_nullable], [collation_name]
FROM   sys.columns
WHERE  [object_id] = OBJECT_ID(N'dbo.AlterColumnInt');
/*
name   system_type_id   max_length   is_nullable   collation_name
Col1   56               4            0             NULL
Col2   61               8            0             NULL
*/


INSERT INTO dbo.AlterColumnInt ([Col1], [Col2])
  VALUES (NULL, '2019-01-01');
/*
Msg 515, Level 16, State 2, Line XXXXX
Cannot insert the value NULL into column 'Col1', table
  'tempdb.dbo.AlterColumnInt'; column does not allow nulls. INSERT fails.
The statement has been terminated.
*/

As you can see at the end of the example above, we received an error due to our attempt to insert the NULL. This was expected.

Now, we will change the INT column to be BIGINT. We will verify that the datatype changed, and can also see that the value of is_nullable has changed. Finally, we will attempt that insert again.

ALTER TABLE dbo.AlterColumnInt
  ALTER COLUMN [Col1] BIGINT;
-- Commands completed successfully.
-- (no warnings)


SELECT [name], system_type_id, max_length, [is_nullable], [collation_name]
FROM   sys.columns
WHERE  [object_id] = OBJECT_ID(N'dbo.AlterColumnInt');
/*
name   system_type_id   max_length   is_nullable   collation_name
Col1   127              8            1             NULL
Col2   61               8            0             NULL
*/

INSERT INTO dbo.AlterColumnInt ([Col1], [Col2])
  VALUES (NULL, '2019-01-02');
-- Success (oops!)

As you can see at the end of the example above, attempting to insert a NULL this time succeeds.

IMPACT:

  1. Potential bad data (the NULLs) allowed in.

Collation

If the column’s collation is not the database’s default collation, then not specifying the COLLATE clause with the column’s current collation will cause the column’s collation to change to the database’s default collation. The documentation even states:

If the COLLATE clause is not specified, changing the data type of a column will cause a collation change to the default collation of the database.

Let’s see for ourselves. This time we need to create a test database so that we can guarantee that its default collation and the test column’s collation are different. I’m using the SQL_Latin1_General_CP1_CI_AS collation because it’s so widely used (of course, that’s largely due to it being the unfortunate default when installing on English-based systems, as well as the only option for Instance-level collation in LocalDB and Azure SQL Database Managed Instances). We will create another simple table with two VARCHAR columns, but this time the columns will be exactly the same except for their names (which will allow us to see if the behavior is different between changing only the size of the column and changing the datatype to NVARCHAR). We will verify the properties of the columns, and then insert two rows: one using Hebrew letters (found on Code Page 1255, which is what the Hebrew_* collations use), and one using standard ASCII characters (so we can test for changes in case-sensitivity since Hebrew does not have both upper- and lower- case letters).

-- Instance-level collation:
SELECT SERVERPROPERTY('Collation');
-- Latin1_General_100_CI_AS_SC_UTF8


IF (DB_ID(N'AlterTest') IS NULL)
BEGIN
  CREATE DATABASE [AlterTest] COLLATE SQL_Latin1_General_CP1_CI_AS;
  ALTER DATABASE [AlterTest] SET RECOVERY SIMPLE;
END;


USE [AlterTest];

-- DROP TABLE dbo.AlterColumnVarchar;
CREATE TABLE dbo.[AlterColumnVarchar]
(
  [Col1] VARCHAR(10) COLLATE Hebrew_100_CS_AS_SC NOT NULL,
  [Col2] VARCHAR(10) COLLATE Hebrew_100_CS_AS_SC NOT NULL
);


SELECT [name], system_type_id, max_length, [is_nullable], [collation_name]
FROM   sys.columns
WHERE  [object_id] = OBJECT_ID(N'dbo.AlterColumnVarchar');
/*
name   system_type_id   max_length   is_nullable   collation_name
Col1   167              10           0             Hebrew_100_CS_AS_SC
Col2   167              10           0             Hebrew_100_CS_AS_SC
*/


INSERT INTO dbo.AlterColumnVarchar ([Col1], [Col2])
  VALUES (0xF9D1C8ECE5C9ED, 0xF9D1C8ECE5C9ED); --"peace" (shalom) in Hebrew
INSERT INTO dbo.AlterColumnVarchar ([Col1], [Col2])
  VALUES ('BOB', 'BOB');

With the environment set up, we can now run some tests. First we will verify that the data exists in the table as we expect, then verify that the case-sensitive comparison works, and finally verify that the comparison is truly case-sensitive and not a case-insensitive.

SELECT * FROM dbo.AlterColumnVarchar;
/*
Col1     Col2
שָׁלוֹם     שָׁלוֹם
BOB      BOB
*/

SELECT * FROM dbo.AlterColumnVarchar WHERE [Col1] = 'BOB';
/*
Col1     Col2
BOB      BOB
*/

SELECT * FROM dbo.AlterColumnVarchar WHERE [Col1] = 'bob';
-- no rows

Next we change both columns. [Col1] will only have its size increased, while [Col2] will be changed to NVARCHAR.

ALTER TABLE dbo.[AlterColumnVarchar]
  ALTER COLUMN [Col1] VARCHAR(50);

ALTER TABLE dbo.[AlterColumnVarchar]
  ALTER COLUMN [Col2] NVARCHAR(50);
-- Commands completed successfully.
-- (no warnings)


SELECT [name], system_type_id, max_length, [is_nullable], [collation_name]
FROM   sys.columns
WHERE  [object_id] = OBJECT_ID(N'dbo.AlterColumnVarchar');
/*
name   system_type_id   max_length  nullable  collation_name
Col1   167              50          1         SQL_Latin1_General_CP1_CI_AS
Col2   231              100         1         SQL_Latin1_General_CP1_CI_AS
*/

The results shown at the end of the example code directly above indicate the following:

  • [Col1]
    1. had its size go from 20 to 50 (expected).
    2. is now NULLable.
    3. now has a collation of SQL_Latin1_General_CP1_CI_AS (which is definitely the Database’s default collation and not the Instance-level collation)
  • [Col2]
    1. is now NVARCHAR.
    2. had its size go from 20 to 100 (expected: remember, this is in terms of bytes, and this column is now NVARCHAR).
    3. is now NULLable.
    4. now has a collation of SQL_Latin1_General_CP1_CI_AS (which is definitely the Database’s default collation and not the Instance-level collation)

So, what is the impact of that collation change?

SELECT * FROM dbo.AlterColumnVarchar;
/* (oops!)
Col1     Col2
???????  שָׁלוֹם
BOB      BOB
*/

SELECT * FROM dbo.AlterColumnVarchar WHERE [Col1] = 'BOB';
/* (expected)
Col1     Col2
BOB      BOB
*/

SELECT * FROM dbo.AlterColumnVarchar WHERE [Col1] = 'bob';
/* (oops!)
Col1     Col2
BOB      BOB
*/

As you can see from the results in the code shown above:

  • [Col1]
    1. since it remained as VARCHAR but changed code pages to one that does not support those Hebrew characters (from 1255 to 1252), those characters were each converted into the default replacement character: “?”.
    2. is no longer case-sensitive since “bob” now matches “BOB”.
  • [Col2]
    1. since it changed to NVARCHAR, which supports all characters, did not have any data-loss.
    2. even though there is no example here showing this, there could be minor differences in sorting and comparison rules between the original and new collations, even if the two collations have the same sensitivities. Differences in either version (the number, or lack of number, in the collation name: 90 vs 100, etc) or locale (the culture name: French vs Latin1_General vs Hebrew, etc) imply that there will be at least some differences. Do not assume that no data loss and same sensitivities means that the behavior will be the same in all cases!

At this point, if you were to change the column again to set it back to its previous properties, you could change it back to:

  1. being NOT NULL, if no NULLs have been inserted.
  2. using Hebrew_100_CS_AS_SC, which will make it case-sensitive again.

But, you cannot:

  1. change it back to NOT NULL if any NULLs have been inserted.
  2. get any lost characters back. That’s right: anything that was converted into a question mark “?” is now just a question mark with no info / clues anywhere regarding what the original character was. The only way to get the original characters back is to restore from a backup and UPDATE the live table from the restored copy (after changing the collation back to the original one!).

Given the info noted above, please keep in mind the following points regarding potential data-loss when changing a VARCHAR column:

  1. changing code pages does not always convert characters without an exact match in the new code page into “?”. Sometimes they are converted to a close-enough match, known as a “best fit” mapping. This is sometimes better than converting to “?”, but there is still no indication as to what the original character was.
  2. you will not lose any data (i.e. characters turning into “?” or a “best fit” mapping) when:
    1. the characters are in the standard ASCII range 0 – 127 (0x00 – 0x79)
    2. switching to NVARCHAR
    3. staying VARCHAR but changing collations to one that uses the same code page, even if it is a different locale (e.g. there are 38 locales / collation groups that use code page 1252: SQL_Latin1_General, Latin1_General, French, Finnish_Swedish, German_PhoneBook, etc)
  3. please see “Collation (Bonus Round)” below for UTF-8-specific notes.

IMPACT:

  1. Potential data loss (for both existing and new data).
  2. Potential change in behavior (if there is a change in one or more sensitivities — case, accent, Kana type, width, and/or variation selector — or in _SC , _BIN , _BIN2 , or _UTF8 options)
  3. As a secondary issue related to item #2: potential failure to re-create a unique index/constraint (that was dropped in order to alter the column) if a sensitivity — case, accent, Kana type, width, and/or variation selector — changed from “sensitive” to “insensitive” and data exists in the column that would no longer be unique in an “insensitive” collation (meaning: “a” and “A” are both unique in a case-sensitive or binary collation, but are not unique in a case-insensitive collation).

Collation (Bonus Round: UTF-8)

So, how do the UTF-8 collations (new in SQL Server 2019) fit into this story? UTF-8 is a Unicode encoding which means that it supports all characters. It is also an 8-bit encoding which means that it works with the VARCHAR datatype. And this is where there is potential for problems. VARCHAR sizing allows for 1 byte per character (e.g. VARCHAR(13) allows up to 13 bytes to be stored. But when dealing with UTF-8, only characters with values 0 – 127 (the standard ASCII set) are 1 byte per character. The next approximately 1900 characters are 2 bytes per character (same as UTF-16 / NVARCHAR), and the remaining 63k characters (in the lower 65k) are 3 bytes per character.

We need to change the collation of the database to a UTF-8 collation so that it will be the resulting collation from the ALTER TABLE operation when no collation is specified. We will then create another table, this time using a Thai collation, which uses code page 874. Like last time, we will use have two columns that are the same except for the name so we can see the difference between VARCHAR and NVARCHAR.

ALTER DATABASE [AlterTest] COLLATE French_100_CI_AS_SC_UTF8;


-- DROP TABLE dbo.AlterColumnVarcharThai;
CREATE TABLE dbo.[AlterColumnVarcharThai]
(
  [Col1] VARCHAR(10) COLLATE Thai_100_CS_AS_SC NOT NULL,
  [Col2] VARCHAR(10) COLLATE Thai_100_CS_AS_SC NOT NULL
);

SELECT [name], system_type_id, max_length, [is_nullable], [collation_name]
FROM   sys.columns
WHERE  [object_id] = OBJECT_ID(N'dbo.AlterColumnVarcharThai');
/*
name   system_type_id   max_length   is_nullable   collation_name
Col1   167              10           0             Thai_100_CS_AS_SC
Col2   167              10           0             Thai_100_CS_AS_SC
*/


INSERT INTO dbo.AlterColumnVarcharThai ([Col1], [Col2])
  VALUES (0xA4C7D2C1CAA7BA, 0xA4C7D2C1CAA7BA);
  -- "peace" in Thai (Khwām s̄ngb)
  -- (according to Google Translate; I don't know Thai)


SELECT * FROM dbo.AlterColumnVarcharThai;
/*
Col1      Col2
ความสงบ    ความสงบ
*/

As you can see in the example code directly above, we inserted 7 bytes into both columns and it correctly stored, and retrieves, 7 Thai characters.

Now let’s increase the max size by 1 byte for both columns, and change one column to NVARCHAR.

ALTER TABLE dbo.[AlterColumnVarcharThai]
  ALTER COLUMN [Col1] VARCHAR(11) NOT NULL;

ALTER TABLE dbo.[AlterColumnVarcharThai]
  ALTER COLUMN [Col2] NVARCHAR(11) NOT NULL;
-- Commands completed successfully.
-- (no warnings)


SELECT [name], system_type_id, max_length, [is_nullable], [collation_name]
FROM   sys.columns
WHERE  [object_id] = OBJECT_ID(N'dbo.AlterColumnVarcharThai');
/*
name   system_type_id   max_length   is_nullable   collation_name
Col1   167              11           0             French_100_CI_AS_SC_UTF8
Col2   231              22           0             French_100_CI_AS_SC_UTF8
*/

As you can see in the example code directly above, this collation was changed to the database’s default collation (as expected). Although, at least this time we fixed the NOT NULL issue by specifying it in the ALTER TABLE statements.

Now we should look to see what is being stored in the columns:

SELECT * FROM dbo.AlterColumnVarcharThai;
/* (oops!)
Col1     Col2
ควา      ความสงบ
*/

Yikes! We lost 4 of the 7 Thai characters in the VARCHAR column. But why?

If we look at how those same 7 Thai characters are represented in each of the 3 encodings being used here, it will be much clearer:

DECLARE @Thai NVARCHAR(10) = 0x040E270E320E210E2A0E070E1A0E;
SELECT @Thai; -- ความสงบ

SELECT DATALENGTH(CONVERT(VARBINARY(50), CONVERT(VARCHAR(50), @Thai
                          COLLATE Thai_100_CS_AS_SC))) AS [CodePage874],
       DATALENGTH(CONVERT(VARBINARY(50), @Thai)) AS [UTF-16],
       DATALENGTH(CONVERT(VARBINARY(50), CONVERT(VARCHAR(50), @Thai
                          COLLATE Thai_100_CS_AS_SC_UTF8))) AS [UTF-8]
/*
CodePage874    UTF-16    UTF-8
7              14        21
*/

UTF-16 (which is what NVARCHAR uses) requires 14 bytes, and there is space for that because NVARCHAR(11) gives us up to 22 bytes. But UTF-8 requires 21 bytes due to each character taking up 3 bytes, yet VARCHAR(11) only gives us up to 11 bytes. The bytes that do not fit in the allocated space are silently truncated. Oops.

Additional notes about potential data-loss, specific to UTF-8 collations:

  1. if the original collation is a UTF-8 collation (new in SQL Server 2019) and the new collation is a non-UTF-8 collation (and the column is staying VARCHAR), then that is a code page change that can result in data-loss, even if the locale / culture is the same. This is, after all, going from Unicode (all characters) to non-Unicode (not all characters).
  2. if staying VARCHAR but switching to a UTF-8 collation (new in SQL Server 2019), then while all characters are supported, it is still possible to get data-loss due to truncation if the new column size does not have space for characters that require 2 or 3 bytes when encoded as UTF-8.
  3. you might run into this if you are intentionally converting VARCHAR columns to UTF-8, so be careful! (though you probably shouldn’t be using UTF-8 for anything but ASCII text anyway; please see: Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?)

Anything Else?

These changes are specific to ALTER TABLE and do not happen when changing other aspects of a column, such as changing the name via sp_rename.

Conclusion

When changing the datatype of a column, it is very important to pay attention to the current is_nullable and collation_name properties of the column. Non-default values that should not be changing need to be explicitly set (to their current values) in the ALTER TABLE statement.

Now, more than ever before, the prophetic words 1 of the late Bela Lugosi ring true:

Beware.
Beware.
Beware of the big green dragon that sits on your doorstep.
He eats little boys, puppy dog tails, and big, fat snails.
Beware.
Take care.
Beware!

Clearly, the “big green dragon” is a reference to unintended, silent changes. And the eating of “little boys, puppy dog tails, and big, fat snails” refers to the potential data loss, potential bad data allowed into the system, and potential time spentwasted finding and fixing these issues. Clearly!

So: Beware … Take care … Beware!


  1. Taken from the movie, “Glen or Glenda“ 

1 thought on “Beware! Beware of Unintended Changes When Altering Columns!”

Thoughts? Questions? Comments? Suggestions? Words of praise?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s