Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?

Demystifying What “sqlservr -q” Actually Does

(last updated on 2018-06-19)

For various reasons, people sometimes find themselves in the unfortunate (and unenviable) situation of having an Instance of SQL Server configured with the wrong Collation. This can often lead to unexpected errors and/or sorting and comparison behavior.

People sometimes try to “fix” this problem by adding the COLLATE DATABASE_DEFAULT to string columns of temporary tables, and/or COLLATE {collation_name} to WHERE / JOIN predicates. While this might work in some situations, it is not a true fix since it does not address areas controlled by the Instance-level Collation: resolving names of variables, cursors, and GOTO labels. It also does not affect the behavior of any Microsoft-provided feature that is contained within msdb and might use the system Collation (possibly including SQL Server Agent, Database Mail, Central Management Server (CMS), Maintenance Plans, Policy Management, etc).

For situations where adding the COLLATE {collation_name} keyword is inadequate or undesirable, there is the option of changing the Instance-level Collation. There are two ways to accomplish this: one “official” and the other “unofficial”. But first, before seeing how to change an Instance’s Collation, it is important to understand exactly what the total impact of that change could be.

General Concepts

The following should be a mostly-complete list of affected areas. It assumes that the Collation will be changing across all levels (Instance, Database, and column) since one of the two methods for changing the Instance-level Collation will change all levels.

  1. Simplistically, changing any Collation, even to the same Collation of a newer version (e.g. Latin1_General_CI_AS to Latin1_General_100_CI_AS) can easily change sort orders as well as what equates to what:
    1. Since the Database default Collations are changing, IF / WHILE / etc conditions for variable and input parameters can change behavior. For example, consider the following code:
      IF (@Variable = N'some string')
      Passing in yes for @Variable would evaluate to True in a case-insensitive Database. But if the Database’s Collation changes to be binary or case-sensitive, then that same value would evaluate to False.
    2. Since the Instance default Collation is changing, this impacts resolution of names of variables / parameters, cursors, and GOTO labels. For example, consider the following code:
      DECLARE @Var INT;
      SET @var = 1;

      This will work just fine on an Instance having a case-insensitive Collation. But, if that Instance changes to either a binary or case-sensitive Collation, then that code will break due to the difference between @Var and @var.
    3. JOINs and WHERE predicates can change behavior based on columns having a new Collation:
      • More rows might match when moving from a binary or case-sensitive Collation to a case-insensitive Collation. This could result in a Cartesian product.
      • Conversely, fewer rows than before might match when moving from a case-insensitive Collation to either a binary or case-sensitive Collation. This could result in missing rows.
    4. GROUP BY and DISTINCT behavior could change in the same manner.
    5. FOREIGN KEYS: Rows that equated to the Primary Key reference due to using a Collation with one or more insensitivities (case, accent, etc.) might no longer equate to the PK when moving to a binary or case-sensitive Collation.
    6. ORDER BY behavior could change.
    7. CHECK CONSTRAINT behavior.
  2. Data-loss potential for non-Unicode string columns: these rely upon the Code Page used by the Collation of the column to determine which characters can be stored in the column. Data loss can occur only if all 3 of the following conditions are true:
    1. Data is stored in a column using a non-Unicode type: CHAR / VARCHAR / TEXT (FYI: the TEXT type has been deprecated starting with the release of SQL Server 2005, so don’t use it, but it might exist in some places)
    2. Characters with underlying values of 128 (0x80) – 255 (0xFF) are being used.
    3. The old and new Collations use different Code Pages. It does not matter if the old and new Collations use different LCIDs (i.e. Locales); only a change in the column’s Code Page matters. For example, the following cultures (and several others) all use Code Page 1252, and hence switching between them will not be a Code Page conversion: French, Finnish_Swedish, German, Latin1_General, Spanish, Norwegian, etc.

    Even if all three conditions are met, that does not guarantee that there will be data loss. Many Code Pages have many of the same characters in the 0x80 – 0xFF range. Whether or not there will be data loss depends on the specific characters being used, what the new Code Page supports, and which method of updating the Collations is being used:

    • The documented method does an actual conversion: if switching to a Collation that uses a different Code Page than the current one, it is possible that some characters might not be available in the new Code Page and would be converted to either a “best fit” mapping, if one can be found, or else to a “?”. But if the characters are available, then the underlying byte value will be changed if necessary.
    • The undocumented method simply changes the Collation: the bytes of the source data will remain the same, but what characters they represent might change if the new Code Page has a different character with that same underlying byte value.
  3. Similar to # 2, if a Database’s new Collation uses a different Code Page than the previous Collation, then non-Unicode string literals in that Database (i.e. those not prefixed with an upper-case “N”) can have one or more characters changed to either a best-fit mapping or into a “?” if not available in the Code Page of the new Database default Collation. This can be PRINT statements, RAISERROR messages, literals used for INSERT or UPDATE statements. PLEASE NOTE: this conversion will take place during parsing and hence will not be visible to you. Please see the following two part series for a full explanation: “Which Collation is Used to Convert NVARCHAR to VARCHAR in a WHERE Condition?” (Part A of 2: “Duck”) and (Part B of 2: “Rabbit”).

  4. All Indexes containing string-type columns in key fields (INCLUDE columns shouldn’t matter unless one or more of them are non-Unicode and the old and new Collations use different Code Pages) need to be fully rebuilt (or more likely dropped and recreated):

    1. Their ordering might have changed
    2. If using a filter expression (and this applies equally to filtered statistics) that filters on a string column, then the rows that have been included / excluded from the index (or statistic) might change!
    3. For non-Unicode string columns, if the old and new Collations use different Code Pages, then it is possible that characters might change (especially if their underlying value is between 0x80 (128) and 0xFF (255)). See item #2 above regarding data-loss.
  5. Potential breaking of code: similar to #1, but here the code will actually break (not just silently behave differently) in any of the following situations if a column in a User Table (i.e. Collation will be changing) is used in combination with a column from a System Table that is NOT based on the Instance default Collation or the Database default Collation (some Collations are hard-coded and are the same between all systems) such that the Collations between the columns is no longer the same:
    • JOIN or WHERE predicate
    • String concatenation (e.g. column1 + column2 )
    • UNION / UNION ALL
    • COALESCE (but ISNULL is fine)
    • CASE statements returning the columns in question
    • CONCAT function
    • possibly some other situations
  6. Some columns might not be desirable to change. Some (perhaps many) applications use the same Collation for all string columns. But, for any columns that are set to a different Collation for a specific reason, it might not be desirable to change that to the same Collation as all other columns. On the other hand, it might be less work to change a few columns back to non-standard Collations than it would be to change most of them manually. At the very least you need to do an audit to make sure that you know where all of your “special” columns with differing Collations are, and what Collation they are using so that they can be set back to that after the mass-update.

  7. System objects might be in conflict and/or code might break if object names can no longer be resolved, or resolve to duplicates. If using inconsistent casing between object definition and object reference in code (e.g. Table name = “Customers”, Table reference in stored procedures / functions / views = “customers”) and moving to a case-sensitive or binary Collation, that code will fail. If currently using a binary or case-sensitive Collation and having objects named both “Customers” and “customers”, moving to a case-insensitive Collation will fail on a unique constraint violation for what is the internal table holding what we see in sys.objects.

  8. What about Full Text Search? Not sure if that is impacted or not, and if so, how…

Methods

IMPORTANT: Before making any changes to your system, be sure to have a complete (and working) set of backups, just in case something goes wrong. “Working” here means that you have tested them by restoring them (somewhere).

Documented Approach

Below is a general overview of this approach that only mentions the parts that actually change the Collation of some part of the system. This is not a full, step-by-step guide. Please see the “Set or Change the Server Collation” link below for additional details.

Rebuild System Databases

Do this to change the Instance-level default Collation, as well as the Collation of the system Databases: master, model, msdb, and tempdb (which is just a copy of model).

Note: the ^ character is for line-continuation. Otherwise, hitting Enter executes the command.

SETUP.EXE /QUIET /ACTION=REBUILDDATABASE ^
/INSTANCENAME=InstanceName ^
/SQLCOLLATION=CollationName ^
/SQLSYSADMINACCOUNTS=accounts  [ ^
/OptionalSwitches ]
 

PLEASE BE AWARE of the following warning in the documentation:

RebuildDatabase scenario deletes system databases and installs them again in clean state. Because the setting of tempdb file count does not persist, the value of number of tempdb files is not known during setup.

This step does not do anything more than change the Collation of the four system Databases, Instance-level meta-data, and the Instance itself. User Databases, as well as the string columns of the User Tables within them, are ignored.

Change Database-level Collation of User Databases

Do this to change the Database’s default Collation, as well as the Collation of Database-level meta-data:

ALTER DATABASE { database_name | CURRENT } COLLATE {new_collation_name} ;

For example:

ALTER DATABASE [TestDB] COLLATE Latin1_General_100_CI_AS_SC;

or:

ALTER DATABASE CURRENT COLLATE Latin1_General_100_CI_AS_SC;

PLEASE NOTE: You cannot change the Collation for any of the system Databases. Attempting to do so will result in the following error:

Msg 3708, Level 16, State 5, Line 271
Cannot alter the database ‘model’ because it is a system database.

Also, there are certain conditions which can prevent this command from completing. For example, if there are any objects that were created with the SCHEMABINDING option and that use the Database’s default Collation. For more details, please see the documentation for “ALTER DATABASE: Changing the Database Collation“.

This step does not do anything more than change the Collation of Database-level meta-data, and the Database itself. String columns of the User Tables within the Database are ignored.

Undocumented Approach

How to Do it

The following approach has some advantages over the documented approach described above, especially being one step instead of three, but it is undocumented and hence unsupported, so if you run into any problems, Microsoft is not likely to help, nor will they fix any buggy or unexpected behavior. The -q switch of SQLSERVR.EXE is not found in the Microsoft documentation, nor is it listed when passing in the -? switch to get the help info. However, it can be used as follows:

sqlservr -c -m -T4022 -T3659 -s"{instance_name}" -q"{new_collation_name}"

For example:

sqlservr -c -m -T4022 -T3659 -s"CHANGECOLLATION" -q"Estonian_100_CS_AS_SC"

You must execute this command in an “Administrator” Command Prompt, not a regular Command Prompt. Attempting to do this in a regular Command Prompt will result in the following error messages:

2018-06-10 16:08:19.06 Server Error: 17058, Severity: 16, State: 1.
2018-06-10 16:08:19.06 Server initerrlog: Could not open error log file ”. Operating system error = 3(The system cannot find the path specified.).
… {same error repeated 9 more times)
2018-06-10 16:08:22.08 Server SQL Server shutdown has been initiated

You will also need to make sure that the Instance is not currently running before executing that command-line.

Trace Flag 4022 instructs SQL Server to not execute any Stored Procedures marked as “execute at startup”.

Trace Flag 3659 instructs SQL Server to log all errors to the error log during server startup.

Once it completes, you should see the following in the Command Prompt window:

2018-06-10 16:10:45.97 spid6s   The default collation was successfully
                                changed.
2018-06-10 16:10:46.12 spid6s   Recovery is complete. This is an
                                informational message only. No user action
                                is required.

However, the process is still running and does not self-terminate.
Hit Control-C.

You will then see:

Do you wish to shutdown SQL Server (Y/N)?

Hit y.

You should then see the following:

2018-06-10 16:10:56.13 spid6s   SQL Server shutdown due to Ctrl-C or
                                Ctrl-Break signal. This is an informational
                                message only. No user action is required.
2018-06-10 16:10:56.13 spid6s   SQL Server shutdown has been initiated
2018-06-10 16:10:56.13 spid6s   SQL Trace was stopped due to server
                                shutdown. Trace ID = '1'. This is an
                                informational message only; no user action
                                is required.

What it Does

  1. Converts one Database at a time (appears to be in database_id order):
    1. master
    2. tempdb
    3. model
    4. msdb
    5. User DBs
  2. Will roll back a DB if it does not complete fully, but DBs that have completed will remain converted if an error occurs. This behavior holds true for the system Database’s also. This could lead to the process leaving the Instance with inconsistent Collations for the system DBs if they do not all complete successfully. Meaning, if less than all 4 system DBs complete successfully, there will be a mismatch between the system DBs. And if they do complete successfully but a User DB fails, then the system DBs won’t match the Instance-level Collation. This is not terribly bad, though, since you can fix the problem and restart the operation.

  3. Conversion bypasses restrictions imposed on documented method (ALTER DATABASE, and ALTER TABLE…ALTER COLUMN). The following do not cause an error with this method:

    1. Schema-bound objects
    2. Check constraints that use the Database’s Collation
    3. Computed columns that use the Database’s Collation
    4. Table-Valued Functions (TVFs) that pass back string columns that did not specify the COLLATE keyword.
    5. Indexes on string columns
  4. Indexes containing string columns are dropped and recreated. If the Clustered Index contains at least one string column, then all Indexes on the Table are dropped and recreated (even if they do not contain any string columns).

  5. Conversion bypasses checks meant to prevent leaving the data in an invalid state:

    1. NTEXT columns do not error when being set to a Collation that supports Supplementary Characters. This leaves the column as effectively read-only: you can select from it, but attempting to modify it will get the following error:
      Msg 4189, Level 16, State 0, Line XXXXX
      Cannot convert to text/ntext or collate to ‘{collation_name}’ because these legacy LOB types do not support the Unicode supplementary characters whose codepoints are U+10000 or greater. Use types varchar(max), nvarchar(max), or a collation which does not have the _SC flag.

      However, it is easy enough to manually change the Collation of these columns to one that is not Supplementary Character Aware.
    2. Foreign Key rows that referenced a PK value due to relying upon an insensitivity (e.g. “Y = y” due to case-insensitivity) might no longer reference any PK value. There is no constraint verification, so the FK will be left as “enabled” and “trusted” (assuming it was both “enabled” and “trusted” prior to the operation), yet these same values can no longer be added due to now getting the expected FK violation error.
  6. Data loss can occur in VARCHAR / CHAR / TEXT columns if the new Collation uses a different Code Page than was being used previously, and the new Code Page does not have the same character having the same numeric value. This is a different type of data loss than what happens with the other method. The documented method does an actual Code Page conversion, which will both (a) adjust the byte value to a different one for characters that exist in both Code Pages but with different underlying values, and (b) attempt to find a similar looking character, known as a “best fit” match, if a “best fit” mapping exists. This undocumented method does neither of those character conversions; no attempt is made to maintain consistency of the character / glyph itself. The underyling byte values remain the same, but the character that they map to might change between Code Pages. For example, in Code Page 1252 (Latin1_General), byte value 0xC6 equates to “Æ” (Latin Capital Letter AE), but using this method to change to Code Page 1257 (Baltic Rim), the data would show “Ę” (Latin Capital Letter E with Ogonek) because that is what 0xC6 maps to in that Code Page, even though Code Page 1257 contains “Æ”, but having an underlying value of 0xAF instead of 0xC6. A true Code Page conversion would have changed the underlying byte value from 0xC6 to 0xAF so that the data would have shown the same character, “Æ”, after the operation.
    This alone indicates that the operation is not doing any actual string conversions, but instead is merely updating the meta-data for string columns directly to the system catalog tables.

  7. Instance-level default Collation will not change until all DBs have successfully converted.

  8. This operation will make no changes at all if the current Instance-level Collation is the same as the new Collation being requested by this operation. This means that if the process converts some Databases but then fails while converting another Database (hence not completing and not changing the Instance-level Collation), then you cannot “revert” the operation by going back to the original Collation. If the Collation you request using the -q switch is the same as the Instance-level Collation, then running this command-line will simply start the Instance in Single-User mode instead of making any Collation changes. However, you will still be able to make changes to the Database that the error occurred in and repeat the operation for the new Collation.

  9. While both documented and undocumented methods use the Transaction Log for the dropping and recreating of Indexes, this method sometimes uses less Tran Log space due to not doing Code Page conversions of VARCHAR / CHAR / TEXT columns. Of course, if the old and new Collations use the same Code Page, then the documented method won’t be doing Code Page conversions anyway.

  10. Unlike the documented “SETUP.EXE /ACTION=REBUILDDATABASE” method, this method does not drop and recreate the system Databases (requiring a bit of extra work to get the instance back to its original state minus the change in Collation). This could save a good bit of work in terms of restoring Databases, recreating Instance-level objects, SQL Server Agent configuration and jobs, etc.

Known issues

  1. Due to read-only DB (documented in link #3 at the bottom):
    Error: 5804, Severity: 16, State: 1
    Character set, sort order, or collation cannot be changed at the server level because at least one database is not writable.
    Make the database writable, and retry the operation.

  2. Due to files being read-only (documented in link #3 at the bottom):
    Error: 3416, Severity: 20, State: 1
    The server contains read-only files that must be made writable before the server can be recollated.

  3. Due to In-Memory DB (documented in link #3 at the bottom):
    Error: 41317, Severity: 16, State: 4
    A user transaction that accesses memory optimized tables or natively compiled procedures cannot access more than one user database or databases model and msdb, and it cannot write to master.

  4. Due to In-Memory DB (documented in link #3 at the bottom) and Unique Constraint Violation:
    Error: 3434, Severity: 20, State: 1
    Cannot change sort order or locale. An unexpected failure occurred while trying to reindex the server to a new collation. SQL
    Server is shutting down. Restart SQL Server to continue with the sort order unchanged. Diagnose and correct previous errors and then retry the operation.

  5. Due to Unique Constraint Violation:
    Error: 1505, Severity: 16, State: 1.
    The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name ‘dbo.UniqueIndexViolation’ and the index name ‘CUIX_UniqueIndexViolation’. The duplicate key value is (a ).

Posts dealing with the undocumented sqlservr.exe -q option:

  1. Changing SQL Server Collation After Installation ( 2015-02-19 )
  2. SQL Server – Changing Sql Instance Collation – via sqlservr/-q ( 2016-11-04 ; based on post linked directly above)
  3. SQL Server – Changing Sql Instance Collation – via sqlservr/-q – Little Traps ( 2017-01-08 ; follow-up to post linked directly above)

SQL Server Express LocalDB

Changing the Instance-level Collation for LocalDB is not possible. It will always be SQL_Latin1_General_CP1_CI_AS (quite unfortunately!):

  1. There is no SETUP.EXE for LocalDB.

  2. Attempting to use the sqlservr.exe -q option results in one of the following errors, depending on how you specify the Instance name:

    • Using just the instance name:

    Your SQL Server installation is either corrupt or has been tampered with (Error getting instance ID from name.). Please uninstall then re-run setup to correct this problem

    • Using (LocalDB)\InstanceName:

    Your SQL Server installation is either corrupt or has been tampered with (Error: Instance name exceeds maximum length.). Please uninstall then re-run setup to correct this problem

Testing

I tested on SQL Server 2017 CU6, but the behavior should be consistent across versions.

General

{ When I have time I will post the test cases }

LocalDB

SETUP

sqllocaldb c TestChange

CD C:\Program Files\Microsoft SQL Server\140\LocalDB\Binn

TEST 1

.\sqlservr -c -m -T4022 -T3659 -s"TestChange" -q"Hebrew_100_CI_AS"

REM Your SQL Server installation is either corrupt or has been tampered
REM with (Error getting instance ID from name.).  Please uninstall then
REM re-run setup to correct this problem

TEST 2

.\sqlservr -c -m -T4022 -T3659 -s"(LocalDB)\TestChange" -q"Hebrew_100_CI_AS"

REM Your SQL Server installation is either corrupt or has been tampered
REM with (Error: Instance name exceeds maximum length.).  Please
REM uninstall then re-run setup to correct this problem

More…

For more info on Collations / encodings / Unicode / Extended ASCII (especially as they relate to Microsoft SQL Server), please visit:

Collations.Info

4 thoughts on “Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?”

Thoughts? Questions? Comments? Suggestions? Words of praise?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s