What’s in a Name?: Inside the Wacky World of T-SQL Identifiers

thingamajig-and-whatchamacallit

(last updated: 2018-04-18)

Today we are going to take a look into an area that you probably are thinking has very little to see: T-SQL Identifiers (i.e. entity names). Most of the time the rules governing how you name things in SQL Server are pretty simple, but there are several nuances that you might want to be aware of.

The Basics

To start with, most names (but not all) are of type sysname. sysname is an alias (think User-Defined Data Type / UDDT) for NVARCHAR(128). You can see this using the following query:

SELECT alias.[name] AS [AliasName],
       base.[name] AS [BaseName],
       alias.[max_length] AS [MaxSize]
FROM   sys.types alias
INNER JOIN sys.types base
        ON base.[user_type_id] = alias.[system_type_id]
WHERE  alias.[name] = N'sysname'

which returns:

AliasName    BaseName    MaxSize
sysname      nvarchar    256

Please note that MaxSize is expressed in sys.types in terms of bytes, not characters. Because this is NVARCHAR instead of VARCHAR, the max of 128 “characters” (i.e. 16-bit Code Points) equates to a max of 256 bytes.

The Rules (According to Documentation)

The official Microsoft documentation for Database Identifiers states the following (reordered to make more sense):

  • There are two types of identifiers: Regular and Delimited
    • Regular Identifiers
      • Do not need to be enclosed by delimiters
      • First character:
        • Any “letter” according to Unicode 3.2 classification
        • underscore ( _ ), at sign ( @ ), or number sign ( # )
          • A regular identifier that starts with the at sign (@) always denotes a local variable or parameter and cannot be used as the name of any other type of object.
          • An identifier that starts with a number sign ( # ) denotes a temporary table or procedure.
          • An identifier that starts with double number signs ( ## ) denotes a global temporary object.

          Although the number sign or double number sign characters can be used to begin the names of other types of objects, we do not recommend this practice.

      • Subsequent characters:
        • Any “letter” according to Unicode 3.2 classification
        • Any “decimal” number according to Unicode 3.2 classification
        • underscore ( _ ), at sign ( @ ), number sign ( # ), or dollar sign ( $ )
      • No embedded spaces or special characters
      • No supplementary characters
      • Cannot be a T-SQL reserved word
    • Delimited Identifiers
      • Do need to be enclosed by delimiters (due to not conforming to rules for identifiers)
      • Delimiters are either double-quotes ( ) or square brackets ( [ ] )
  • Collation
    • Instance-level items (Databases, Logins, etc) governed by Instance default Collation
    • Database-level items (Users, Schemas, Tables, etc) governed by Database default Collation
  • Length
    • 1 – 128 characters in most cases
    • 1 – 116 characters for local temporary tables
  • Variable and Parameter names must follow rules of identifiers

That all seems simple and straight-forward enough. So what’s the problem, if there even is one?

Well, I’m glad you asked. There are several issues with these rules:

  1. It is sometimes unclear what they mean by “indetifiers” when that term is used by itself since they have named two types of identifiers.
  2. For example, what exactly is being stated concerning variable and parameter names? Is it only max length?
  3. What exactly are the valid “letters” and “decimal” numbers according to Unicode 3.2?
  4. For variables, parameters, and local and global temporary objects, does the prefix that designates their type (i.e. the @, #, and ## prefixes) count as part of the identifier or not?
  5. What impact, exactly, does Collation have? Is it the usual sensitivities and locale-specific linguistic rules?
  6. Has anything been left out?

Investigation

Valid Characters

Finding the exact list of valid characters turned out to be an entire project in itself. For those details, please see the following three posts:

  1. The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 1
  2. The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 2
  3. The Uni-Code: The Search for the True List of Valid Characters for T-SQL Identifiers, Part 3 of 2 (Delimited Identifiers)

Max. (and a few Min.) Lengths

First we can try to find all tables with name columns that are either not sysname or NVARCHAR, or don’t have a “maxsize” of 256:

SELECT ao.[name], ao.[type_desc], ac.*
FROM   sys.all_columns ac
INNER JOIN sys.all_objects ao
        ON ao.[object_id] = ac.[object_id]
WHERE ao.[is_ms_shipped] = 1
AND   ac.[name] = N'name'
AND  (
       ac.[user_type_id] NOT IN (256, 231)
   OR  ac.[max_length] <> 256
     )
ORDER BY  ao.[name], ac.[column_id];

/*
name                            max_length
---------------------------     ----------
assembly_files                  520
dm_server_audit_status          512
dm_tran_active_transactions     64
dm_xe_sessions                  512
*/

The results of that query indicate that Assembly Files, Transactions, Server Audit Status, and Extended Event Sessions should be investigated. Fortunately “Server Audit Status” and “Extended Event Sessions” were both false-positives. The reason that they were false positives is that they do not contain the source values. In both cases, the tables that hold the source values — sys.server_audits and sys.server_event_sessions, respectively — do use sysname and NVARCHAR(128), respectively.

That leaves sys.assembly_files and BEGIN TRAN / SAVE TRAN.

Assembly Files

According to the documentation for sys.assembly_files, the datatype is NVARCHAR(260) , which matches the max_length of 520 that the query above returned. But, the documentation for ALTER ASSEMBLY, which is how you add files to Assemblies, doesn’t say anything about either a max length or why it isn’t the standard 128. However, it does provide a clue in the Arguments section where it describes the [ ADD FILE FROM { client_file_specifier [ AS file_name]... clause:

client_file_specifier specifies the location from which to upload the file…If file_name is not specified, the file_name part of client_file_specifier is used as file_name.

Meaning, if you add a file from a file path instead of providing a VARBINARY literal (i.e. hex bytes), and you do not provide a custom AS file_name, then the full path to the file being loaded will be used as the “name” (this also happens when creating an Assembly from a file path). So then why the value of 260? That comes from the default maximum length of a path, which is 260 (see Naming Files, Paths, and Namespaces: Maximum Path Length Limitation for details).

Now we just need to verify that info. We need an actual Assembly to add a file to since the built-in CLR types Assembly, Microsoft.SqlServer.Types, does not allow for files to be added to it. So, I used the main SQL# Assembly from my SQL# SQLCLR library:

DECLARE @NumLetters INT = 259;
DECLARE @SQL16 NVARCHAR(MAX);
SET @SQL16 = N'
ALTER ASSEMBLY [SQL#] ADD FILE FROM 0x0102 AS N'''
    + REPLICATE(N'p', @NumLetters) + N''';
ALTER ASSEMBLY [SQL#] DROP FILE N'''
    + REPLICATE(N'p', @NumLetters) + N''';
';
PRINT @SQL16;
EXEC (@SQL16);
/* -- Using a @NumLetters value of 260 gets the following error:
Msg 6236, Level 16, State 2, Line XXXXX
ALTER ASSEMBLY failed because filename 'ppp...ppp' is too long.
*/

It seems that 259 is the max length that can be used. Why? Going back to that “Naming Files…” documentation (linked above), it states:

For example, the maximum path on drive D is “D:\some 256-character path string<NUL>” where “<NUL>” represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)

Meaning: the 260th character is the “<NUL>” string terminator (i.e. NCHAR(0) ).

Transaction Names

Some quick tests to verify that following the rules does work:

BEGIN TRAN _4; SAVE TRAN [a]; COMMIT;
BEGIN TRAN h$; SAVE TRAN [a]; COMMIT;
BEGIN TRAN #4; SAVE TRAN [a]; COMMIT;
BEGIN TRAN h@4; SAVE TRAN [a]; COMMIT;

And a couple of tests to verify that not following the rules causes an error:

BEGIN TRAN 4;
BEGIN TRAN $h;

Both of the queries above get the following error:

Msg 102, Level 15, State 1, Line XXXXX
Incorrect syntax near …

The documentation for BEGIN TRANSACTION and SAVE TRANSACTION both state that:

  1. The max size of a transaction name is 32 characters
  2. Literals / constants (i.e. regular identifiers) over 32 characters are not allowed
  3. Variables containing over 32 characters are silently truncated
  4. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.”

First we test the regular identifier:

BEGIN TRAN _23456789a123456789b123456789c12; SAVE TRAN [a]; COMMIT;
-- Success

BEGIN TRAN _23456789a123456789b123456789c123; SAVE TRAN [a]; COMMIT;
/*
Msg 103, Level 15, State 2, Line XXXXX
The identifier that starts with '_23456789a123456789b123456789c123' is
    too long. Maximum length is 32.
*/

Next we test a variable containing the transaction name:

DECLARE @TranName NVARCHAR(50);
SET @TranName = N'123456789a123456789b123456789c123456789d';

BEGIN TRAN @TranName; SAVE TRAN [a]; ROLLBACK TRAN @TranName;

SELECT [Transaction Name], *
FROM   sys.fn_dblog(NULL, NULL)
WHERE  [Transaction Name] LIKE N'1234%';
-- 123456789a123456789b123456789c12

Finally we test the case-sensitivity of the name by executing on a case-insensitive instance (I executed on an instance of SQL Server Express LocalDB as they are always case-insensitive):

BEGIN TRAN tr; SAVE TRAN a; ROLLBACK TRAN tr;
-- Success!

SELECT 1 WHERE 'a' = 'A';
-- 1

BEGIN TRAN tr; SAVE TRAN a; ROLLBACK TRAN TR;
/*
Msg 6401, Level 16, State 1, Line XXXXX
Cannot roll back TR. No transaction or savepoint of that name was found.
*/

But wait. Are transaction names really being handled as case-sensitive, or is this really a binary (i.e. “ordinal” in .NET) comparison? Those two types of comparisons are not the same:

DECLARE @Composed NVARCHAR(10) = NCHAR(220),
        @Decomposed NVARCHAR(10) = N'U' + NCHAR(0x0308);
SELECT @Composed, @Decomposed;
-- Ü    Ü

SELECT 1
WHERE  @Composed = @Decomposed COLLATE Latin1_General_100_CS_AS;
-- 1

BEGIN TRAN @Composed; SAVE TRAN a; ROLLBACK TRAN @Decomposed;
/*
Msg 6401, Level 16, State 1, Line XXXXX
Cannot roll back Ü. No transaction or savepoint of that name was found.
*/

Ah, so they are actually using a binary / ordinal comparison, and not a case-sensitive comparison.

Database Names

The documentation for CREATE DATABASE states:

database_name can be a maximum of 128 characters, unless a logical name is not specified for the log file. If a logical log file name is not specified, SQL Server generates the logical_file_name and the os_file_name for the log by appending a suffix to database_name. This limits database_name to 123 characters so that the generated logical file name is no more than 128 characters.

Let’s test:

DECLARE @NumLetters INT = 124;
DECLARE @SQL17 NVARCHAR(MAX);
SET @SQL17 = N'
CREATE DATABASE ' + REPLICATE(N'q', @NumLetters) + N';

   SELECT [file_id], [type_desc], [name], [physical_name]
   FROM ' + REPLICATE(N'q', @NumLetters) + N'.sys.database_files;

DROP DATABASE ' + REPLICATE(N'q', @NumLetters) + N';
';
PRINT @SQL17;
EXEC (@SQL17);

/* -- Using a @NumLetters value of 125 gets the following error:
Msg 407, Level 16, State 1, Line XXXXX
Internal error. The string routine in file
      sql\ntdbms\storeng\dfs\manager\filemgr.cpp,
      line 748 failed with HRESULT 0x8007007a.
*/

That is not exactly a well-handled error ;-). Still, it appears that the true max name size when not specifying the logical name is actually 124 characters, not 123. This makes sense since the system generated logical name will be the database name plus _log, which is 4 characters, so the total is 128 characters.

Local Temporary Tables and Stored Procedures

First we can test the minimum length for Tables:

CREATE TABLE # ([Col1] INT);

SELECT * FROM #;

SELECT * FROM tempdb.sys.objects WHERE [name] LIKE N'#[_]%';
-- #__...__00000000001F

DROP TABLE #;

And then for Stored Procedures:

GO
CREATE PROCEDURE #
AS
SET NOCOUNT ON;
GO

EXEC #;

SELECT * FROM tempdb.sys.objects WHERE [name] LIKE N'#[_]%';
-- #__...___0000187B

DROP PROCEDURE #;

Next we test for maximum length and Collation scope:

SELECT 1 WHERE 'a' = 'A';
-- 1

DECLARE @NumLetters INT = 115;
DECLARE @SQL1 NVARCHAR(MAX);
SET @SQL1 = N'CREATE TABLE #A' + REPLICATE(N'a', (@NumLetters - 1))
          + N' ([Col1] INT);
SELECT * FROM #' + REPLICATE(N'a', @NumLetters) + N';
';
PRINT @SQL1;
EXEC (@SQL1);

/* -- Using a @NumLetters value of 116 gets the following error:
Msg 193, Level 15, State 1, Line XXXXX
The object or column name starting with '#Aaa...aaa' is too long.
    The maximum length is 116 characters.
*/

In a case-insensitive Database on an instance using a binary Collation, the test above returns a 1 from the “SELECT 1“, but the “SELECT * FROM #...” fails since the casing of the first letter is different between the CREATE and SELECT.

In a Database using a binary Collation on a case-insensitive instance, the test above returns nothing from the “SELECT 1“, but the “SELECT * FROM #...” succeeds, even with the casing of the first letter being different between the CREATE and SELECT.

All three of these tests prove that the first character of the name is the number sign ( # ). This is why we could only add 115 characters to the name instead of 116.

etc…

(I will update this post later with more test queries for global temporary objects, variables, etc)

thingamajig-and-whatchamacallit_official_600x310

The Rules (According to SQL Server)

The actual (and complete) set of rules are:

  • There are two types of identifiers: Regular and Delimited
    • Regular Identifiers
      • Do not need to be enclosed by delimiters
      • First character:
        • Any of the 45,692 BMP characters classified in Unicode 3.2 as “ID_Start” (i.e. Identifier Start)
        • underscore ( _ , a.k.a. low line) or fullwidth low line ( _ )
        • at sign ( @ )
          • Always denotes a local variable or parameter and cannot be used as the name of any other type of object.
        • number sign ( # )
          • Schema-bound objects
            • A single number sign ( # ) denotes a local temporary table or procedure.
            • Double number signs ( ## ) denote a global temporary table or procedure.
            • A number sign is invalid for all other types of schema-bound objects; it will cause an error.
          • Non-Schema-bound objects
            • A number sign is a valid first character
            • While it does not cause an error, the official recommendation is to not use a number sign as a first character in these cases.
      • Subsequent characters:
        • Any of the 46,514 BMP characters classified in Unicode 3.2 as “ID_Continue” (i.e. Identifier Continue)
        • at sign ( @ ), number sign ( # ), or dollar sign ( $ )
        • Any of the 26 BMP characters classified in Unicode 3.2 as “General_Category = Cf” (i.e. format control characters)
      • No supplementary characters
      • For the complete list of valid characters, please see:
        Completely Complete List of Valid T-SQL Identifier Characters (please give the page a moment to load; it’s 3.5 MB and almost 47k lines)
      • Cannot be a T-SQL reserved word
    • Delimited Identifiers
      • Do need to be enclosed by delimiters
      • Delimiters are either double-quotes ( " ) or square brackets ( [ ] )
      • All characters (including Supplementary Characters) are valid, except:
        • U+0000 (i.e. NCHAR(0) )
        • U+FFFF (i.e. NCHAR(65535) )
      • The following characters need to be escaped:
        • ] needs to be escaped as ]] only when delimiting with square brackets ( [ ] )
        • " needs to be escaped as "" only when delimiting with double-quotes ( " )
  • Collation
    • Instance-level items (Databases, Logins, etc) governed by Instance default Collation:
      1. variable names (including parameters and table variables)
      2. cursor names
      3. GOTO labels
      4. Extended Event sessions
      5. Temporary objects (tables and stored procedures, both local and global)
      6. others…
    • Database-level items (Users, Schemas, Tables, etc) governed by Database default Collation
      1. table aliases
      2. others…
    • Transaction names: always treated as if governed by a binary Collation, regardless of instance or database Collation.
  • Length
    • Most entities (schema-bound objects, indexes, columns, variables, GOTO labels, Extended Event sessions, etc):
      1 – 128 characters in most cases
    • Exceptions:
      • Local temporary objects (tables and stored procedures): 1 – 116 characters
        • # is the first character, making the effective range: 0 – 115 characters
        • While it is possible to create a local temporary table or stored procedure with a name consisting of a single number sign, it is a bad practice as it reduces the readability and maintainability of the code
      • Global temporary objects (tables and stored procedures): 1 – 128 characters
        • ## are the first two characters, making the effective range: 0 – 126 characters
        • While it is possible to create a global temporary table or stored procedure with a name consisting of only two number signs, it is a bad practice as it reduces the readability and maintainability of the code
      • Variables, Parameters, and Table Variables: 1 – 128 characters
        • @ is the first character, making the effective range: 0 – 127 characters
        • While it is possible to create a variable, parameter, or table variable with a name consisting of a single at sign, it is a bad practice as it reduces the readability and maintainability of the code
      • Database names:
        • If the LOG file is given a logical name, then the range is: 1 – 128 characters
        • If the LOG file is not given a logical name, then the range is: 1 – 124 characters
      • Transaction names: 1 – 32 characters
        • string constant / literal over 32 characters will error
        • variable containing over over 32 characters will silently truncate (no error)
      • Assembly File names: 1 – 259 characters
  • Identifiers that cannot be delimited (must follow rules of [Regular] identifiers)
    • Variable, Table Variable, and Parameter names
    • GOTO labels
  • Non-Obvious Impact of Collation
    • The 26 “format control” characters are valid as “subsequent” characters but are not supposed to allow for making two names equate if they are otherwise the same. However, when using a binary collation, the format control characters do not equate to each other.
    • Be careful when using Supplementary Characters (i.e. any character with a Code Point value above 0xFFFF / 65,535) for the following reasons:
      1. The documentation states that they are “not supported for use in metadata, such as in names of database objects”. Of course, this statement could very well be obsolete since they do “work” to varying degrees (see next two points). It could be that “not supported” is just a nice way of Microsoft saying that using them for metadata is far too likely to result in confusion, and they don’t want to waste time supporting it.
      2. They don’t always display correctly. Displaying of characters is controlled by fonts, and possibly the underlying OS. I have not been able to get Supplementary Characters to display correctly in SQL Server 2008 R2 on Windows XP, yet the same character does display correctly in SQL Server 2008 R2 running on Windows Server 2012. So the issue is not with SQL Server 2008 R2.
      3. Supplementary Characters were not given any sort weights in the older Collations: SQL_ Collations and non-versioned Windows Collations (i.e. no version number in the name). This means that all Supplementary Characters are treated as “hidden” when it comes to sorting and comparison. Any number of them will equate to any number of them (because they do not factor into the generation of the sort key). Given that there are also 21,230 BMP characters in the SQL_ and non-versioned Windows Collations that do not have sort weights, this just increases the chances for confusion and difficult-to-debug scenarios. Binary Collations are not a problem since they do not use sort weights.

      The only Collations that will treat different supplementary characters as existing and being distinct are binary collations (regardless of BIN vs BIN2 or SQL Server vs Windows Collation), and any Windows Collation with a version of 90 or newer. For example, neither SQL_Latin1_General_CP1_CI_AS nor Latin1_General_CI_AS assign any weights to any supplementary characters, while Macedonian_FYROM_90_CI_AS and Latin1_General_100_CI_AS do. This behavior has nothing to do with the Collation being Supplementary Character-aware (i.e. name either ends with _SC, or version is 140 or newer) or not. SQL_Latin1_General_CP850_BIN does not assign any weights, but it still treats each supplementary character as being a distinct character.

3 thoughts on “What’s in a Name?: Inside the Wacky World of T-SQL Identifiers”

    1. Ken, thanks for pointing that out. I have it fixed here but it will take some time to filter down to SQL Server Central. What is up there now isn’t even the correct version of the post, but it seems that I had a bad character in my newest post that doesn’t break anything here but somehow broke whatever sucks the content in at SSC. I think I have it fixed so hopefully things will update there soon. Thanks again!

      Like

Thoughts? Questions? Comments? Suggestions? Words of praise?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s