Sql Quantum Leap :: Completely Complete List of Rules for T-SQL Identifiers

(last updated: 2020-01-25 @ 17:45 EST / 2020-01-25 @ 22:45 UTC )

The actual (and complete) set of rules for T-SQL identifiers are:

There are two types of identifiers: Regular and Delimited
- REGULAR IDENTIFIERS
  - Do not need to be enclosed by delimiters
  - First character:
    - Any of the 45,692 BMP characters classified in Unicode 3.2 as “ID_Start” (i.e. Identifier Start)
    - underscore ( _ , a.k.a. low line) or fullwidth low line ( ＿ / U+FF3F )
    - at sign ( @ )
      - Always denotes a local variable or parameter and cannot be used as the name of any other type of object.
    - number sign ( # )
      - Schema-bound objects
        
        A single number sign ( # ) denotes a local temporary table or procedure.
        
        Double number signs ( ## ) denote a global temporary table or procedure.
        
        A number sign is invalid for all other types of schema-bound objects; it will cause an error.
      - Non-Schema-bound objects
        
        A number sign is a valid first character
        
        While it does not cause an error, the official recommendation is to not use a number sign as a first character in these cases.
  - Subsequent characters:
    - Any of the 46,514 BMP characters classified in Unicode 3.2 as “ID_Continue” (i.e. Identifier Continue)
    - at sign ( @ ), number sign ( # ), or dollar sign ( $ )
    - Any of the 26 BMP characters classified in Unicode 3.2 as “General_Category = Cf” (i.e. format control characters)
  - No supplementary characters
  - For the complete list of valid characters, please see:
    Completely Complete List of Valid T-SQL Identifier Characters (please give the page a moment to load; it’s 3.5 MB and almost 47k lines)
  - Cannot be a T-SQL reserved word. Technically, the following five words are exceptions which do not cause errors (at least not when used as Table names), but it is still best to treat them as if they did:
    - DISK
    - DUMP
    - LOAD
    - PRECISION
    - SECURITYAUDIT
- DELIMITED IDENTIFIERS
  - Do need to be enclosed by delimiters
  - Delimiters are either double-quotes ( " ) or square brackets ( [ ] )
  - All characters (including Supplementary Characters) are valid, except:
    - U+0000 (i.e. NCHAR(0) )
    - U+FFFF (i.e. NCHAR(65535) )
  - The following characters need to be escaped:
    - ] needs to be escaped as ]] only when delimiting with square brackets ( [ ] )
    - " needs to be escaped as "" only when delimiting with double-quotes ( " )
  - Effect on characters with special meaning:
    - at sign ( @ )
      - If delimited, then is not a variable and can be used for any object (e.g. [@ObjectName] )
      - @[VariableName] is an invalid construct; variables cannot be delimited.
    - number sign ( # ) and double number signs ( ## )
      - Behavior does not change when delimited (e.g. [#🙀] is a valid local temporary table, and [##🙀] is a valid global temporary table)
      - #[Anything] is an invalid construct
Collation
- Instance-level items (Databases, Logins, etc) governed by Instance default Collation (see “Contained Databases” directly below for exceptions):
  1. Variable names (including parameters and table variables)
  2. Cursor names
  3. GOTO labels
  4. Temporary objects (tables and stored procedures, both local and global)
  5. Extended Event sessions
  6. others…
- Database-level items (Users, Schemas, Tables, etc) governed by Database default Collation (see “Contained Databases” directly below for exceptions):
  1. Table aliases
  2. others…
- Contained Databases:
  Within a “Contained” Database (i.e. CONTAINMENT = PARTIAL ), the following items always use a Collation of Latin1_General_100_CI_AS_WS_KS_SC:
  1. Variable names (including parameters and table variables)
  2. Temporary objects (tables and stored procedures, both local and global)
  3. Cursor names
  4. GOTO labels
  5. Database-level items / metadata
- Transaction names: always treated as if governed by a binary Collation, regardless of instance or database Collation.
Length
- Most entities (schema-bound objects, indexes, columns, logins, GOTO labels, Extended Event sessions, etc):
  1 – 128 two-byte code units in most cases (BMP characters use 1 two-byte code unit; Supplementary Characters use 2 two-byte code units)
- Exceptions:
  - Local temporary objects (tables and stored procedures): 1 – 116 code units
    - # is the first character, making the effective range: 0 – 115 code units
    - While it is possible to create a local temporary table or stored procedure with a name consisting of a single number sign, it is a bad practice as it reduces the readability and maintainability of the code
  - Global temporary objects (tables and stored procedures): 1 – 128 code units
    - ## are the first two characters, making the effective range: 0 – 126 code units
    - While it is possible to create a global temporary table or stored procedure with a name consisting of only two number signs, it is a bad practice as it reduces the readability and maintainability of the code
  - Variables, Parameters, and Table Variables: 1 – 128 code units
    - @ is the first character, making the effective range: 0 – 127 code units
    - While it is possible to create a variable, parameter, or table variable with a name consisting of a single at sign, it is a bad practice as it reduces the readability and maintainability of the code
  - Database names:
    - If the LOG file is given a logical name, then the range is: 1 – 128 code units
    - If the LOG file is not given a logical name, then the range is: 1 – 124 code units
  - Transaction names: 1 – 32 code units
    - string constant / literal over 32 code units will error
    - variable containing over over 32 code units will silently truncate (no error)
  - Assembly File names: 1 – 259 code units
Identifiers that cannot be delimited (must follow rules of [Regular] identifiers)
- Variable, Table Variable, and Parameter names
- GOTO labels
Non-Obvious Impact of Collation
- The 26 “format control” characters are valid as “subsequent” characters and are not supposed to prevent making two names equate if they are otherwise the same. However, when using a binary collation, the format control characters do not equate to each other.
- Be careful when using Supplementary Characters (i.e. any character with a Code Point value above 0xFFFF / 65,535) for the following reasons:
  1. The documentation states that they are “not supported for use in metadata, such as in names of database objects”. Of course, this statement could very well be obsolete since they do “work” to varying degrees (see next two points). It could be that “not supported” is just a nice way of Microsoft saying that using them for metadata is far too likely to result in confusion (given the next two points), and they don’t want to waste time supporting it.
    Or, it could be that supplementary characters work on a basic level (and they do), but do not always work correctly with certain features and/or client libraries. Perhaps replication has problems with them, or maybe older clients such as ADO. I have not tested either replication or ADO, and am not saying that they don’t work. I am merely suggesting a possible reason for Microsoft to say that they are “unsupported” when they do technically work.
  2. They don’t always display correctly. Displaying of characters is controlled by fonts and the underlying OS. By default, Windows XP does not support Supplementary Characters while newer Windows OSs do.
    
    If supplementary characters are not displaying correctly, you probably need a font that has them mapped. Most fonts do not include supplementary characters. Regardless of your Windows or SQL Server versions, get a font that has at least some of the supplementary characters mapped. The “Code2003” font by James Kass, available at http://www.fontspace.com/st-gigafont-typefaces/code2003 , seems to be the best option. Install by either right-clicking on the TTF file and selecting “Install”, or copy-and-paste the TTF file into the C:\Windows\Fonts folder.
    
    For Windows XP, the following additional steps are needed, but will definitely allow supplementary characters to display correctly, even in SQL Server 2005 / SSMS:
    1. Edit the Windows Registry to add the following ( CAUTION: Edit the Registry at your own risk ):
      - KEY = HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack
      - VALUE = SURROGATE (DWORD (32-bit) Value / REG_DWORD)
      - DATA = 2
    2. Edit the Windows Registry to add the following ( CAUTION: Edit the Registry at your own risk ):
      - KEY = HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback
      - VALUE = Plane1 (String Value / REG_SZ)
      - DATA = Code2003 { or the name of whatever font you are using }
    3. Reboot the computer
  3. Supplementary Characters were not given any sort weights in the older Collations: SQL_ Collations and non-versioned Windows Collations (i.e. no version number in the name). This means that all Supplementary Characters are treated as “hidden” when it comes to sorting and comparison. Any number of them will equate to any number of them (because they do not factor into the generation of the sort key). Given that there are also 21,230 BMP characters in the SQL_ and non-versioned Windows Collations that do not have sort weights, this just increases the chances for confusion and difficult-to-debug scenarios. Binary Collations are not a problem since they do not use sort weights.
  The only Collations that will treat different supplementary characters as existing and being distinct are binary collations (regardless of BIN vs BIN2 or SQL Server vs Windows Collation), and any Windows Collation with a version of 90 or newer. For example, neither SQL_Latin1_General_CP1_CI_AS nor Latin1_General_CI_AS assign any weights to any supplementary characters, while Macedonian_FYROM_90_CI_AS and Latin1_General_100_CI_AS do. This behavior has nothing to do with the Collation being Supplementary Character-aware (i.e. name either ends with _SC, or version is 140 or newer) or not. SQL_Latin1_General_CP850_BIN does not assign any weights, nor does it allow the built-in string functions to properly handle supplementary characters, but it still treats each supplementary character as being a distinct character.

( For an explanation of how I gathered the information contained in this list, please see the following post: What’s in a Name?: Inside the Wacky World of T-SQL Identifiers )

Share this awesome post:

Like this: