Whoever who has ever written code in any language would agree that date and time data types are the most frustrating of all. Of course SQL is not an exception. Each SQL dialect has its own pitfalls that one has to be aware of, in order to produce meaningful and trustworthy results.
Microsoft SQL Server specifically, has many data types that combine both date and time representations making things even more complex.
Knowing a few technical details about how each type is stored, what limitations does it have and what are its differences from other similar data types can be a real saver. Otherwise, you may end up with horribly wrong reports presented to your manager or -even worse- to your clients. So, with a sense of humor, for the sake of your future career I would strongly suggest you to stop seeing MS SQL Server datetime functions as black boxes but instead spend some time uncovering what is going on under the hood.
This is part #4 of our helpful SQL short guides:
- Table and Data Partitioning
- SQL type casting
- SQL pattern matching
- Organizing SQL Queries
- Handling Datatypes for IP addresses in SQL
- SQL DATETIME explained (this part)
Data Types Internal Representation
The first concept that requires clarification has to do with how SQL Server stores the different data types. To begin with, all date and time related data types are not stored in any human readable format. In contrast to what some may believe, the data types of this kind are stored either as a set of floats or as a set of integers. The exact type being stored has to do with the the actual datetime data type.
The most widely used one is the DATETIME (look here) as it has been present since the earlier versions of MS SQL Server. When choosing the DATETIME data type the timestamp is internally represented by two integer: The first one represents the date and the second one the time. Translating in bytes, the DATETIME type takes up 8 bytes, 4 bytes for storing the date and 4 for storing the corresponding time.
As a baseline for this representation has been set the 1/1/1900 00:00:00. Dates prior to the baseline are as represented by negative numbers.
So, if one wants to check the internal representation of a value of type DATETIME can use the varbinary() function which converts the initial data into a hex number. By converting this hex into integer can be verified that dates prior to the baseline date are indeed negative numbers while date after this date are positive.
Regarding the time precision of the DATETIME type, according to the documentation time is set to 0 at midnight and increments by 1 every 0.003333… seconds.
Apart from DATETIME, there are some other data types including DATETIME2, TIME and DATE, alternatives to DATETIME, that are internally represented a bit differently. In these type the representation precision is set to 7 bytes. The first out of them is used to store precision, the last three to store date and those in between to store time. The larger the specified precision, the larger the number of bytes devoted to this.
Language Dependent Data Types
Another important topic in MS SQL Server has to do with the fact that the MS SQL Server settings are not universal. Depending on the language that you have set during configuration, the format of the DATETIME data type can be different. Indicatively is the difference between the representations when the language set is British and US.
Suppose that we have a date formatted like this: ‘DD/MM/YYYY HH:MM:SS’. While the SQL Server will successfully interpret this input as datetime if the language is set to British, it will probably throw an exception for English_US setup. Wondering why ? It is because in the second case, SQL Server will try to interpret the DD as Month. So, even in cases where the day is by coincidence at most 12, thus interpretable as months too, you will end up producing nonsense.
In order to avoid any confusion, there are two options to choose from. The first has to do with choosing a data type that is type and language independent. The two formats that satisfy this condition are the one specified by the ISO8601 (2017-11-13T14:00:14Z) and the YYYYMMDD. Another option, to me the most preferable, is moving from DATETIME to DATETIME2 (look here) which is not affected by language configuration.
Taking Timezone into consideration
And what is going on in cases where the timezone of the events stored in a database is not the same? 2017-11-13T14:00 is a completely different point in time depending on where you are and this can make a huge difference in all sorts of applications, especially in those that have to do monitoring of critical processes, such as a server’s uptime.
At this point it becomes evident why knowing the internal representation of each data type is useful. Based on what we have mentioned before regarding the internal representation of DATETIME, DATETIME2, DATE and TIME, it becomes evident that none of them is actually sufficient for this work since they do not specify any timezone leading to ambiguous types. The is no way to distinguish if timestamps are measured based on the local machine’s timezone, on UTC or according to the server time.
The best way to avoid wrong aggregations of data measured in different timezone is working with the DATETIMEOFFSET format. The representation of this data type, apart from what we mentioned before, also includes 2 extra bytes for storing the timezone offset from the UTC time.
In case you want to compare a DATETIMEOFFSET (look here) column with the current timestamp you can use the SYSDATETIMEOFFSET function which returns current datetime with offset from UTC or the DATENAME if only the offset is needed.
In case you want to change the datetimeoffset Microsoft SQL Server have you covered too. The function that implements exactly this functionality is the SWITCHOFFSET() which takes as input data of type DATETIMEOFFSET along with the desired timezone and performs the conversion.
Datetime values in search conditions
A sneaky and insidious pitfall that will cause you a lot of trouble if not handled with caution. Assume you have defined a column of type DATETIME, DATETIME2 or DATE and have populated it with data like 2017-11-13 14:00:14.234 and 2017-11-13 00:00:00.000.
Now let’s assume that later on you want to retrieve all the records that refer to that specific day, 2017-11-13. A first approach would probably be to write something like this:
SELECT date FROM table WHERE date = '2017-11-13';
In this case, what MS SQL Server is going to return is only the record with date = 2017-11-13 00:00:00.000. Long story short, what is actually going on is that when you are trying to retrieve data of a data type that stores both date and time, without specifying the time, SQL Server assumes that you mean midnight and so “fills up” the given date with the midnight timestamp.
Datetime Data rounding
The date rounding operations that the SQL Server performs for date/time data types can be a complete headache for someone who wants to perform analytic processes that require a high degree of precision. Although some precision restrictions do exist inevitably as all data type have a finite precision, there are also cases where rounding mechanisms are triggered out of the blue causing weird behavior.
So assume that you want to store the datetimes presented in the following tables as DATETIME2:
|Original datetime||Stored as DATETIME2 datetime|
|2017-11-13 23:32:12.1234567||2017-11-13 23:32:12.1234567|
|2017-11-13 23:32:12.123456789||2017-11-13 23:32:12.1234568|
|2017-11-13 23:59:59.999999999||2017-11-14 00:00:00.0000000|
Obviously, SQL Server has rounded the initial dates by paring the nine digits down to seven. This behavior is completely predictable as the DATETIME2 precision is by default 7 digits. As a matter of fact, although this may cause some misunderstanding at first like in the third case where we moved to the next day, the behavior can be predicted and thus avoided.
Yet, there are other cases where things get more blur. For example assume we have a similar example with the one before. This time we are going to use the DATETIME data type and the input date will have 3 digits precision (as many as DATETIME supports) and observe the outcomes:
|Original Datetime||Stored as DATETIME datetime|
|2017-11-13 23:59:59.996||2017-11-13 23:59:59.997|
|2017-11-13 23:59:59.998||2017-11-13 23:59:59.997|
|2017-11-13 23:59:59.999||2017-11-14 00:00:00.000|
The reason why this is happening is that DATETIME stores data in increments of .000, .003 and .007. Thus, .996 and .998 get rounded to .997 and .999 to .000 having jumped to the next second too.
It seems that after all no behavior is unexplainable when it comes to databases. In contrast, everything works in a pretty deterministic and predictable way if only you acquire some knowledge regarding some “internal stuff” of your system. By understanding how things are getting stored, how certain functions are implemented and what assumptions are made during architectural design you can avoid unexpected results and save yourself from a lot of frustration and effort.