The majority of the SQL-dialects admittedly offer to the users a rich set of data types. The most commonly used among them are probably the Integer and Float of different precision, the Boolean, the Timestamp and the Time data type. Yet, these basic data types are not sufficient for covering all the analytical needs that may arise. So, how can I handle an IP address in SQL?
There are also some other data types that require special handling, like IP addresses and ranges. A typical case of an analysis that utilizes IP addresses would probably be the filtering of the range of the IP addresses related to certain logged in actions or events in order to identify the companies from which the visitors to your website come from.
This is part #5 of our helpful SQL short guides:
- Table and Data Partitioning
- SQL type casting
- SQL pattern matching
- Organizing SQL Queries
- Handling Datatypes for IP addresses in SQL (this part)
- SQL DATETIME explained
- SQL Window Functions
- A guide for Advanced Grouping
In clickstream analysis, being able to group actions performed by the same user or multiple users within the same company can provide an analyst very useful information. Another common application has to do with the monitoring and administration of systems and services like digging into the logs of a web server trying to troubleshoot a specific problem.
Storing this information as varchar(), although common, can become troublesome very easily due to the large variety of possible representations on the one hand and the incompatibility with the string operators on the other.
Although they are essentially positive 32 or 128-bit integers, they are more commonly represented as blocks of 8 or 16-bit integers written in decimal and separated by dots in IPv4 or written in hexadecimal and separated by colons in IPv6. Especially IPv6 addresses can be stored with upper- or lower-case hexadecimal letters since they are case-insensitive. Also, leading zeros can be suppressed for abbreviation and so :0000: or :0: are used for a group of zeros.
For shortening more than one consecutive groups of zeros the :: annotation is used (but only once in an address) while when there are more than one sets of consecutive groups of zeros to shorten then only the leftmost set gets suppressed. It becomes evident that this lack of uniform representation complicates manipulation when stored as strings.
Furthermore, even in cases when a lot of manual work is being done towards adaptation to a fixed standard, the majority of the operators available for strings won’t work as desired on IP addresses resulting in misleading and unexpected outcomes.
Luckily for us, there are database systems, like PostgreSQL and BigQuery, that offer special data types dedicated to network operations. In some others, like MS SQL Server and Amazon Redshift, although these capabilities are not directly supported, there are still some hacks that one can do.
The offered data types include inet and cidr for storing host and network addresses respectively in either IPv4, IPv6 or both. Even when data is not initially stored as such, you can always cast string into inet or cidr, if needed. This can be done simply with the :: annotation as shown below:
The simplest operations that one can perform with these data types include subtraction, bitwise operations, and basic numeric comparisons, as presented in the following examples:
- ‘192.168.1.1’::inet + 256
- ‘192.168.2.1’::inet – 256
- ‘192.168.1.5’::inet < ‘192.168.1.6’ :: inet
For more advanced manipulation, containment operators and utility functions are also available. More specifically, there are operators to check if one value contains or is contained by another (>> and <<), or the same with equality (>>= and <<=), or containment going either way (&&). They consider only the network parts of the two addresses (ignoring any host part) and determine whether one network is identical to or a subnet of the other.
- ‘192.168.1.5’::inet << ‘192.168.1/24’::inet
- ‘192.168.1/24’::inet>> ‘192.168.1.5’::inet
- ‘192.168.1/24’::inet >>= ‘192.168.1/24’::inet
- ‘192.168.1/24’::inet <<= ‘192.168.1/24’::inet
- ‘192.168.1/24’::inet && ‘192.168.1.80/28’::inet
Available one can find the three major bitwise operations including AND(&), OR(|) and NOT(~).
- ~ ‘192.168.1.6’::inet
- ‘192.168.1.6’::inet & ‘0.0.0.255’::inet
- ‘192.168.1.6’::inet | ‘0.0.0.255’::inet
PostgreSQL also offers a set of a functions for a bunch of other useful operations such as the following:
- network(inet): extract network part of address
- broadcast(inet): broadcast address for network
- host(inet): extract IP address as text