One of the most basic clauses in any SQL dialect is the GROUP BY. Anyone who interacts frequently with relational databases has for sure used this clause thousands of times in order to compute all sort of aggregations, including sums, averages or counts, over his data in order to incorporate the results in some report. But apart from these typical grouping operations, the need for even more complex grouping arises quite frequently.
Luckily for those using PostgreSQL, there are already implemented functions that allow the computation of aggregated metrics by subgroups, hierarchy, and combinations. Seeing the sets one next to the other, how they fit together progressively, and the data combinations they create can give you a more profound comprehension of your data.
This is part #8 of our helpful SQL short guides:
- Table and Data Partitioning
- SQL type casting
- SQL pattern matching
- Organizing SQL Queries
- Handling Datatypes for IP addresses in SQL
- SQL DATETIME explained
- SQL Window Functions
- A guide for Advanced Grouping (this part)
In Postgres, the functions available for advanced grouping include grouping sets, rollup, and cube. To begin with, the concept of grouping sets allows you to group separately the data selected by the FROM and WHERE clauses by each different set, and compute various aggregations over each one of them. To clarify this idea, imagine that we have available a table where all sales of an online clothing store are recorded. The table includes information regarding the product name and the size of the products being sold each time.
The conventional GROUP BY would allow us to compute the total sales either by product name or by product size with the following queries.
SELECT Product_Name, count(*) FROM sales GROUP BY Product_Name
SELECT Size, count(*) FROM sales GROUP BY Size
In order to compute these aggregations with a traditional GROUP BY, you need multiple statements, where you perform a UNION ALL between the individual result sets.
However, this approach has multiple disadvantages. Apart from the fact that the query is going to be huge, the involved tables have to be accessed multiple times – one for every statement.
Luckily, PostgreSQL offers its users some advanced grouping functionalities, presented in the current post, that can considerably simplify the SQL code and improve its performance.
The GROUPING SETS option enables us to construct sets for each field specified in the GROUP BY clause of aggregation queries. This way, rather than the combination of the field values, we’ll get subtotals for each of the field values and a grand total of all the matching rows.
In our toy example the following query will produce the results presented in the table:
SELECT Product_Name, Size, count(*) FROM sales GROUP BY GROUPING SETS ((Product_Name), (Size), ())
In the above syntax, each sublist of grouping sets may specify zero or more columns and is interpreted the same way as though it were directly in the GROUP BY. The results produced include the count of sales for each of the items and for each of the available sizes by which sales are categorized. We also have a grand total of all sales recorded in the store.
Rollup is another option available in Postgres that allows users to create hierarchical rollups starting with the primary group and followed by the others in hierarchical order. This hierarchy is implied by the order in which the fields appear in the GROUP BY clause.
For example, a clause of the form
ROLLUP(g1, g2, g3) is equivalent to
GROUPING SETS((g1, g2, g3), (g1, g2), (g1), ()).
In our toy example presented above, we have only two levels of hierarchy, i.e. product names and sizes, and thus the query will be pretty straightforward.
SELECT Product_Name, Size, count(*) FROM sales GROUP BY ROLLUP(Product_Name, Size)
Now we have sales summaries for each size within each product, a sales subtotal at the product level and the grand total with all sales included.
Next, Cube is another clause available in Postgres that allows users to get all combinations. Basically, CUBE will give a combined report of sets and of the hierarchical rollup. Consequently, a clause of the form
CUBE(g1, g2, g3) is equivalent to
GROUPING SETS((g1, g2, g3), (g1, g2), (g1, g3),(g1), (g2, g3), (g2), (g3), ()).
Note that the individual elements of a CUBE clause may be either individual expressions or sublists of elements in parentheses. In the latter cases, the sublists are treated as single units for the purposes of generating the individual grouping sets.
Returning to our toy example the cube clause in the following query would produce the result in the table:
SELECT Product_Name, Size, Count(*) FROM sales GROUP BY CUBE(Product_Name, Size)
In this article, we’ve looked at the advanced grouping abilities PostgreSQL offers its users allowing alternative reports of aggregated data. These new features easily let them report the results of sets and hierarchical rollups along with various aggregated metrics computed on all possible combinations of data. As a plus, all subtotals and grand totals are also included by default! For more information, you can always check the official PostgreSQL documentation as well, which can be found here.
By taking advantage of these capabilities, the SQL code required for the creation of business reports is significantly simplified, made much more readable and easily maintained. Furthermore, the improvement in terms of performance is also considerable, as by grouping sets the base data need to be accessed only once.