Digging into PostgreSQL: A guide for Advanced Grouping

Eleni Markou

One of the most basic clauses in any SQL dialect is the GROUP BY. Anyone who interacts frequently with relational databases has for sure used this clause thousands of times in order to compute all sort of aggregations, including sums, averages or counts, over his data in order to incorporate the results in some report. But apart from these typical grouping operations, the need for even more complex grouping arises quite frequently.

Luckily for those using PostgreSQL, there are already implemented functions that allow the computation of aggregated metrics by subgroups, hierarchy, and combinations. Seeing the sets one next to the other, how they fit together progressively, and the data combinations they create can give you a more profound comprehension of your data.

In Postgres, the functions available for advanced grouping include grouping sets, rollup, and cube. To begin with, the concept of grouping sets allows you to group separately the data selected by the FROM and WHERE clauses by each different set, and compute various aggregations over each one of them. To clarify this idea, imagine that we have available a table where all sales of an online clothing store are recorded. The table includes information regarding the product name and the size of the products being sold each time.

Product_NameSize
Product_1L
Product_2M
Product_2L
Product_1M

The conventional GROUP BY would allow us to compute the total sales either by product name or by product size with the following queries.

SELECT Product_Name, count(*)
FROM  sales
GROUP BY Product_Name

Results:

Product_NameCount
Product_12
Product_22

Or

SELECT Size, count(*)
FROM sales
GROUP BY Size

Results:

SizeCount
M2
L2

 

In order to compute these aggregations with a traditional GROUP BY, you need multiple statements, where you perform a UNION ALL between the individual result sets.

However, this approach has multiple disadvantages. Apart from the fact that the query is going to be huge, the involved tables have to be accessed multiple times – one for every statement.

Luckily, PostgreSQL offers its users some advanced grouping functionalities, presented in the current post,  that can considerably simplify the SQL code and improve its performance.

GROUPING SETS

The GROUPING SETS option enables us to construct sets for each field specified in the GROUP BY clause of aggregation queries. This way, rather than the combination of the field values, we’ll get subtotals for each of the field values and a grand total of all the matching rows.

In our toy example the following query will produce the results presented in the table:

SELECT Product_Name, Size, count(*)
FROM sales
GROUP BY GROUPING SETS ((Product_Name), (Size), ())

Results:

Product_NameSizeCount
Product_12
Product_22
M2
L2
4

 

In the above syntax, each sublist of grouping sets may specify zero or more columns and is interpreted the same way as though it were directly in the GROUP BY. The results produced include the count of sales for each of the items and for each of the available sizes by which sales are categorized.  We also have a grand total of all sales recorded in the store.

ROLLUP

Rollup is another option available in Postgres that allows users to create hierarchical rollups starting with the primary group and followed by the others in hierarchical order. This hierarchy is implied by the order in which the fields appear in the GROUP BY clause.

For example, a clause of the form ROLLUP(g1, g2, g3) is equivalent to GROUPING SETS((g1, g2, g3), (g1, g2), (g1), ()).

In our toy example presented above, we have only two levels of hierarchy, i.e. product names and sizes, and thus the query will be pretty straightforward.

SELECT Product_Name, Size, count(*)
FROM sales
GROUP BY ROLLUP(Product_Name, Size)

Results:

Product_NameSizeCount
Product_1M1
Product_1L1
Product_12
Product_2M1
Product_2L1
Product_22
4

Now we have sales summaries for each size within each product, a sales subtotal at the product level and the grand total with all sales included.

CUBE

Next, Cube is another clause available in Postgres that allows users to get all combinations. Basically,  CUBE will give a combined report of sets and of the hierarchical rollup. Consequently, a clause of the form CUBE(g1, g2, g3) is equivalent to GROUPING SETS((g1, g2, g3), (g1, g2), (g1, g3),(g1), (g2, g3), (g2), (g3), ()).

Note that the individual elements of a CUBE clause may be either individual expressions or sublists of elements in parentheses. In the latter cases, the sublists are treated as single units for the purposes of generating the individual grouping sets.

Returning to our toy example the cube clause in the following query would produce the result in the table:

SELECT Product_Name, Size, Count(*)
FROM sales
GROUP BY CUBE(Product_Name, Size)

Results:

Product_NameSizeCount
P1M1
P1L1
P12
P2M1
P2L1
P22
4
M2
L2

Wrap Up

In this article, we’ve looked at the advanced grouping abilities PostgreSQL offers its users allowing alternative reports of aggregated data. These new features easily let them report the results of sets and hierarchical rollups along with various aggregated metrics computed on all possible combinations of data. As a plus, all subtotals and grand totals are also included by default! For more information, you can always check the official PostgreSQL documentation as well, which can be found here.

By taking advantage of these capabilities, the SQL code required for the creation of business reports is significantly simplified, made much more readable and easily maintained. Furthermore, the improvement in terms of performance is also considerable, as by grouping sets the base data need to be accessed only once.