Storing JSON in PostgreSQL: A must-know feature

Eleni Markou

What is JSON?

JSON (Javascript Object Notation) is an open-standard file format which organizes data into key-value pairs and array data types with respect to certain syntactic rules. It is widely adopted in web services as the primary format for transmitting data as it is much more lightweight than the older XML. Thus there is a quite common need to perform all kinds of analysis over data in JSON format.

Although there are many NoSQL database systems that work directly with the JSON format, those who are used to work with RDBMS would love to be able to manipulate their JSON formatted data stored in a PostgreSQL database using its familiar SQL dialect. Luckily PostgreSQL got you covered ever since release 9.2 and has made JSON manipulation in relational databases easier than ever before.

Why use JSON/ JSONB data types in a database?

Since PostgreSQL 9.2, which was released in 2012,  two different data types for storing JSON data are available: JSON and  JSONB. Although JSON formatted data can be stored as plain text as well, it is much more preferable to use these dedicated data types taking advantage of the fact that each stored value is valid according to the JSON rules since the utilization of dedicated JSON types unlocks numerous functions and operators useful for manipulating these data.

Furthermore, even in the case of entity-attribute-tables (EAV) the replacement with simpler schema designs with JSON/JSONB columns which can be indexed and queried will most probably lead to incredible performance improvement, even up until x1000 (here).

JSON vs JSONB

As mentioned before PostgreSQL supports two data types: JSON and JSONB. Although these data types are claimed to be almost identical, there are some differences regarding performance.

To begin with, JSON data type stores an exact copy of the input text and thus processing function has to parse it every time it is necessary. On the contrary, JSONB is stored as a decomposed binary format and thus no parsing is needed. However, the binary format of the JSONB results in small delays during input due to conversion overhead. Furthermore, JSONB data type supports indexing which can make a huge difference in the performance of the queries.

Overall, in most applications, it is much more preferable to store data using one of the special data types that PostgreSQL offers instead of plain text or following the EAV schema design. Among JSON and JSONB, in most applications, JSONB is the definite winner, unless there are quite specialized needs.

Functionalities

Basic Operators

So next we are going to focus on the actual functions and the operators that JSON/JSONB data types offer.  Utilizing these you can get array elements by index, objects by key or at a specified path. According to the official documentation of PostgreSQL, the available operators are summarized in the following table and are applicable on both JSON and JSONB objects.

 

Operator Right Operand Type Description Usage Example Example Results
-> int Get JSON array element (indexed from zero, negative integers count from the end) ‘[{“a”:”foo”},{“b”:”bar”},{“c”:”baz”}]’::json->2 {“c”:”baz”}
-> text Get JSON object field by key ‘{“a”: {“b”:”foo”}}’::json->’a’ {“b”:”foo”}
->> int Get JSON array element as text ‘[1,2,3]’::json->>2 3
->> text Get JSON object field as text ‘{“a”:1,”b”:2}’::json->>’b’ 2
#> text[] Get JSON object at specified path ‘{“a”: {“b”:{“c”: “foo”}}}’::json#>'{a,b}’ {“c”: “foo”}
#>> text[] Get JSON object at specified path as text ‘{“a”:[1,2,3],”b”:[4,5,6]}’::json#>>'{a,2}’ 3

In addition, there are a few more operators that are defined only for JSONB objects. At this point we are going to present how one can test containment and existence. These operators can also be used to effectively index jsonb data.

Containment

The ability to test containment is one of the biggest differences between JSON and JSONB. Although, when using JSON this operation is not directly available, the JSONB data type comes with a very handy operator suitable for this job, the containment operator (@>).

When we refer to containment testing, we are basically talking about cases where we want to check if a document, e.g. a set or an array, is contained within another. For example, let’s assume that we want to check whether the array [“Product_1”, “Product_2”] is contained in the array [“Product_1”, “Product_2”, “Product_3”, “Product_4”].

The general idea behind @> is that the contained object must match the containing object as to structure and data contents. However, duplicates will be considered only once as the order of the array elements is not taken into consideration when performing a containment match.

Existence

JSONB data type also offers existence operators which work similarly with the theme of containment.

When we refer to existence testing, we are basically talking about cases where we want to check whether a string appears as a key or as an array element of the top level of the JSONB object. Note that object values, as well as objects not belonging to the top level, are not considered.

For example, the following query will return True as expected since ‘Product_2’ is present in the given array:

SELECT '["Product_1", "Product_2", "Product_3"]'::jsonb ? 'bar';

However, this one will return False since existence must match on the top level:

SELECT '{"Products: {"Product_1": "Product_2"}}'::jsonb ? Product_2;

Query Cheat Sheet

That being said now we are going to create a short cheat sheet containing examples of the most common queries you may need.  While studying the following examples, assume the existence of a table named ‘Sales’ in a PostgreSQL database containing a JSONB column named ‘Products’.

Select products by value of a first level attribute

The first option involves that usage of the containment operator is the following:

SELECT * FROM Sales WHERE Products @> '{"product_name": "Product_1"}';

Alternatively, you can the ->> which is useful for retrieving JSON object field by key. Note that this query would be valid even in the case where the ‘Products’ column was of type JSON instead of JSONB.

SELECT * FROM Sales WHERE Products ->> ‘product_name’ =  ‘Product_1’ ;

Select products by value of a nested attribute

As previously there are two alternative options, one using the containment operator and one using ->>.

SELECT * FROM Sales WHERE Products->'manufacturer'->>'company_name' = "ABC";
SELECT *  FROM Sales  WHERE Products- @> '{"manufacturer":{"company_name": "ABC"}}';

Select products by value of an attribute in an array

Assuming an array containing available sizes for each product like “available_size”: [“S”, “M”, “L”], you query it by passing “available_size”: [“M”] to the WHERE clause as follows:

SELECT * FROM Sales WHERE Products @> '{"available_size": ["M"]}';

Select product items’ name

In this case, the ->> operator is directly used in the SELECT clause. This way you can retrieve all values present of an existing attribute.

SELECT Products ->> ‘product_name’ FROM Sales;

Indexing JSONB

When having millions for entries in your PostgreSQL database, a common approach to cut seek time is to build indexes. Luckily, JSONB offers this functionality and allows users to define GIN indexes to efficiently search for keys or key-value pairs within a large number of JSONB documents.

According to PostgreSQL documentation, two different generalized inverted indexes (GIN) “operator classes” exist offering different performance and flexibility trade-offs. The default one, named jsonb_ops, for the JSONB class, supports queries with top-level existence operators (?, ?& and ?|) operators and path/value-exists operator @>. On the contrary, the other GIN, named jsonb_path_ops, supports indexing the containment operator, i.e. @>, only. However, the jsonb_path_ops offers significantly improved performance compared to jsonb_ops as it is usually much smaller, the index scans are incredibly fast and yet complex nested containment queries remain indexable.

Conclusion

I guess so far you have already realized the great power that JSON and JSONB data types have to offer. The combination of all the available operators along with the extremely efficient indexing puts Postgres in a good position even in the field of document-oriented database systems. Well done Postgres! 

Work with analytics-ready data into your database with no configuration - no maintenance data pipelines. Raw data to insights in minutes.

Blendo is the easiest way to automate powerful data integrations.

Try Blendo free for 14 days. No credit card required.

Leave a Comment