Data Modeling and Table Design for Amazon Redshift

Bonus Material: FREE Amazon Redshift Guide for Data Analysts PDF

Introduction

Amazon Redshift Data Warehouse is a relational database system. It is based on PostgreSQL from which it inherits a number of design characteristics.

As a relational database system, Redshift organizes the information in tables. However, at the same time, as a high performance, distributed data warehouse system, Amazon Redshift is optimized to execute complex queries over huge amounts of data. For this reason, there are important differences between Amazon Redshift and other common database systems like PostgreSQL.

In this section, we will see the main differences that an OLAP system like Amazon Redshift has, compared to OLTP systems and most importantly how these differences affect the way we design a database.

We go through the important concept of Distribution Keys, which is essential for designing a performant database. Both the nature of the data and the queries that the analyst plans to execute, influence significantly the choice of Distribution Keys.

Sort Keys is the second important design concept that affects the tables of a database significantly, and it contributes to the performance of the queries to perform. In this section, we will see how to use the knowledge of a data analyst to influence the selection of proper Sort Keys for your tables.

At the end of this section, we see Amazon Redshift’s support for Materialized Views and how these views can improve query performance.

There’s also another parameter of a table on Amazon Redshift used as part of table design. The parameter is Column Compression Type, but we decided to cover this in the context of the Maintenance Section in our guide.

There’s also another parameter of a table on Amazon Redshift used as part of table design. The parameter is Column Compression Type, but we decided to cover this in the context of the Maintenance Section in our guide.

Click here to get our 90+ page PDF Amazon Redshift Guide and read about performance, tools and more!

Amazon Redshift and Table Constraints

Anyone familiar with SQL will be aware of the following constraints:

  • Uniqueness
  • Primary Key
  • Foreign Key
  • Not Null Constraint

The above are commonly used in database design and also exist on Amazon Redshift. The main difference is that except the Not Null Constraint, the rest are not enforced by Amazon Redshift.

They exist as metadata for informational purposes and help the query planner to optimize the query execution. So you should define them but keep in mind that the system will not enforce these constraints. That means you have to rely on your ETL infrastructure for their enforcement. Blendo as an ETL as a service platform, for example, can take care of these constraints when it loads data into an Amazon Redshift cluster.

It is crucial to remember that if you use any of these constraints, you have to be certain your data do not violate them. The query planner assumes that the constraints are respected, and you might end up with wrong results to your queries.

It is crucial to remember that if you use any of these constraints, you have to be certain your data do not violate them. The query planner assumes that the constraints are respected, and you might end up with wrong results to your queries.

load data into any data warehouse - Blendo