Table Views in Amazon Redshift
Creating views on your tables is a great way of organizing your analysis. Especially in OLAP systems like Amazon Redshift, the analysis usually performed is quite complex and using views is a great way of organizing long queries into smaller parts.
In this chapter, we explore the mechanism for table views of Amazon Redshift, its limitations and possible workarounds to obtain the benefits of materialized views.
Create Table Views on Amazon Redshift
Creating a view on Amazon Redshift is a straightforward process. You just need to use the CREATE VIEW command. The command takes as a parameter the query that you wish to use for the view and some other options:
- A Name which is the name of the view/table it is going to be created. If you prefix the name with a #, the view is temporary and exists only for the current session.
- column_name, you can optionally rename the columns to be created. By default, the names derive from the original table.
- OR REPLACE which tells Redshift what to do if a view with the same name already exists. It is replaced only if the query is different.
Limitations of Redshift Table Views
Views are coming with some restrictions on Amazon Redshift with the most notable being the following:
You cannot DELETE or UPDATE a Table View. It looks like a table from a SELECT query, but you cannot affect its data.
The Table View is not physically materialized, which means that there’s no actual table created in the database. Instead, the query executes every time you request access to the Table View.
Table Views reference the internal names of the tables and columns and not what is visible to you. That might lead to fragmentary views and queries.
The query planner is having issues optimizing queries over views.
Overcoming the limitations of Table Views on Amazon Redshift with Materialized Views
There is a way to overcome the above limitations of Amazon Redshift and its Table Views. The way to do it is by emulating Materialized Views on your cluster. To do that, you create actual tables using the queries that you would use for your views. That is possible on Amazon Redshift as you can invoke the CREATE TABLE with a query as a parameter which defines the table using other tables.
In this way, you are manually creating Materialized Views on your system, and you can overcome the limitations we mentioned earlier. Now it is possible to UPDATE and DELETE your views, and the query planner optimizes your queries.
The main issue with this approach is that you have to maintain the Materialized Views on yourself. So whenever a change occurs on one of your tables, you need to update your Materialized Views by dropping and Recreating the tables.
In an OLAP system like Amazon Redshift, data does not change that often and thus maintaining these tables as Materialized Views should not be a big problem. Nevertheless, as your tables increase in numbers and your queries get more complex, it would be best to rely on your ETL solution to handle these views.
As your tables increase in numbers and your queries get more complex, it would be best to rely on your ETL solution to handle these views.
ETL as a service Platforms like Blendo make it easier to manage a large number of Materialized Views as part of your ETL pipeline.