Workload Management and Query Queues
No matter how big your Amazon Redshift cluster is, it will always have a certain amount of resources available. As you start digging further and further into your data, more queries will have to be executed.
Queries that in most cases will have to be scheduled to run at specific time intervals to generate new reports and their execution time, together with the resources needed will vary greatly. As soon as you start using your Amazon Redshift cluster in a production environment, you will experience a situation where an ad-hoc query that takes too long to complete might block some important queries that have to generate a report.
So how do you deal with these situations? Fortunately, Amazon Redshift is equipped with a mechanism that allows us to manage the resources of a cluster by creating priorities queues with specific resources attached to them and then attach queries to these queues. In this way, it is possible to prioritize the execution of our queries taking into consideration their importance again the available resources that we have.
Introducing Amazon Redshift WorkLoad Management
Which is the mechanism that Redshift offers for managing the queries that you have to execute on your cluster against the available resources? A few notable things about Redshift Workload Management,
- The principal mechanism that allows the management of resources is the definition of priorities queues for the execution of queries.
- By default, Amazon Redshift defines two queues
- One Superuser queue, which is reserved only for the superuser role and it cannot be configured.
- One default user queue, which by default is configured to run up to five queries concurrently. Keep in mind that as you add new queues, the default queue must be the last queue in the WLM configuration.
- The maximum concurrency level across all your queues is 50 queries, so you cannot run more than 50 queries at the same time.
- You can create up to 8 queues that can run up to 50 concurrent queries on your cluster.
Using Amazon Redshift Workload Management
Every time a user executes a query on an Amazon Redshift cluster, the query is assigned to a query queue. In the default situation, the query will be assigned to the default user queue if it is executed by any user without the superuser role. Each queue contains some slots, and it has allocated a portion of the clusters’ memory which is divided into the available slots. So what things you can configure on a custom defined queue,
- Concurrency level, which is the number of queries that can run at the same time on a particular queue.
- User Groups, you can specify specific user groups to specific queues, in this way the queries of these users will always be routed to a specific queue.
- Query groups, this is the way that Redshift has to route specific queries to specific queues. When you define your query, you attach a specific Query Group which is nothing more than a label, and this query will end up in a specific queue for execution.
- Memory Percent to use, Probably one of the most important parameters for a queue. That is the main mechanism that allows us to control how the resources of our cluster are consumed by the queries we execute. With it, we can declare to Redshift the amount of memory that will be available to our queries that run in a specific queue.
- Timeout, inevitably some queries will keep running for a long time, this might block other important queries from running. For this reason, it is important to set up a specific time threshold after which the queries will be canceled.
Working with the Amazon Redshift Workload Management Configuration
All the above-mentioned parameters can be altered by the user. The best way to do it is by:
Before you start experimenting with the Workload Management configuration, you should understand how Redshift assigns queries to the available queues. The way that queries get assigned to queues is very well explained on this flowchart. Make sure that you understand well the rules as it will help you troubleshoot your workload as you start working with a large number of queries on your cluster.
The assignment of queries to specific queues happens either by setting a user group or a query group to your query. The assignment is done by using the SET command, and you can see some examples of queries here.
The Workload Management Configuration properties are divided into two types, the static and the dynamic. The main difference is that the dynamic properties do not require from your to reboot your cluster while the static ones do.
- Static Properties
- User Groups
- User Group wildcard
- Query Groups
- Query group wildcard
- Dynamic Properties
- Percent of memory used
Based on the above, it is advised that you define your query and user groups at the beginning and make sure that you do not change them often. Restarting your cluster will add downtime to your services, and it is a time-consuming process.
Avoid using the superuser queue to execute regular queries; this queue has the highest priority compared to the rest of your queues, so it is better to use it mainly for debugging.
You might have a query that requires more memory than what is shared to the available queue slots. It is possible to assign more than one slots to a query to overcome this situation. That is something that can also happen temporarily, for example when you would like to execute a VACUUM command on your cluster.
Always keep an eye on the SVL_QUERY_SUMMARY view. If you see queries with the is_diskbased field set to true, you might have to revise your Workload strategy and assign more memory to it.
Some system tables will help you understand better how your workload management strategy works.
- STL_WLM_ERROR. Contains a log of WLM-related error events.
- STL_WLM_QUERY. Lists queries that are being tracked by WLM.
- STV_WLM_CLASSIFICATION_CONFIG. It shows the current classification rules for WLM.
- STV_WLM_QUERY_QUEUE_STATE. It records the current state of the query queues.
- STV_WLM_QUERY_STATE. Provides a snapshot of the present state of queries that are being tracked by WLM.
- STV_WLM_QUERY_TASK_STATE. It contains the current state of query tasks.
- STV_WLM_SERVICE_CLASS_CONFIG. It records the service class configurations for WLM.
- STV_WLM_SERVICE_CLASS_STATE. It contains the current state of the service classes.