Redshift Workload Management (WLM)

Dinesh Shankar
Apr 8, 2022



Overview

Amazon Redshift is a fully managed, petabyte-scale MPP (massively parallel processing) database that primarily supports data warehousing and BI analytics use cases.

Redshift supports various types of data processing: batch processing on large-scale data using ETL tools, ad hoc querying for interactive analytics, data as a service using the Redshift Data API, and machine learning use cases via the Redshift ML feature.

Redshift also integrates very well with S3 and the AWS Glue Catalog, which enables building lake house style architectures.

When it comes to performance, Redshift combines an MPP architecture, columnar data storage, column-level data compression, result caching, and a built-in query optimizer, which makes it efficient for running analytical workloads on large-scale datasets.

One of the major bottlenecks I have faced using Redshift is concurrency (the number of queries that can run in parallel). Concurrency management is very important when a cluster supports various use cases for a large set of users. Redshift supports a maximum WLM concurrency of 50 queries across all user-defined queues, but the recommended total concurrency level is 15 or fewer.

A Redshift cluster that serves users in the range of a few thousand will run into resource contention, leading to query timeouts and longer waits (execution times). Redshift uses workload management to load balance and allocate available resources to submitted queries.

In this article we will focus on the various types of WLM configuration and deep dive into each of them.

Redshift WorkLoad Management (WLM)

Redshift workload management (WLM) is used to define multiple query queues and to route queries to the appropriate queue at runtime. For example, separate queues can be created for ETL, reporting, and ad hoc use cases. Each queue is configured with its own memory percentage, query slot count, and query monitoring rules (QMR). The default queue handles queries that don't fall under any of the user-created queues.

Properties of Each Queue

  1. Priority
  2. Concurrency scaling mode
  3. User groups
  4. Query groups
  5. Query monitoring rules

Types of WLM management

  1. Manual WLM
  2. Automatic WLM

Automatic WLM

When automatic WLM is enabled, Amazon Redshift manages resources to handle user queries effectively. Amazon Redshift determines how many queries run concurrently and how much memory is allocated to each dispatched query. Memory and concurrency settings are configured as auto.

When automatic WLM is enabled, users can define up to 8 queues. Each queue can be assigned a priority based on business priority. The default priority of queries is NORMAL.

Queue Priorities

  1. HIGHEST
  2. HIGH
  3. NORMAL
  4. LOW
  5. LOWEST

CRITICAL priority is available only to superusers. Only one CRITICAL query can run at a time.

Queries are routed to a specific queue based on query groups and user groups at runtime.
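As a sketch of how runtime routing works, a session can label its queries with a query group so that WLM matches them to a queue configured with that group. The group name 'report' and the table name are hypothetical examples, not part of any default configuration:

```sql
-- route subsequent queries to the queue whose query group list contains 'report'
set query_group to 'report';

select count(*) from sales;  -- runs in the queue matching the 'report' group

-- return to default routing for the rest of the session
reset query_group;
```

User-group routing works the same way, except the match is made against the database user running the query rather than a session label.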

Manual WLM

Manual WLM gives the user full control over concurrency settings based on the query patterns and end users of the cluster. Manual WLM is suited for Redshift clusters that serve a diverse set of business use cases and query patterns. For every user-defined queue, the configurations below need to be made.

  • Concurrency scaling mode — Spins up additional clusters based on the traffic in the WLM queue
  • Concurrency level — Number of queries that can run in parallel in the queue; a session can claim multiple slots with set wlm_query_slot_count
  • User groups — Database user groups (mapping different DB users to a group)
  • Query groups — A query group is simply a label used to assign queries to a queue
  • Memory percent to use — Percentage of cluster memory allocated to the queue
  • Timeout — Query timeout to handle long-running queries
  • Query monitoring rules — Rules for each queue to define timeouts, queue hopping, etc.
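To make this concrete, below is a sketch of how a manual WLM configuration might look in the cluster's parameter group JSON. The group names, slot counts, and percentages are illustrative assumptions, not recommendations; the last unnamed entry is the default queue, and max_execution_time is in milliseconds:

```json
[
  {
    "user_group": ["etl_users"],
    "query_group": ["etl"],
    "query_concurrency": 5,
    "memory_percent_to_use": 40,
    "max_execution_time": 1800000
  },
  {
    "user_group": ["bi_users"],
    "query_group": ["report"],
    "query_concurrency": 10,
    "memory_percent_to_use": 40
  },
  {
    "query_concurrency": 5,
    "memory_percent_to_use": 20
  },
  {
    "short_query_queue": true
  }
]
```

A session that needs more memory for a single heavy statement can temporarily claim several of its queue's slots with `set wlm_query_slot_count to 3;` and reset it to 1 afterwards.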

Short Query Acceleration (SQA)

Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries. SQA runs short-running queries in a dedicated space, so that SQA queries aren’t forced to wait in queues behind longer queries. SQA only prioritizes queries that are short-running and are in a user-defined queue.

To check whether SQA is enabled:

SELECT * FROM stv_wlm_service_class_config where service_class = 14;

To find queries running in SQA queue

SELECT a.queue_start_time,
       a.total_exec_time,
       label,
       trim(querytxt)
FROM stl_wlm_query a
JOIN stl_query b ON a.query = b.query
WHERE a.service_class = 14
  AND a.final_state = 'Completed'
ORDER BY b.query DESC
LIMIT 5;

WLM Service Class

Below are the service class IDs for WLM queues.

1–4: Reserved for system use.

5: Used by the superuser queue.

6–13: Used by manual WLM queues that are defined in the WLM configuration.

14: Used by short query acceleration.

15: Reserved for maintenance activities run by Amazon Redshift.

100–107: Used by automatic WLM queues when auto_wlm is true.
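These IDs can be inspected directly. As a sketch, the query below lists the configured queues with their slot counts and per-slot working memory, using columns documented for the service class config system table:

```sql
-- list configured queues: slot count and per-slot working memory (MB)
select service_class, name, num_query_tasks, query_working_mem
from stv_wlm_service_class_config
where service_class >= 5
order by service_class;
```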

Query Monitoring Rules

Query monitoring rules can be defined at the queue level to manage resources effectively. Rules can be added from a template, or custom rules can be defined.

QMR Rules from Template

Rule actions:

  1. Abort — The query is terminated with an error message.
  2. Hop — When a query is hopped, WLM attempts to route it to the next matching queue based on the WLM queue assignment rules. If the query doesn't match any other queue definition, it is canceled; it is not assigned to the default queue.
  3. Log — An entry is created in the user activity logs.
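As an illustration, custom rules in the queue's WLM JSON might look like the sketch below. The rule names and thresholds are made up for the example; query_execution_time is measured in seconds:

```json
"rules": [
  {
    "rule_name": "log_large_scans",
    "predicate": [
      {"metric_name": "scan_row_count", "operator": ">", "value": 1000000000}
    ],
    "action": "log"
  },
  {
    "rule_name": "abort_long_queries",
    "predicate": [
      {"metric_name": "query_execution_time", "operator": ">", "value": 1800}
    ],
    "action": "abort"
  }
]
```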

SQL View for WLM Monitoring

Below are some of the queue monitoring SQL queries.

Queue User Mapping

WLM Query State

WLM Queue State
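As a sketch, queries along these lines can be used for the monitoring above; the joins and column names follow the documented WLM system views, but they are illustrative rather than the only way to write them:

```sql
-- WLM query state: queries currently tracked by WLM
select query, service_class, slot_count, state, queue_time, exec_time
from stv_wlm_query_state
order by wlm_start_time;

-- WLM queue state: number of queries waiting per queue
select service_class, count(*) as queued_queries
from stv_wlm_query_queue_state
group by service_class;

-- queue/user mapping: tie running queries back to database users
select s.query, s.service_class, u.usename
from stv_wlm_query_state s
join stv_inflight i on s.query = i.query
join pg_user u on i.userid = u.usesysid;
```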

WLM System Tables

Below are the system tables that capture information about the WLM configuration and actions.

STL_WLM_ERROR — Contains a log of WLM-related error events

STL_WLM_QUERY — Lists queries that are being tracked by WLM.

STV_WLM_CLASSIFICATION_CONFIG — Shows the current classification rules for WLM.

STV_WLM_QUERY_QUEUE_STATE — Records the current state of the query queues.

STV_WLM_QUERY_STATE — Provides a snapshot of the current state of queries that are being tracked by WLM.

STV_WLM_QUERY_TASK_STATE — Contains the current state of query tasks.

STV_WLM_SERVICE_CLASS_CONFIG — Records the service class configurations for WLM.

STV_WLM_SERVICE_CLASS_STATE — Contains the current state of the service classes.

STL_WLM_RULE_ACTION — Contains details for queries that triggered a rule and the resulting action.

STV_QUERY_METRICS — Displays the metrics for currently running queries.

STL_QUERY_METRICS — Records the metrics for completed queries.

SVL_QUERY_METRICS — Shows the metrics for completed queries.

SVL_QUERY_METRICS_SUMMARY — Shows the maximum values of metrics for completed queries.

Summary

Workload management is a critical component for a Redshift cluster to work at its full potential. Choosing between manual and automatic WLM depends entirely on the number of end users and the business use cases. To summarize, manual WLM is beneficial mainly for teams that need control over the number of concurrent queries and memory allocation.

Thanks!

A Few Helpful Links for WLM Management Tools

