How to set up a Redshift cluster

In this section, we will see how to set up a new Amazon Redshift cluster. There are many options you can configure, but the main steps are as follows.

Your Amazon AWS dashboard

If you have an AWS account, you can log in here. If you do not, you can create a new one.

After you log in to AWS, navigate to Databases and click on Redshift.

Click on Launch Cluster.

Let’s set up our cluster.

  • Cluster Identifier: Give your cluster a name.
  • Database Name: This is optional. You can provide any name you want.
  • Master User Name: Provide a username for the master user of the cluster.
  • Master User Password: Provide the password for the master user 🙂

If you are ready, click on Continue.
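The same four fields can also be collected in code. The following is a minimal sketch, assuming you later pass the result as keyword arguments to boto3's Redshift `create_cluster` call. The helper name `basic_cluster_params` and all example values are hypothetical.

```python
def basic_cluster_params(cluster_identifier, master_username, master_password, db_name=None):
    """Collect the Cluster Details fields as keyword arguments (hypothetical helper)."""
    if not cluster_identifier or not master_username or not master_password:
        raise ValueError("identifier, master user name and password are required")
    params = {
        "ClusterIdentifier": cluster_identifier,
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
    }
    # Database Name is optional, so include it only when one was given.
    if db_name:
        params["DBName"] = db_name
    return params


# Example values are placeholders, not recommendations.
params = basic_cluster_params("example-cluster", "admin_user", "Example-Passw0rd")
print(sorted(params))
```

On their own these fields are not enough to launch a cluster. The node settings from the next step are needed as well.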

Selecting Node Type

Now we are in the Node Configuration part of our setup. At this point, we are going to choose between Dense Compute (DC) and Dense Storage (DS). As a reminder:

The Dense Compute cluster provides less storage, but with better performance and speed. The more data you are querying, the more compute you need to keep queries fast. Dense Compute is the type of instance to use if you need a high-performance data warehouse.

The Dense Storage cluster is designed for big data warehouses. So, if you have too much data to fit on Dense Compute nodes, Dense Storage is the type to choose.

For our example, we will leave the Node Type as is.

Selecting Number of Nodes

You will need to choose the number of nodes that your cluster will use. That number depends on the size of your dataset and your desired query performance.

For example, ds2.xlarge Dense Storage nodes have 2 TB of HDD storage per node. For 12 TB of data, you need 6 ds2.xlarge nodes or 1 ds2.8xlarge node. By comparison, a dc1.8xlarge Dense Compute node gives you 2.56 TB of SSD storage per node.
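The sizing arithmetic can be sketched as a small calculation. The 2 TB and 2.56 TB figures come from the text. The 16 TB figure for ds2.8xlarge is an assumption based on AWS's published node specifications, and the helper name `nodes_needed` is hypothetical.

```python
import math

# Storage per node in TB. The ds2.8xlarge value is an assumption, see above.
STORAGE_TB_PER_NODE = {
    "ds2.xlarge": 2.0,
    "ds2.8xlarge": 16.0,
    "dc1.8xlarge": 2.56,
}


def nodes_needed(data_tb, node_type):
    """Return the smallest node count whose raw storage covers data_tb."""
    return math.ceil(data_tb / STORAGE_TB_PER_NODE[node_type])


for node_type in STORAGE_TB_PER_NODE:
    print(node_type, nodes_needed(12, node_type))
```

This counts raw capacity only. It ignores room for growth, compression, and any minimum node count a node type may require, so treat the result as a lower bound.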

For our example, we chose 2 x dc1.large Dense Compute nodes.
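Expressed as API parameters, this choice might look like the sketch below. It assumes the parameter names of boto3's Redshift `create_cluster` call (`NodeType`, `ClusterType`, `NumberOfNodes`). The helper name `node_params` is hypothetical.

```python
def node_params(node_type, number_of_nodes):
    """Translate the Node Configuration step into keyword arguments (hypothetical helper)."""
    if number_of_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if number_of_nodes == 1:
        # A single-node cluster does not take a NumberOfNodes value.
        return {"NodeType": node_type, "ClusterType": "single-node"}
    return {
        "NodeType": node_type,
        "ClusterType": "multi-node",
        "NumberOfNodes": number_of_nodes,
    }


print(node_params("dc1.large", 2))
```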

Click on Continue.

Additional Configuration

In this step, you configure some additional settings and the networking options.

Parameter groups:

A parameter group is a group of parameters that apply to all of the databases that you create in the cluster. You associate a parameter group with each Amazon Redshift cluster you create. Read more about Parameter Groups in AWS’s documentation here.

Database Encryption:

You may also enable database encryption for your Amazon Redshift cluster to protect your data. When you enable encryption for a cluster, the data blocks and system metadata are encrypted for the cluster and its snapshots. Read more about Amazon Redshift Database Encryption and the relevant best practices in AWS’s documentation here.

For our example, we leave the default settings*.

*We recommend reviewing AWS’s documentation for both of the above settings.
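If you later decide not to keep the defaults, the two settings above could be captured as in this sketch. It assumes the boto3 `create_cluster` parameter names `ClusterParameterGroupName`, `Encrypted` and `KmsKeyId`. The helper name and the example group name are hypothetical.

```python
def additional_config(parameter_group=None, encrypted=False, kms_key_id=None):
    """Build the optional parameter group and encryption settings (hypothetical helper)."""
    params = {"Encrypted": encrypted}
    # Leaving the parameter group out keeps the cluster on the default group.
    if parameter_group:
        params["ClusterParameterGroupName"] = parameter_group
    if kms_key_id:
        if not encrypted:
            raise ValueError("a KMS key only makes sense when encryption is enabled")
        params["KmsKeyId"] = kms_key_id
    return params


print(additional_config())                      # the defaults used in this walkthrough
print(additional_config("example-params", True))
```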

Network Setup

Now let’s configure our networking options.

Virtual Private Cloud (VPC)

Amazon Virtual Private Cloud (Amazon VPC) resembles a traditional virtual network. You define that virtual network, the VPC, and launch the AWS services you need inside it. For more details about VPC check AWS’s documentation here.

As a first step, if you already have a VPC, select it. If you do not have one, a default VPC is created for you.

For our example, we leave the default*.

*We recommend reviewing AWS’s documentation for the above setting.

Cluster subnet group

If you are going to provision a Redshift cluster in a VPC, then you need to create a cluster subnet group. You can have multiple subnets that will help you organize your AWS resources. For more details on Amazon Redshift Cluster Subnet Groups check AWS’s documentation here.
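To illustrate how several subnets sit inside one VPC address range, here is a small sketch using Python's standard `ipaddress` module. The `10.0.0.0/16` range and the /24 subnet size are made-up example values, not something AWS prescribes.

```python
import ipaddress
from itertools import islice

# Hypothetical VPC address range, for illustration only.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Take the first two /24 subnets carved out of the VPC range.
subnets = list(islice(vpc.subnets(new_prefix=24), 2))

for subnet in subnets:
    print(subnet, "inside VPC:", subnet.subnet_of(vpc))
```

A cluster subnet group would then reference subnets like these by their subnet IDs.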

Configure the Publicly Accessible option

Configuring this is optional, but if you want to access your Redshift cluster from outside of AWS, then you need to add a public IP by setting Publicly Accessible to Yes. If you want your cluster to be accessible only from within your private VPC network, then choose No.

If you select Yes, then you have the option to select an Elastic IP address (EIP) to use for the external IP address.

An EIP is a static IP address that is associated with your AWS account. You can use an EIP to connect to your cluster from outside the VPC. Read more about EIP in AWS’s documentation here.
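The relationship between the two options can be sketched as follows, assuming the boto3 `create_cluster` parameter names `PubliclyAccessible` and `ElasticIp`. The helper name is hypothetical and `203.0.113.10` is a documentation-only address.

```python
import ipaddress


def public_access_params(publicly_accessible, elastic_ip=None):
    """Build the Publicly Accessible settings (hypothetical helper)."""
    params = {"PubliclyAccessible": publicly_accessible}
    if elastic_ip is not None:
        if not publicly_accessible:
            raise ValueError("an Elastic IP is only used when the cluster is publicly accessible")
        # ip_address() raises ValueError if the string is not a valid IP address.
        ipaddress.ip_address(elastic_ip)
        params["ElasticIp"] = elastic_ip
    return params


print(public_access_params(False))
print(public_access_params(True, "203.0.113.10"))
```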

The choice of using a public IP or not is up to you.

Amazon Redshift Enhanced VPC Routing

If you select Yes, then Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC. That is important because, when this option is disabled, that traffic travels through the Internet, including traffic to other services within the AWS network.

Enhanced VPC Routing needs extra care, so review AWS’s documentation here before you enable it.

Availability Zone

Each AWS Region has multiple isolated locations known as Availability Zones. Read more about Regions and Availability Zones in AWS’s documentation here.

Select No Preference to have AWS choose the Availability Zone in which your Redshift cluster will be created. Otherwise, select a specific Availability Zone.

VPC Security Groups

Last but not least, Security Groups. A Security Group is a set of rules that control access to your Redshift cluster, for example, a range of IP addresses from which a third-party tool is allowed to connect to your cluster. You can select the Security Group here, but you can also assign it later in your cluster configuration. For more details on Security Groups read AWS’s documentation here.
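The idea of "a range of IP addresses" can be illustrated with a small check using Python's standard `ipaddress` module. This is only a model of how an inbound rule is matched, not how AWS implements it. The rule, the addresses, and the helper name `is_allowed` are hypothetical, and 5439 is assumed to be the port your cluster listens on (the Redshift default).

```python
import ipaddress

REDSHIFT_PORT = 5439  # assumed: the default Redshift port


def is_allowed(client_ip, port, rules):
    """Return True if any (cidr, port) rule matches the connection attempt."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        port == rule_port and addr in ipaddress.ip_network(cidr)
        for cidr, rule_port in rules
    )


# One inbound rule: allow a third-party tool's address range on the Redshift port.
rules = [("203.0.113.0/24", REDSHIFT_PORT)]

print(is_allowed("203.0.113.25", REDSHIFT_PORT, rules))  # inside the range
print(is_allowed("198.51.100.7", REDSHIFT_PORT, rules))  # outside the range
```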

Click on Continue.

Launch your Cluster

On the next page, review your settings and click Launch Cluster.

Next, click Close and, on the next screen, wait for your cluster to become Available and Healthy.
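If you prefer to wait from a script rather than watch the console, the waiting step can be sketched as a simple polling loop. The status source is passed in as a function so the sketch runs without AWS access. With boto3, `get_status` could wrap `describe_clusters(...)["Clusters"][0]["ClusterStatus"]`, which is an assumption about how you would wire it up. The function name and the timing values are hypothetical.

```python
import time


def wait_until_available(get_status, timeout_s=1800, poll_s=30, sleep=time.sleep):
    """Poll get_status() until it returns "available" or the timeout elapses."""
    waited = 0
    while True:
        if get_status() == "available":
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Demo with a fake status source and a no-op sleep, so it finishes instantly.
fake_statuses = iter(["creating", "creating", "available"])
print(wait_until_available(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

boto3 also ships a `cluster_available` waiter for the Redshift client, which may be simpler than a hand-written loop.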

In the next section, we will see how to define networking and security settings for your Redshift instance.

This section contains parts of our knowledge base article “Setup Amazon Redshift”
