At Integrate.io, we run a fleet of over ten Amazon Redshift clusters. In this post, I’ll describe how we use AWS IAM authentication for our database users.
AWS access is managed by attaching policies to IAM identities (users, groups of users, or roles) or to AWS resources. A policy is an object in AWS that defines the permissions of the identity or resource it is associated with. AWS evaluates these policies when an IAM principal (user or role) makes a request, and the permissions they define determine whether the request is allowed or denied. Policies are stored in AWS as JSON documents. AWS supports six types of policies: identity-based policies, resource-based policies, permissions boundaries, organization SCPs, ACLs, and session policies.
By using IAM credentials, we can enforce security policies on users and passwords. It’s a scalable and secure way to manage access to your cluster(s).
The approach we describe is useful in environments where security is a top concern, such as industries with regulatory requirements like finance or healthcare, or enterprises with strict IT security compliance requirements.
Secure Ways to Manage Access to Amazon Redshift
Amazon Redshift started as a data warehouse in the cloud. A frequent initial use case was business intelligence. Over the years, though, the use cases for Amazon Redshift have evolved. Redshift is still used for reporting. But it’s now also part of mission-critical data services and applications. Integrate.io is an example – we built our application on top of Redshift. And so it’s critical to track and manage user access to a cluster.
The standard way for users to log in to Amazon Redshift is by providing a database username and password, because Amazon Redshift is based on PostgreSQL. But a username/password combination has a few drawbacks. The biggest one is that there is no way to enforce security best practices for password changes. That may not be an issue in a small company where only two or three people can access your cluster, but it's a different story for enterprises with a large pool of users, or for a small company whose headcount is growing fast. It's easy to lose track of everybody who has access to your most valuable data sets.
Luckily, AWS has recently developed alternatives to using a username and password. There are three options to maintaining login credentials for your Amazon Redshift database:
- Permit users to create user credentials and log in with their IAM credentials.
- Permit users to log in with federated single sign-on (SSO) through a SAML 2.0-compliant identity provider.
- Generate temporary database credentials. Permissions are granted through an AWS Identity and Access Management (IAM) permissions policy. By default, these credentials expire after 15 minutes, but you can configure them to expire up to an hour after creation.
This post will discuss option #3: using IAM credentials to generate expiring database credentials. Amazon Redshift provides the GetClusterCredentials API operation and the get-cluster-credentials command for the AWS Command Line Interface (AWS CLI). Both generate temporary database user credentials programmatically. You can also configure your SQL client with the Amazon Redshift JDBC or ODBC drivers, which manage the process of calling the GetClusterCredentials operation, retrieving the database user credentials, and establishing a connection between the SQL client and the Amazon Redshift database. Check out JDBC and ODBC Options for Creating Database User Credentials for more information.
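To make the programmatic route concrete, here is a minimal sketch of a GetClusterCredentials call with boto3. The function name, the parameter defaults, and the idea of injecting the client are my own illustration (the client is a parameter so the sketch can be exercised with a stub, offline):

import boto3


def temporary_credentials(cluster_id, db_user, db_name,
                          duration_seconds=900, client=None):
    """Fetch temporary database credentials via GetClusterCredentials.

    `client` is a boto3 Redshift client; pass your own (or a stub for
    testing), otherwise one is created from the default AWS config.
    """
    if client is None:
        client = boto3.client('redshift')
    return client.get_cluster_credentials(
        ClusterIdentifier=cluster_id,
        DbUser=db_user,
        DbName=db_name,
        DurationSeconds=duration_seconds,  # allowed range is 900-3600
    )

The response is a dictionary containing DbUser, DbPassword, and an Expiration timestamp.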
How Integrate.io Uses IAM to Generate Temporary Passwords
The Old Way: Manual
Handling credentials used to be a manual process, and that's a pain in the neck. We were already using AWS Secrets Manager, so it would have been a fairly trivial exercise to add auto-rotation to our secrets and trigger ALTER <user> queries to update them with new credentials. That would have been my initial approach. But one of my new colleagues pointed out the option of using IAM: Redshift itself lets you generate time-scoped database credentials through an IAM role.
The New Way: Using IAM to Generate Expiring Passwords
That turned into a somewhat larger undertaking. First, I had to understand how we were using Redshift across our platform. With a reasonable idea of how the change would work, I changed our Redshift connection logic to pull credentials before connecting. That turned out to be a pretty easy change. We're a Python shop, and Boto3 (the AWS SDK for Python) is exhaustive; it enables Python developers to create, configure, and manage AWS services.
We deployed the change into one of our testing environments. Everything went well until we hit the rate limit for the Redshift API: we were making too many requests to GetClusterCredentials. We have a lot of in-flight transformations that all connect to Redshift, and making those calls alongside every connect exhausted the rate limit. But that wasn't insurmountable. It was pretty easy to add a caching mechanism to our connection logic so that we didn't need to generate a new credential every time.
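Our production caching code isn't shown here, but the idea can be sketched as follows. The cache shape, the 60-second refresh margin, and the injected `fetch` callable (standing in for the GetClusterCredentials call) are illustrative assumptions:

import time

# Cache credentials per (cluster, user) and refresh only when they
# are close to expiring, to stay under the API rate limit.
CACHE = {}
REFRESH_MARGIN = 60  # refresh 60s before the actual expiry


def get_credentials(cluster, user, fetch, now=time.time):
    """Return cached credentials, calling `fetch` only when needed.

    `fetch` stands in for the GetClusterCredentials call and must
    return a dict that includes an 'expires_at' epoch timestamp.
    """
    key = (cluster, user)
    cached = CACHE.get(key)
    if cached and cached['expires_at'] - REFRESH_MARGIN > now():
        return cached
    creds = fetch(cluster, user)
    CACHE[key] = creds
    return creds

Repeated connections within the credential lifetime then reuse the cached entry instead of hitting the API.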
Once we got the caching mechanism deployed, we were able to disable logins. Access to the cluster was now only available through IAM credentials.
That left us with an awkward situation, though. Our developers have to connect to our clusters to run queries or test new features. They needed a way to generate IAM credentials and connect to a remote cluster.
We already understood how to generate IAM credentials and had code that handled that. We then solved the problem by creating a task in our execution and deployment framework that connects to a Redshift cluster in any of our environments using IAM credentials.
The code we used to do this isn't that important, because it's all made possible by the Redshift API and some IAM policies. So you can do this yourself, even without seeing what I've done. All you need is an AWS client for your preferred programming language and an IAM policy granting your users access to generate IAM credentials on your clusters.
Getting Started
The first thing you’re going to want to do is create an IAM policy with the Redshift Actions needed to create an IAM credential on a Redshift cluster.
The following policy allows the IAM role to call the GetClusterCredentials operation, to automatically create a new user, and to specify the groups a user joins at login. The "Resource": "*" wildcard grants the role access to any resource, including any cluster, database user, or user group.
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Action": [
"redshift:GetClusterCredentials",
"redshift:CreateClusterUser",
"redshift:JoinGroup"
],
"Resource": "*"
}
}
Of course, this example is not secure and strictly for demonstration. Please see Resource policies for GetClusterCredentials for more information and examples to achieve more granular access control.
Once you have that policy created, go ahead and create a group to contain the users you want to grant access to (if you already have that group, great!). Attach the policy you defined in the previous step under the group's Permissions tab, and add your users to the group. At this point, all of those users will be able to generate IAM credentials for existing users on your clusters.
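If you'd rather script that setup than click through the console, the same steps map onto the boto3 IAM client roughly like this. The function name, group name, and policy ARN are placeholders, and the client is passed in so the sketch can be tested with a stub:

def grant_redshift_access(iam, group_name, policy_arn, usernames):
    """Attach the GetClusterCredentials policy to a group and add users.

    `iam` is a boto3 IAM client; `policy_arn` is the ARN of the policy
    created in the previous step.
    """
    iam.attach_group_policy(GroupName=group_name, PolicyArn=policy_arn)
    for user in usernames:
        iam.add_user_to_group(GroupName=group_name, UserName=user)

The same two calls exist in the AWS CLI as `aws iam attach-group-policy` and `aws iam add-user-to-group`.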
The following shows how to use the AWS CLI to generate temporary database credentials for an existing user named adam.
As a side note, if the user doesn't exist in the database and AutoCreate is true, a new user is created with PASSWORD disabled. If the user doesn't exist and AutoCreate is false, the request fails.
aws redshift get-cluster-credentials --cluster-identifier clusterA --db-user adam --db-name dbA --duration-seconds 3600
The result is the following:
{
    "DbUser": "IAM:adam",
    "Expiration": "2020-10-08T21:10:53Z",
    "DbPassword": "EXAMPLEjArE3hcnQj8zt4XQj9Xtma8oxYEM8OyxpDHwXVPyJYBDm/gqX2Eeaq6P3DgTzgPg=="
}
Check out the official documentation for a more in-depth exploration of the CLI method.
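The AutoCreate behavior noted above can also be driven from code. Here's a hedged boto3 sketch (the function name and group names are my own; the client is injected so the sketch can be tested with a stub):

def autocreate_credentials(redshift, cluster_id, db_user, groups=()):
    """Fetch temporary credentials, creating the user if it is missing.

    `redshift` is a boto3 Redshift client. With AutoCreate=True a
    missing user is created with PASSWORD disabled; with
    AutoCreate=False the request for an unknown user fails instead.
    """
    return redshift.get_cluster_credentials(
        ClusterIdentifier=cluster_id,
        DbUser=db_user,
        AutoCreate=True,
        DbGroups=list(groups),  # groups joined for this session only
    )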
However, using the AWS CLI is manual and can be error-prone. My recommendation is to create a utility to generate the credentials and connect to the cluster on behalf of the user. This utility could also generate ODBC or JDBC connection strings if that’s how your users connect to a cluster.
Here’s a quick Python-based example that outputs a JDBC URL. The example assumes that:
- the AWS CLI is installed and configured
- boto3 is installed
- the user in question can call DescribeClusters
import argparse
import sys

import boto3


def main(args):
    parser = argparse.ArgumentParser(
        description='Build a JDBC URL with temporary IAM credentials.')
    parser.add_argument('-r', '--region', default='us-east-1')
    parser.add_argument('-d', '--database')
    parser.add_argument('identifier')
    parser.add_argument('username')
    args = parser.parse_args(args)

    redshift = boto3.client('redshift', args.region)
    # Map cluster identifiers to endpoints (paginate for large fleets).
    clusters = {c['ClusterIdentifier']: c['Endpoint']
                for c in redshift.describe_clusters()['Clusters']}
    endpoint = clusters.get(args.identifier)
    if not endpoint:
        print("Unable to connect to {rsid}".format(rsid=args.identifier))
        sys.exit(1)

    kwargs = dict(
        DbUser=args.username,
        ClusterIdentifier=args.identifier,
    )
    if args.database:
        kwargs['DbName'] = args.database
    credentials = redshift.get_cluster_credentials(**kwargs)

    print('jdbc:redshift://{address}:{port}/{database}?user={user}&password={password}'
          .format(
              address=endpoint['Address'],
              port=endpoint['Port'],
              database=args.database or '',
              user=credentials['DbUser'],
              password=credentials['DbPassword'],
          ))


if __name__ == '__main__':
    main(sys.argv[1:])  # drop the script name before parsing
You’re now in a situation where you have an IAM policy and a group containing your users. You’ve also configured your Redshift users with passwords DISABLED.
In short, you’ve secured your Redshift cluster. Your security team can enforce credential rotation on users through standard IAM behavior instead of managing passwords directly in the database.
For more on best practices when working with Amazon Redshift, read our post on 3 Things to Avoid When Setting Up an Amazon Redshift Cluster. Or download our best-practices guide for more tips on enterprise-grade deployments for Amazon Redshift.