Event-driven, Serverless Architectures with AWS Lambda, SQS, DynamoDB, and API Gateway
In this post, we will explore modern application development using an event-driven, serverless architecture on AWS. To demonstrate this architecture, we will integrate several fully-managed services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. The result will be an application composed of small, easily deployable, loosely coupled, independently scalable, serverless components.
What is ‘Event-Driven’?
According to Otavio Ferreira, Manager, Amazon SNS, and James Hood, Senior Software Development Engineer, in their AWS Compute Blog, Enriching Event-Driven Architectures with AWS Event Fork Pipelines, “Many customers are choosing to build event-driven applications in which subscriber services automatically perform work in response to events triggered by publisher services. This architectural pattern can make services more reusable, interoperable, and scalable.” This description of an event-driven architecture perfectly captures the essence of the following post. All interactions between application components in this post will be as a direct result of triggering an event.
What is ‘Serverless’?
Mistakenly, many of us think of serverless as just functions (aka Function-as-a-Service, or FaaS). When it comes to functions on AWS, Lambda is only one of many fully-managed services that make up the AWS Serverless Computing platform. So, what is ‘serverless’? According to AWS, “Serverless applications don’t require provisioning, maintaining, and administering servers for backend components such as compute, databases, storage, stream processing, message queueing, and more.”
As a Developer, one of my favorite features of serverless is the cost or lack thereof. With serverless on AWS, you pay for consistent throughput or execution duration rather than by server unit, and, at least on AWS, you don’t pay for idle resources. This is not always true of ‘serverless’ offerings on other leading Cloud platforms. Remember, if you’re paying for it but not using it, it’s not serverless.
If you’re paying for it but not using it, it’s not serverless.
To demonstrate an event-driven, serverless architecture, we will build, package, and deploy an application capable of extracting messages from CSV files placed in S3, transforming those messages, queueing them to SQS, and finally, writing the messages to DynamoDB, using Lambda functions. We will also expose a RESTful API, via API Gateway, to perform CRUD-like operations on those messages in DynamoDB.
AWS Technologies
In this demonstration, we will use several AWS serverless services, including the following.
- AWS Lambda
- Amazon Simple Storage Service (S3)
- Amazon DynamoDB
- Amazon API Gateway
- Amazon Simple Queue Service (SQS)
Each Lambda function will use a function-specific execution role, part of AWS Identity and Access Management (IAM). We will log the event details and monitor services using Amazon CloudWatch.
To codify, build, package, deploy, and manage the Lambda functions and other AWS resources in a fully automated fashion, we will also use the following AWS services.
- AWS Serverless Application Model (SAM)
- AWS CloudFormation
- AWS Command Line Interface (CLI)
- AWS SDK for JavaScript in Node.js
- AWS SDK for Python (boto3)
Architecture
The high-level architecture for the platform provisioned and deployed in this post is illustrated in the diagram below. There are two separate workflows. In the first workflow (top), data is extracted from CSV files placed in S3, transformed, queued to SQS, and written to DynamoDB, using Python-based Lambda functions. In the second workflow (bottom), data is manipulated in DynamoDB through interactions with a RESTful API, exposed via an API Gateway, and backed by Node.js-based Lambda functions.
Using the vast array of current AWS services, there are several ways we could extract, transform, and load data from static files into DynamoDB. The demonstration’s event-driven, serverless architecture represents just one possible approach.
Source Code
All source code for this post is available on GitHub in a single public repository, serverless-sqs-dynamo-demo. To clone the GitHub repository, execute the following command.
git clone --branch master --single-branch --depth 1 --no-tags \
https://github.com/garystafford/serverless-sqs-dynamo-demo.git
The project files relevant to this demonstration are organized as follows.
.
├── README.md
├── lambda_apigtw_to_dynamodb
│   ├── app.js
│   ├── events
│   ├── node_modules
│   ├── package.json
│   └── tests
├── lambda_s3_to_sqs
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── lambda_sqs_to_dynamodb
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── tests
├── requirements.txt
├── template.yaml
└── sample_data
    ├── data.csv
    ├── data_bad_msg.csv
    └── data_good_msg.csv
Some source code samples in this post are GitHub Gists, which may not display correctly on all social media browsers, such as LinkedIn.
Prerequisites
The demonstration assumes you already have an AWS account. You will need the latest copy of the AWS CLI, SAM CLI, and Python 3 installed on your development machine.
Additionally, you will need two existing S3 buckets. One bucket will be used to store the packaged project files for deployment. The second bucket is where we will place CSV data files, which, in turn, will trigger events that invoke multiple Lambda functions.
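If you prefer to create the two buckets programmatically rather than through the console or the aws s3 mb command, a minimal boto3 sketch follows; the bucket names and region are placeholders, not values from the project.
import boto3

region = 'us-east-1'  # placeholder: use your own region
s3_client = boto3.client('s3', region_name=region)

# Create the build/deployment bucket and the CSV data bucket (placeholder names)
for bucket_name in ['your-build-bucket-name', 'your-data-bucket-name']:
    if region == 'us-east-1':
        # us-east-1 does not accept a LocationConstraint
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': region}
        )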
Deploying the Project
Before diving into the code, we will deploy the project to AWS. Conveniently, the entire project’s resources are codified in an AWS Serverless Application Model (SAM) template, a model used to define serverless applications on AWS. According to the official SAM GitHub project documentation, AWS SAM is based on AWS CloudFormation: a serverless application is defined in a CloudFormation template and deployed as a CloudFormation stack.
Template Parameter
CloudFormation will create and uniquely name the SQS queues and the DynamoDB table. However, to avoid circular references, a common issue when creating resources associated with S3 event notifications, it is easier to use a pre-existing bucket. To start, you will need to change the SAM template’s DataBucketName parameter’s default value to your own S3 bucket name. Again, this bucket is where we will eventually push the CSV data files. Alternatively, override the default value from the command line with the SAM CLI (for example, using the --parameter-overrides option of sam deploy).
Parameters:
  DataBucketName:
    Type: String
    Description: S3 bucket where CSV files are processed
    Default: your-data-bucket-name

SAM CLI Commands
With the DataBucketName parameter set, proceed to validate, build, package, and deploy the project using the SAM CLI and the commands below. In addition to the sam validate command, I also like to use the aws cloudformation validate-template command to validate templates and catch any potential, additional errors.
Note that the S3_BUILD_BUCKET variable, below, refers to the name of the S3 bucket SAM will use to package and deploy the project from, as opposed to the S3 bucket into which the CSV data files will be placed (gist).
# change me
S3_BUILD_BUCKET=your_build_bucket_name
STACK_NAME=your_cloudformation_stack_name

# validate
sam validate --template template.yaml

aws cloudformation validate-template \
  --template-body file://template.yaml

# build
sam build --template template.yaml

# package
sam package \
  --output-template-file packaged.yaml \
  --s3-bucket $S3_BUILD_BUCKET

# deploy
sam deploy --template-file packaged.yaml \
  --stack-name $STACK_NAME \
  --capabilities CAPABILITY_IAM \
  --debug
After validating the template, SAM will build and package each individual Lambda function and its associated dependencies. Below, we see each individual Lambda function being packaged with a copy of its dependencies.
Once packaged, SAM will deploy the project and create the AWS resources as a CloudFormation stack.
Once the stack creation is complete, use the CloudFormation management console to review the AWS resources created by SAM. There are approximately 14 resources defined in the SAM template, which result in 33 individual resources deployed as part of the CloudFormation stack.
Note the stack’s output values. You will need these values to interact with the deployed platform later in the demonstration.
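If you would rather capture the stack’s output values programmatically than copy them from the console, a minimal boto3 sketch follows; it assumes the same stack name used during deployment.
import boto3

cloudformation = boto3.client('cloudformation')

# Fetch and print the stack's output key/value pairs
stack = cloudformation.describe_stacks(StackName='your_cloudformation_stack_name')['Stacks'][0]
for output in stack.get('Outputs', []):
    print(f"{output['OutputKey']}: {output['OutputValue']}")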
Test the Deployed Application
Once the CloudFormation stack has deployed without error, copying a CSV file to the S3 data bucket is the quickest way to confirm everything is working. The project includes test data files with 20 rows of test message data. Below is a sample of the included CSV file. The data was collected from IoT devices that measured response times from wired versus wireless devices on a LAN; the message details are immaterial to this demonstration (gist).
timestamp,location,source,local_dest,local_avg,remote_dest,remote_avg
1559040909.3853335,location-03,wireless,router-1,4.39,device-1,9.09
1559040919.5273902,location-03,wireless,router-1,0.49,device-1,16.75
1559040929.6446512,location-03,wireless,router-1,0.56,device-1,8.31
1559040939.7712135,location-03,wireless,router-1,1.64,device-1,9.4
1559040949.891723,location-03,wireless,router-1,1.18,device-1,9.07
1559040960.011338,location-03,wireless,router-1,0.42,device-1,8.4
1559040970.1319716,location-03,wireless,router-1,1.73,device-1,8.66
1559040980.2533505,location-03,wireless,router-1,0.67,device-1,8.61
1559040990.3816211,location-03,wireless,router-1,1.27,device-1,10.87
1559041000.5105414,location-03,wireless,router-1,1.63,device-1,10.08
Run the following commands to copy the test data file to your S3 bucket.
S3_DATA_BUCKET=your_data_bucket_name
aws s3 cp sample_data/data.csv s3://$S3_DATA_BUCKET
Visit the DynamoDB management console. You should observe a new DynamoDB table.
Within the new DynamoDB table, you should observe twenty items, corresponding to each of the twenty rows in the CSV file, uploaded to S3.
Drill into an individual item within the table and review its attributes. They should match the rows in the CSV file.
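As an alternative to the console, you can spot-check the table with boto3; a minimal sketch follows, assuming the table name from the stack outputs as a placeholder.
import boto3

dynamodb = boto3.client('dynamodb')
table_name = 'your-dynamodb-table-name'  # placeholder: use the stack's output value

# Count the items written by the Lambda functions (should be 20 for data.csv)
count = dynamodb.scan(TableName=table_name, Select='COUNT')['Count']
print(f'Items in table: {count}')

# Print a few items to compare their attributes against the CSV rows
for item in dynamodb.scan(TableName=table_name, Limit=3)['Items']:
    print(item)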
Both the Python- and Node.js-based Lambda functions have their default logging levels set to debug. The debug-level output from each Lambda function is streamed to individual Amazon CloudWatch Log Groups. We can use the CloudWatch logs to troubleshoot any issues with the deployed application. Below we see an example of CloudWatch log entries for the request and response payloads generated by the GetMessageFunction Lambda function, which is querying DynamoDB for a single Item.
Event-Driven Patterns
There are three distinct and discrete event-driven dataflows within the demonstration’s architecture:
- S3 Event Source for Lambda (S3 to SQS)
- SQS Event Source for Lambda (SQS to DynamoDB)
- API Gateway Event Source for Lambda (API Gateway to DynamoDB)
Let’s examine each event-driven dataflow and the Lambda code associated with that part of the architecture.
S3 Event Source for Lambda
Whenever a file is copied into the target S3 bucket, an S3 Event Notification triggers an asynchronous invocation of a Lambda function. According to AWS, when you invoke a function asynchronously, Lambda sends the event to an internal queue; a separate process reads events from that queue and executes your Lambda function.
The Lambda’s function handler, written in Python, reads the CSV file, whose filename is contained in the event. The Lambda extracts the rows in the CSV file, transforms the data, and pushes each message to the SQS queue (gist).
import urllib.parse


def lambda_handler(event, context):
    # The S3 event notification record contains the bucket name and object key
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(
        event['Records'][0]['s3']['object']['key'],
        encoding='utf-8'
    )
    messages = read_csv_file(bucket, key)
    process_messages(messages)
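The read_csv_file and process_messages helpers called above are not shown in the excerpt. Below is a minimal sketch of how they might be implemented, using boto3 and the standard csv module; the TABLE_NAME and QUEUE_URL environment variables and the POST value of the Method message attribute are assumptions, based on the message format shown next and the downstream SQS handler shown later in the post.
import csv
import io
import json
import os
from datetime import datetime, timezone

import boto3

s3_client = boto3.client('s3')
sqs_client = boto3.client('sqs')


def read_csv_file(bucket, key):
    # Download the CSV object from S3 and parse each row into a dictionary
    response = s3_client.get_object(Bucket=bucket, Key=key)
    body = response['Body'].read().decode('utf-8')
    return list(csv.DictReader(io.StringIO(body)))


def process_messages(messages):
    # Transform each row into a DynamoDB PutItem payload and queue it to SQS
    for row in messages:
        ts = datetime.fromtimestamp(float(row['timestamp']), tz=timezone.utc)
        item = {
            'TableName': os.environ['TABLE_NAME'],  # assumed environment variable
            'Item': {
                'date': {'S': ts.strftime('%Y-%m-%d')},
                'time': {'S': ts.strftime('%H:%M:%S')},
                'location': {'S': row['location']},
                'source': {'S': row['source']},
                'local_dest': {'S': row['local_dest']},
                'local_avg': {'N': row['local_avg']},
                'remote_dest': {'S': row['remote_dest']},
                'remote_avg': {'N': row['remote_avg']}
            }
        }
        sqs_client.send_message(
            QueueUrl=os.environ['QUEUE_URL'],  # assumed environment variable
            MessageBody=json.dumps(item),
            MessageAttributes={
                'Method': {'DataType': 'String', 'StringValue': 'POST'}
            }
        )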
Below is an example of a message body, part of an SQS message, extracted from a single row of the CSV file and sent by the Lambda to the SQS queue. The timestamp has been converted to separate date and time fields by the Lambda. The DynamoDB table name is part of the SQS message body. The key/value pairs in the Item JSON object reflect the schema of the DynamoDB table (gist).
{
"TableName": "your-dynamodb-table-name",
"Item": {
"date": {
"S": "2001-01-01"
},
"time": {
"S": "09:01:05"
},
"location": {
"S": "location-03"
},
"source": {
"S": "wireless"
},
"local_dest": {
"S": "router-1"
},
"local_avg": {
"N": "5.55"
},
"remote_dest": {
"S": "device-1"
},
"remote_avg": {
"N": "10.10"
}
}
}
SQS Event Source for Lambda
According to AWS, SQS offers two types of message queues, Standard and FIFO (First-In-First-Out). An SQS FIFO queue is designed to guarantee that messages are processed exactly once, in the exact order that they are sent. A Standard SQS queue offers maximum throughput, best-effort ordering, and at-least-once delivery.
Examining the SQS management console, you should observe that the CloudFormation stack creates two SQS Standard queues: a primary queue and a Dead Letter Queue (DLQ). According to AWS, Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that cannot be processed (consumed) successfully.
Examining the SQS Lambda Triggers tab, you should observe the Lambda, which will be triggered by the SQS events.
When a message is pushed into the SQS queue by the previous process, an SQS event is fired, which synchronously triggers an invocation of the Lambda using the SQS Event Source for Lambda functionality. When a function is invoked synchronously, Lambda runs the function and waits for a response.
In the demonstration, the Lambda’s function handler, also written in Python, pulls the message off of the SQS queue and writes the message (DynamoDB put) to the DynamoDB table. Although writing is the primary use case in this demonstration, an event could also trigger a get, scan, update, or delete command to be executed on the DynamoDB table (gist).
import logging
from json import loads

import boto3

# Module-level resources are reused across warm invocations
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
dynamo_client = boto3.client('dynamodb')


def lambda_handler(event, context):
    operations = {
        'DELETE': lambda dynamo, x: dynamo.delete_item(**x),
        'POST': lambda dynamo, x: dynamo.put_item(**x),
        'PUT': lambda dynamo, x: dynamo.update_item(**x),
        'GET': lambda dynamo, x: dynamo.get_item(**x),
        'GET_ALL': lambda dynamo, x: dynamo.scan(**x),
    }

    for record in event['Records']:
        payload = loads(record['body'], parse_float=str)
        operation = record['messageAttributes']['Method']['stringValue']
        if operation in operations:
            try:
                operations[operation](dynamo_client, payload)
            except Exception as e:
                logger.error(e)
        else:
            logger.error('Unsupported method \'{}\''.format(operation))
API Gateway Event Source for Lambda
Examining the API Gateway management console, you should observe that CloudFormation created a new Edge-optimized API. The API contains several resources and their associated HTTP methods.
Each API resource is associated with a deployed Lambda function. Switching to the Lambda console, you should observe a total of seven new Lambda functions. There are five Lambda functions related to the API, in addition to the Lambda called by the S3 event notifications and the Lambda called by the SQS event notifications.
Examining one of the Lambda functions associated with the API Gateway, we should observe the API Gateway trigger for the Lambda (lower left and bottom).
When an end-user makes an HTTP(S) request via the RESTful API exposed by the API Gateway, an event is fired, which synchronously invokes a Lambda using the API Gateway Event Source for Lambda functionality. The event contains details about the HTTP request that is received. The event triggers any one of five different Lambda functions, depending on the HTTP request method.
The Lambda code, written in Node.js, contains five function handlers. Each handler corresponds to an HTTP method, including GET (DynamoDB get), POST (put), PUT (update), DELETE (delete), and SCAN (scan). Below is an example of the getMessage handler function. The function accepts two inputs. First, a path parameter, the date, which is the primary partition key for the DynamoDB table. Second, a query parameter, the time, which is the primary sort key for the DynamoDB table. Both the partition key and sort key must be passed to DynamoDB to retrieve the requested record (gist).
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();
let tableName; // DynamoDB table name, resolved from the environment on first invocation

exports.getMessage = async (event, context) => {
  if (tableName == null) {
    tableName = process.env.TABLE_NAME;
  }
  const params = {
    TableName: tableName,
    Key: {
      "date": event.pathParameters.date,
      "time": event.queryStringParameters.time
    }
  };
  console.debug(params.Key);

  return await new Promise((resolve, reject) => {
docClient.get(params, (error, data) => {
if (error) {
console.error(`getMessage ERROR=${error.stack}`);
resolve({
statusCode: 400,
error: `Could not get messages: ${error.stack}`
});
} else {
console.info(`getMessage data=${JSON.stringify(data)}`);
resolve({
statusCode: 200,
body: JSON.stringify(data)
});
}
});
});
};
Test the API
To test the Lambda functions called by our API, we can use the sam local invoke command, part of the SAM CLI. Using this command, we can test the local Lambda functions without deploying them to AWS. The command allows us to trigger events, which the Lambda functions will handle. This is useful as we continue to develop, test, and re-deploy the Lambda functions to our Development, Staging, and Production environments.
The local Node.js-based, API-related Lambda functions, just like their deployed copies, will execute commands against the actual DynamoDB table on AWS. The GitHub project contains a set of five sample events, corresponding to the five Lambda functions, which in turn are associated with five different HTTP methods and API resources. For example, the event_getMessage.json event is associated with the GET HTTP method and calls the /message/{date}?time={time} resource endpoint to return a single item. This event, shown below, triggers the GetMessageFunction Lambda (gist).
{
"body": "",
"resource": "/",
"path": "/message",
"httpMethod": "GET",
"isBase64Encoded": false,
"queryStringParameters": {
"time": "06:45:43"
},
"pathParameters": {
"date": "2000-01-01"
},
"stageVariables": {}
}
We can trigger all the events using the CLI. The local Lambda functions expect the DynamoDB table name to exist as an environment variable. Make sure you set it locally before executing the sam local invoke commands (gist).
# change me (required by local lambda functions)
TABLE_NAME=your-dynamodb-table-name

# local testing (All CRUD functions)
sam local invoke PostMessageFunction \
--event lambda_apigtw_to_dynamodb/events/event_postMessage.json
sam local invoke GetMessageFunction \
--event lambda_apigtw_to_dynamodb/events/event_getMessage.json
sam local invoke GetMessagesFunction \
--event lambda_apigtw_to_dynamodb/events/event_getMessages.json
sam local invoke PutMessageFunction \
--event lambda_apigtw_to_dynamodb/events/event_putMessage.json
sam local invoke DeleteMessageFunction \
--event lambda_apigtw_to_dynamodb/events/event_deleteMessage.json
If the events were successfully handled by the local Lambda functions, in the terminal, you should see the same HTTP response status codes you would expect from calling the RESTful resources via the API Gateway. Below, for example, we see the POST event being handled by the PostMessageFunction Lambda, adding a record to the DynamoDB table, and returning a successful status of 201 Created.
Testing the Deployed API
To test the actual deployed API, we can call one of the API’s resources using an HTTP client, such as Postman. To locate the URL used to invoke the API resource, look at the ‘Prod’ Stage for the new API. This can be found in the Stages tab of the API Gateway console. For example, note the Invoke URL for the POST HTTP method of the /message resource, shown below.
Below, we see an example of using Postman to make an HTTP GET request to the /message/{date}?time={time} resource. We pass the required path and query parameters for the date and the time. The request should receive a single item in response from DynamoDB, via the API Gateway and the associated Lambda. Here, the request was successful, and the Lambda function returned a 200 OK status.
Similarly, below, we see an example of calling the same /message endpoint using the HTTP POST method. In the body of the POST request, we pass the DynamoDB table name and the Item object. Again, the POST is successful, and the Lambda function returns a 201 Created status.
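If you prefer scripting over Postman, the same requests can be made from Python; a minimal sketch using the requests library follows, with placeholder values for the API’s invoke URL and the table name. The date and time values match the sample event shown earlier.
import requests

# Placeholder: the API's invoke URL from the API Gateway 'Prod' stage
api_url = 'https://your-api-id.execute-api.us-east-1.amazonaws.com/Prod'

# GET a single item by its partition key (date) and sort key (time)
response = requests.get(f'{api_url}/message/2000-01-01', params={'time': '06:45:43'})
print(response.status_code, response.json())

# POST a new item; the body contains the DynamoDB table name and the Item object
new_message = {
    'TableName': 'your-dynamodb-table-name',
    'Item': {
        'date': {'S': '2000-01-01'},
        'time': {'S': '06:45:43'},
        'location': {'S': 'location-03'},
        'source': {'S': 'wireless'},
        'local_dest': {'S': 'router-1'},
        'local_avg': {'N': '1.59'},
        'remote_dest': {'S': 'device-1'},
        'remote_avg': {'N': '9.71'}
    }
}
response = requests.post(f'{api_url}/message', json=new_message)
print(response.status_code)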
To complete the demonstration and remove the AWS resources, run the following commands. It is necessary to delete all objects from the S3 data bucket first, before deleting the CloudFormation stack; otherwise, the stack deletion will fail.
S3_DATA_BUCKET=your_data_bucket_name
STACK_NAME=your_stack_name

aws s3 rm s3://$S3_DATA_BUCKET/data.csv # and any other objects

aws cloudformation delete-stack \
  --stack-name $STACK_NAME
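Since the data bucket may contain more than just data.csv, you may find it handy to empty it programmatically before deleting the stack; a minimal boto3 sketch follows, with the bucket name as a placeholder.
import boto3

# Placeholder: the S3 data bucket used for the CSV files
data_bucket = boto3.resource('s3').Bucket('your_data_bucket_name')

# Delete every object in the bucket before deleting the CloudFormation stack
data_bucket.objects.all().delete()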
In this post, we explored a simple example of building a modern application using an event-driven, serverless architecture on AWS. We used several services, all part of the AWS Serverless Computing platform, including Lambda, API Gateway, SQS, S3, and DynamoDB. AWS offers other serverless services that could enhance this demonstration, in particular Amazon Kinesis, AWS Step Functions, Amazon SNS, and AWS AppSync.
In a future post, we will look at how to further test the individual components within this demonstration’s application stack, and how to automate its deployment using DevOps and CI/CD principles on AWS.
All opinions expressed in this post are my own and not necessarily the views of my current or past employers or their clients.
Originally published at http://programmaticponderings.com on October 4, 2019.