Building a powerful mechanism for decoupling application components should be part of the design considerations for any cloud architecture. The AWS messaging services SQS and SNS can be applied at the architectural level to build loosely coupled systems that support multiple business use cases.
A fundamental factor in project success is understanding why systems should be highly cohesive and loosely coupled. Coupling can be present on many levels: platform, network, or operation. In practice, the project has to connect heterogeneous system components, hosted in the cloud or on-premises, whose operations can be synchronous (blocking) or asynchronous.
The AWS messaging services, Amazon SQS and Amazon SNS, can help you deal with these forms of coupling by:
- Providing reliable, durable, and fault-tolerant delivery of messages between components
- Creating unidirectional, non-blocking operations that temporally decouple system components at run-time
- Providing logical decomposition of systems
- Decreasing the dependencies that components have on each other through standard communication and network channels
In this article, we will look at some ways of using SQS and SNS that can help you, as an AWS engineer, decouple the components in your architecture.
Let’s assume that we have a business requirement to process orders from clients. When a client places an order, the request is processed by several components in the background:
- Additional data is retrieved from the database
- Processed data is saved to the database
- One call is made to a legacy system using API A
- Another call is made to a third-party system using API B
- Response is returned to the client
There are several potential points of vulnerability in the order processing flow:
- Retrieving and saving data to the database
- Availability and responsiveness of APIs A and B
- Overall response time of our order processing API
The business expects every order to be persisted in the database, and clients expect every order to be processed. However, a deadlock or network issue could cause persistence of the order to fail, and then the order is lost. Good logging can save some of the data, but manually recreating the order is a time-consuming operation, and time is a crucial factor in the order processing flow.
This is a perfect scenario for introducing Amazon SQS. SQS allows you to create a queue with the following capabilities already built in:
- A temporary ‘database’ for the orders in the form of messages
- A retry mechanism for messages that failed processing
- A mechanism that supports idempotent processing
- An easy delete mechanism for processed messages
- A fast way to meet changing business requirements
Simple Queue Service
In the order processing scenario, the application that receives the order, which we will call the Producer, puts the order into the SQS queue in the form of a message. The first thing we achieve here is a quick response to the client: as soon as the message is successfully placed in SQS, we can return the response to the client.
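The Producer step could look roughly like the sketch below. It assumes a boto3 SQS client is passed in by the caller; the function name `enqueue_order`, the order fields, and the shape of the returned acknowledgement are hypothetical choices made for illustration.

```python
import json


def enqueue_order(sqs_client, queue_url, order):
    """Put an order on the queue and build an immediate acknowledgement.

    sqs_client is assumed to be a boto3 SQS client (or a compatible stub).
    """
    # SQS message bodies are strings, so the order is serialized to JSON.
    body = json.dumps(order, sort_keys=True)
    # send_message returns after SQS has accepted the message.
    response = sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    # The client can be answered now; the order is processed later.
    return {"status": "accepted", "messageId": response["MessageId"]}
```

With boto3 installed, a call might look like `enqueue_order(boto3.client("sqs"), queue_url, {"orderId": 42})`, where `queue_url` is the URL of your own queue.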
Next comes message processing. Let’s call the event processor function the Consumer. It can be a Lambda function that reads from SQS and processes the messages it receives. In this step it is really important to know how long typical processing takes, so that you can set a message VisibilityTimeout long enough to complete your operation. Also, if you are using a Lambda function, make sure the timeout of the Lambda function is smaller than the visibility timeout of the queue. If processing takes longer than the visibility timeout, the message becomes visible on the queue again and other nodes may pick it up and process the same order twice, leading to unwanted consequences.
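As a rough illustration of the Consumer, the sketch below shows a Lambda handler for SQS records together with a small check that compares the queue's visibility timeout with the function timeout. The helper names and the `orderId` field are hypothetical, and the SQS client is assumed to be a boto3 client supplied by the caller.

```python
import json


def visibility_timeout_covers(sqs_client, queue_url, function_timeout_seconds):
    """Return True if the queue's visibility timeout exceeds the function timeout."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["VisibilityTimeout"]
    )
    # SQS returns attribute values as strings.
    visibility_timeout = int(response["Attributes"]["VisibilityTimeout"])
    return visibility_timeout > function_timeout_seconds


def process_order(order):
    """Hypothetical placeholder for the real order processing."""
    return order["orderId"]


def handler(event, context):
    """Lambda entry point: each SQS message arrives as one record."""
    processed = []
    for record in event["Records"]:
        order = json.loads(record["body"])
        processed.append(process_order(order))
    # If an exception escapes instead, the messages are not deleted and
    # become visible again once the visibility timeout expires.
    return {"processed": processed}
```

Running a check like `visibility_timeout_covers` at deployment time is one possible way to catch a queue whose timeout is shorter than the function's.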
Note: AWS Lambda added SQS as an event source in June 2018. Before this upgrade, you could use a CloudWatch rule that wakes up the Lambda function at a fixed time interval to check SQS for new messages. You can find more information about SQS as an event source at: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
Another important thing to mention here is that, when this integration was launched, SQS as a Lambda event source supported only standard queues, not FIFO ones. AWS has since added FIFO queues as a supported event source, so a CloudWatch rule that invokes the Lambda function to read from the queue is no longer required for them. Keep in mind that the samples in this article use standard SQS queues.
If you cannot measure the exact time, there is another way to handle duplicate messages or messages that are handled multiple times: try to make your processing application idempotent. In mathematics, an idempotent function is one that produces the same result whether it is applied once or several times. In our scenario, you can also use another type of queue, known as SQS FIFO. This type of queue provides message ordering as well as built-in deduplication. You can deduplicate by setting the MessageDeduplicationId property on the SendMessage request, or by enabling content-based deduplication on the queue, which generates the MessageDeduplicationId as a hash of the message body, not of its attributes. SQS discards duplicates that arrive within the 5-minute deduplication interval.
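Both ideas can be sketched in a few lines. The example below derives a MessageDeduplicationId from the order content, sends the order to a FIFO queue, and guards the consumer with a simple idempotency check. It assumes a boto3 SQS client; the `clientId` field used as MessageGroupId, the function names, and the in-memory set of seen IDs are illustrative assumptions, not part of the original design.

```python
import hashlib
import json


def deduplication_id(order):
    """Derive a stable ID from the order content (SHA-256 of its JSON form)."""
    body = json.dumps(order, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def send_order_fifo(sqs_client, queue_url, order):
    """Send an order to a FIFO queue with an explicit deduplication ID."""
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(order, sort_keys=True),
        # FIFO queues require a message group; grouping by client is an assumption.
        MessageGroupId=str(order["clientId"]),
        MessageDeduplicationId=deduplication_id(order),
    )


class IdempotentProcessor:
    """Runs the wrapped function at most once per distinct order."""

    def __init__(self, process):
        self._process = process
        # In-memory only; a real system would need a durable store.
        self._seen = set()

    def handle(self, order):
        key = deduplication_id(order)
        if key in self._seen:
            return False  # duplicate delivery, nothing to do
        self._process(order)
        self._seen.add(key)
        return True
```

Sorting the JSON keys makes the ID independent of field order, so two deliveries of the same order map to the same key.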
When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS doesn’t automatically delete the message, so deletion needs to be handled by the event processor function. Because Amazon SQS is a distributed system, there’s no guarantee that the consumer actually receives the message (for example, due to a connectivity issue, or due to an issue in the consumer application). Thus, the consumer must delete the message from the queue after receiving and processing it. (When a Lambda function is invoked through the SQS event source, the Lambda service deletes the messages for you once the function returns successfully.)
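For a consumer that polls the queue itself, the receive, process, and delete cycle might be sketched as follows. The function name `drain_once` and the `process` callback are hypothetical, and the client is assumed to be a boto3 SQS client.

```python
def drain_once(sqs_client, queue_url, process):
    """Receive one batch, process it, and delete only the successes.

    Returns the number of messages that were deleted.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # largest batch SQS returns per call
        WaitTimeSeconds=20,      # long polling
    )
    deleted = 0
    # The "Messages" key is absent when the queue returned nothing.
    for message in response.get("Messages", []):
        try:
            process(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout and can be retried.
            continue
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        deleted += 1
    return deleted
```

Deleting only after `process` succeeds is what keeps a failed order on the queue for another attempt.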
An Amazon SQS message has three basic states:
- Sent to a queue by a producer,
- Received from the queue by a consumer, and
- Deleted from the queue.
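A message is considered to be in flight after it is received from a queue by a consumer but not yet deleted from the queue, that is, between the second and third states. There is no limit on the number of messages in a queue that are between the first and second states (sent but not yet received). The number of in-flight messages, by contrast, is subject to a quota.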
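To see how many messages are in each state, you can read the approximate counters that SQS exposes as queue attributes. The sketch below assumes a boto3 SQS client; the function name `queue_depth` and the keys of the returned dictionary are hypothetical.

```python
def queue_depth(sqs_client, queue_url):
    """Return approximate counts of waiting and in-flight messages."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # sent, not yet received
            "ApproximateNumberOfMessagesNotVisible",  # received, not yet deleted
        ],
    )
    attrs = response["Attributes"]
    # Attribute values come back as strings.
    return {
        "waiting": int(attrs["ApproximateNumberOfMessages"]),
        "in_flight": int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    }
```

These counters are approximate, so they are better suited to monitoring than to exact bookkeeping.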
When a message is not processed because of a failure, it is not deleted and therefore stays in the queue. It remains hidden until the visibility timeout expires, and then it can be picked up by another process. To avoid getting stuck with unprocessed messages in the queue, you can configure a dead-letter queue (DLQ), a queue designated to hold messages that repeatedly fail processing. This way you don’t lose messages, and you can review them later to find the reason for the failure.
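A DLQ is attached to a source queue through the queue's RedrivePolicy attribute. The following is a minimal sketch assuming a boto3 SQS client; the function name and the default of five receive attempts are arbitrary choices for illustration.

```python
import json


def attach_dead_letter_queue(sqs_client, queue_url, dlq_arn, max_receive_count=5):
    """Point a source queue at a DLQ via its RedrivePolicy attribute."""
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many receives without a delete, SQS moves the
        # message to the DLQ.
        "maxReceiveCount": max_receive_count,
    }
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        # The policy is passed as a JSON string.
        Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
    )
    return redrive_policy
```

A low receive count moves failing orders aside quickly, while a higher one gives transient failures more chances to recover.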
Another benefit of SQS is that you can accommodate new business requirements without dramatically affecting your application. For example, if you need to process orders from one client separately from those of other clients, you can introduce a new SQS queue and attach a new consumer function to it, without interrupting the existing solution.
Simple Notification Service
Combining SNS with SQS makes the AWS messaging system even more powerful. You can introduce SNS to support reliable publish-subscribe (pub-sub) scenarios. The producer sends messages to an SNS topic, and SNS replicates them to numerous endpoints, such as SQS queues or Lambda functions, for processing. This allows one message to be processed in parallel in several different flows. In our scenario, calls to API A and API B can be made in parallel with saving data in the database: the order is published to SNS as a message, and SNS then sends that message to an SQS queue or Lambda function for API A, one for API B, and one for the database. These three processes happen in parallel.
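The fan-out described above can be sketched with two calls: subscribing the queues to the topic and publishing the order once. The example assumes a boto3 SNS client is passed in; the function names are hypothetical, and the queues are assumed to already have an access policy that lets the topic send messages to them.

```python
import json


def subscribe_queues(sns_client, topic_arn, queue_arns):
    """Subscribe each SQS queue to the topic and return the subscription ARNs."""
    subscription_arns = []
    for queue_arn in queue_arns:
        response = sns_client.subscribe(
            TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn
        )
        subscription_arns.append(response["SubscriptionArn"])
    return subscription_arns


def publish_order(sns_client, topic_arn, order):
    """Publish one order; SNS delivers a copy to every subscriber."""
    response = sns_client.publish(TopicArn=topic_arn, Message=json.dumps(order))
    return response["MessageId"]
```

The producer publishes a single message and does not need to know how many flows consume it.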
It is really important for the message to be saved first and processed afterwards. The Producer can send the message first to SQS, which acts as a temporary database. A Lambda function acting as a handler can then read the message from SQS, structure it in the required format, and publish it to SNS. The subscribers to the SNS topic are three Lambda functions: one sends the request to API A, another sends the request to API B, and the third saves the data in the database.
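A sketch of this flow, under stated assumptions, is shown below: one function relays SQS records to SNS, and another shows how a subscriber Lambda might read the order back from the SNS event. The function names, the `to_topic_format` reshaping, and the `orderId` field are hypothetical; the SNS client is assumed to be a boto3 client.

```python
import json


def to_topic_format(order):
    """Hypothetical reshaping step; the real target format is not specified."""
    return {"orderId": order["orderId"], "payload": order}


def relay_records(sns_client, topic_arn, event):
    """Read orders from an SQS-triggered event and publish each one to SNS."""
    message_ids = []
    for record in event["Records"]:
        order = json.loads(record["body"])
        response = sns_client.publish(
            TopicArn=topic_arn, Message=json.dumps(to_topic_format(order))
        )
        message_ids.append(response["MessageId"])
    return message_ids


def orders_from_sns_event(event):
    """Subscriber side: extract the orders from an SNS-triggered event."""
    return [json.loads(record["Sns"]["Message"]) for record in event["Records"]]
```

Each of the three subscriber functions could start with `orders_from_sns_event` and then call API A, call API B, or write to the database.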
Conclusion
SQS and SNS are messaging services that together provide a powerful mechanism for decoupling application components.
SQS has been available on the AWS platform since 2004, which makes it one of the earliest AWS services. Over the years, these services have been upgraded to meet new business requirements. Combining them with Lambda functions gives you concurrency and scaling out of the box. When the number of messages trends up, the Lambda function scales out automatically until it hits the concurrency limits. When the number of messages trends down, Lambda reduces the concurrency.
Using these services, you can implement powerful messaging capabilities and build loosely coupled systems that facilitate multiple business use cases.