How to Design an Asynchronous REST API

By default, requests are processed synchronously: the server receives the request, possibly calls other services or hits a database to do the required work, then returns the appropriate response. Keeping the client waiting while all of this happens might not be the best experience, and every call to another server adds communication latency that should be considered when evaluating the overall design.
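For contrast, here is a minimal sketch of that synchronous flow, assuming a Python/Flask server; the /reports endpoint and the generate_report helper are made up for illustration.

```python
# Minimal synchronous endpoint (Flask assumed; names are illustrative).
# The client's connection stays open until all the slow work finishes.
from flask import Flask, jsonify

app = Flask(__name__)

def generate_report():
    # placeholder for the slow work: database queries, calls to other services, ...
    return {"status": "done"}

@app.route("/reports", methods=["POST"])
def create_report():
    data = generate_report()   # may take seconds
    return jsonify(data), 200  # the client has been blocked this whole time

if __name__ == "__main__":
    app.run()
```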
What are the limitations of processing requests synchronously?
Imagine your server getting hit by 5 requests at the same time. That is fine: we can still make use of multi-threading and have 5 threads process the incoming requests. To be clear, the point is that the server received all 5 requests before it finished processing any of them.
Now let's imagine that Elon Musk has tweeted about your service and your API server is suddenly getting hit by 1000 requests per second, and you want this scenario to be handled wisely. A thread per request is no longer a good idea: every incoming request forces the server to start a new thread, and each thread that sits blocked on a database call or a downstream service still consumes memory and adds context-switching overhead, so the server quickly runs out of capacity.
Handling requests asynchronously
Processing requests asynchronously increases the complexity of your application, so think carefully before choosing this path.
Let's introduce the parts of our new system:
- Request receiver server.
- Message queue.
- Workers.
- Key value store service.
Request receiver server
This component is responsible for receiving the request, generating a request tracking id, and pushing this request into the message queue.
By receiving the request and pushing it directly into the queue (with the request tracking id attached), we cut out the time spent waiting for the request to be processed. We can immediately send the client a response meaning that the request has been accepted.
But the client needs some way to track the request status. This is where the generated unique id comes in: it is used to check the current status of the request, which can be in progress, completed, or failed.
Using this design, the client will get a response like:
Status code: 202 (request accepted)
Check status URL: .../123456/status
where 123456 is the request tracking id.
Generating this id can be achieved with a UUID generator or with the Twitter Snowflake id generator.
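As a sketch of how the receiver could look, assuming a Python/Flask server that publishes to a RabbitMQ queue through pika; the /reports route, the jobs queue name, and the payload shape are illustrative, not part of any prescribed API.

```python
# Request receiver sketch: generate a tracking id, enqueue the job, return 202.
import json
import uuid

import pika
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/reports", methods=["POST"])
def accept_report_request():
    tracking_id = str(uuid.uuid4())  # the request tracking id

    # Open a channel and push the request into the queue.
    # (A real service would reuse connections instead of opening one per request.)
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="jobs", durable=True)
    channel.basic_publish(
        exchange="",
        routing_key="jobs",
        body=json.dumps({"id": tracking_id, "params": request.get_json(silent=True)}),
        properties=pika.BasicProperties(delivery_mode=2),  # write the message to disk
    )
    connection.close()

    # 202 Accepted: the work is queued, not done yet.
    return jsonify({
        "tracking_id": tracking_id,
        "status_url": f"/reports/{tracking_id}/status",
    }), 202
```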
Message queue
The message queue is an independent service that organizes the flow between the request receiver (the producer) and the workers (the consumers). The producer pushes each request, with its tracking id attached, into the queue, where it waits until some worker fetches and processes it; this means a request might sit for some time before it is executed.
The message queue can be implemented in many ways; the most common is to use an existing service such as RabbitMQ.
Workers
The workers are the servers that actually process the requests; they do not have to be implemented with the same framework as the request receiver. The number of workers can be adjusted to the current volume of incoming requests, which helps in scaling the system up and down. Once a request is processed, the worker stores the response in a key value store, where the key is the unique tracking id generated earlier for this request and the value is the response.
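A minimal worker sketch along the same lines, assuming pika for consuming from RabbitMQ and redis-py for storing the result; the queue name, key format, and TTL mirror the receiver sketch above and are arbitrary choices, not requirements.

```python
# Worker sketch: consume a job, process it, store the result under the tracking id.
import json

import pika
import redis

store = redis.Redis(host="localhost", port=6379, decode_responses=True)

def process(job):
    # placeholder for the actual work (database writes, calls to other services, ...)
    return {"status": "completed", "result": f"report for {job['id']}"}

def on_message(channel, method, properties, body):
    job = json.loads(body)
    response = process(job)
    # key = tracking id, value = response, kept for 24 hours (arbitrary TTL)
    store.set(f"result:{job['id']}", json.dumps(response), ex=24 * 3600)
    channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only after the result is stored

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="jobs", durable=True)
channel.basic_qos(prefetch_count=1)  # hand each worker one unacked job at a time
channel.basic_consume(queue="jobs", on_message_callback=on_message)
channel.start_consuming()
```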
Key value store
The key value store can be something like Redis or a NoSQL database, where each entry can be kept (persisted) for an adjustable amount of time.
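The check-status endpoint from the 202 response can then read straight from this store. A sketch, again assuming Flask and redis-py with the key naming used in the worker sketch; a fuller design would also distinguish failed requests and unknown ids.

```python
# Status endpoint sketch: look up the result by tracking id.
import json

import redis
from flask import Flask, jsonify

app = Flask(__name__)
store = redis.Redis(host="localhost", port=6379, decode_responses=True)

@app.route("/reports/<tracking_id>/status", methods=["GET"])
def report_status(tracking_id):
    value = store.get(f"result:{tracking_id}")
    if value is None:
        # no stored result yet: as far as the client knows, still in progress
        return jsonify({"id": tracking_id, "status": "in progress"}), 200
    return jsonify(json.loads(value)), 200
```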
Pros and cons of choosing the asynchronous approach
Pros
The time the receiving server spends on each request decreases, so it can accept many more requests.
You can scale up/down by controlling the number of workers.
Cons
Clients must check the request status themselves, which is not the case in the synchronous approach.
Also, with more components in the system, fault tolerance becomes a bit more challenging. What if the queue goes down? We probably need some persistence to make sure that no accepted request is lost. What if a worker goes down right after fetching a request?
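For RabbitMQ specifically, the usual mitigations are durable queues, persistent messages, and manual acknowledgements. A short sketch (pika assumed; these lines already appear in the sketches above, collected here with their purpose spelled out):

```python
# Fault-tolerance levers on the queue side.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# 1. Durable queue: the queue definition survives a broker restart.
channel.queue_declare(queue="jobs", durable=True)

# 2. Persistent messages: accepted requests are written to disk, not kept only in memory.
channel.basic_publish(
    exchange="",
    routing_key="jobs",
    body='{"id": "123456"}',
    properties=pika.BasicProperties(delivery_mode=2),
)
connection.close()

# 3. Manual acknowledgements on the worker (see the worker sketch): if a worker
#    crashes after fetching a message but before acking it, the broker re-queues it.
```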
Many such scenarios can arise, and each should be handled somehow, so do not rush: choose your design wisely.