How Do Load Balancers Work? What Is Consistent Hashing? - System Design
Updated: Dec 9, 2022
Imagine that a system is horizontally scaled, i.e. it uses many servers to handle requests. There needs to be some mechanism to direct each incoming request to a destination server. We also don't want any one server to be flooded with requests while the others sit idle; ideally, the requests should be distributed uniformly among the servers. All of these tasks are carried out by a separate entity called a Load Balancer.
Note: If you want to know more about system scaling and its types, Horizontal Scaling and Vertical Scaling, refer to my blog:
To formalize: load balancing is the process of distributing a set of incoming requests over a set of servers, with the aim of making their overall processing efficient. Load balancing can optimize response time and avoid uneven distribution, i.e. overloading some servers while other compute nodes sit idle.
So how do Load Balancers distribute these requests?
Modulo Hashing or Distributed Hashing
Modulo hashing is one of the ways load balancers distribute incoming requests among servers. It involves hashing a unique id of each request and taking the hashed number modulo the number of servers.
For example: suppose my system consists of 4 servers (Server 0, Server 1, Server 2, Server 3). A request comes in, I compute the hash of its request id, and it gives me 35467. Since I have 4 servers, I take 35467 modulo 4 and get a remainder of 3:
35467 mod 4 = 3.
This indicates that the Load Balancer should direct the request to Server 3 as per modulo hashing.
Since a good hash function distributes ids uniformly at random, taking the result modulo the number of servers spreads the requests roughly evenly across all the servers.
In practice, this request id is generally the user-id or client-id. Hashing the same user/client id always yields the same server id, which means all requests from the same client are directed to the same server. Hence it is common practice for these servers to cache the user/client information in their local memory.
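As a concrete illustration, here is a minimal Python sketch of modulo hashing. The helper name and the choice of MD5 are my own illustrative assumptions, not a prescription:

```python
import hashlib

def modulo_hash_server(request_id: str, num_servers: int) -> int:
    """Map a request id to a server index using modulo hashing."""
    # MD5 gives a hash that is stable across processes; Python's
    # built-in hash() is salted per process and would break the mapping.
    hashed = int(hashlib.md5(request_id.encode()).hexdigest(), 16)
    return hashed % num_servers

# All requests from the same user land on the same server,
# as long as the number of servers stays fixed.
print(modulo_hash_server("User-1", 4))
```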
Let's look at an example of how the requests are distributed among the servers:
| Request Id | Hash of the request id | Hash mod 4 (num of servers) | Server Id (with 4 servers) |
| --- | --- | --- | --- |
| User-1 | 126 | 2 | Server 2 |
| User-2 | 426 | 2 | Server 2 |
| User-3 | 404 | 0 | Server 0 |
| User-4 | 543 | 3 | Server 3 |
| User-5 | 320 | 0 | Server 0 |
| User-6 | 641 | 1 | Server 1 |
So far so good: all the requests are spread out evenly among the servers, and each user is mapped to a particular server that holds that user's cached values.
Now suppose the number of requests this service gets starts to increase, and the admin of the system decides to add 2 new servers, bringing the total to 6. Going by the above method, the way these requests are distributed is going to change. Let's have a look at how the mapping changes.
| Request Id | Hash of the request id | Hash mod 6 (num of servers) | Server Id (with 6 servers) | Server Id (with 4 servers) | Did the server id change? |
| --- | --- | --- | --- | --- | --- |
| User-1 | 126 | 0 | Server 0 | Server 2 | Yes |
| User-2 | 426 | 0 | Server 0 | Server 2 | Yes |
| User-3 | 404 | 2 | Server 2 | Server 0 | Yes |
| User-4 | 543 | 3 | Server 3 | Server 3 | No |
| User-5 | 320 | 2 | Server 2 | Server 0 | Yes |
| User-6 | 641 | 5 | Server 5 | Server 1 | Yes |
As you can see, in almost all cases the new server id differs from the previous one. This is bad: it means the local information each server was caching becomes useless for almost every user, and has to be fetched or recomputed and cached all over again on every server. This is not a good way to handle scaling, and there is clearly scope for improvement. Ideally, adding a new server should cause minimal changes in the client-to-server mapping. So how do we solve this issue?
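Before getting to the fix, here is a quick sketch (reusing the same modulo scheme as above) that quantifies the problem on a larger sample of user ids:

```python
import hashlib

def modulo_hash_server(request_id: str, num_servers: int) -> int:
    hashed = int(hashlib.md5(request_id.encode()).hexdigest(), 16)
    return hashed % num_servers

# How many users change servers when we scale from 4 to 6 servers?
ids = [f"User-{i}" for i in range(10_000)]
moved = sum(
    modulo_hash_server(uid, 4) != modulo_hash_server(uid, 6) for uid in ids
)
# A key keeps its server only when hash % 4 == hash % 6, which holds for
# 4 of every 12 residues mod 12, so about 2/3 of keys move on average.
print(f"{moved / len(ids):.0%} of users changed servers")
```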
We can use Consistent Hashing to solve this problem!
Consistent Hashing
Consistent hashing is used when we want to distribute the incoming requests evenly among the servers, while ensuring that when any server is added or removed, there are minimal changes in the distribution mapping. The concept of consistent hashing can be visualized using a hash ring: take a very large range of numbers (say 0 to 2^31) and bend it around to form a circle.
The idea is that we hash the server ids and take the result modulo this very large number (2^31), and we do the same for the request ids. All the blue points on the ring below represent the modulo hashes of requests, and all the red points represent the modulo hashes of servers. Each request is served by the first server encountered from its position moving clockwise. For example, in the diagram below, request r1 will be served by Server 2.
Since the request ids and the hash function's outputs are effectively random, the requests will be spread randomly across the servers.
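To make the ring concrete, here is a minimal Python sketch of a hash ring. The class and function names are my own, and MD5 is just one convenient stable hash; real systems often use faster hash functions:

```python
import bisect
import hashlib

RING_SIZE = 2**31  # the "very large number" the ring wraps around

def ring_position(key: str) -> int:
    """Place any key (server id or request id) on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, servers):
        # Sorted list of (position, server) points on the ring.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def server_for(self, request_id: str) -> str:
        # First server clockwise from the request's position,
        # wrapping around past the top of the ring.
        i = bisect.bisect_right(self._positions, ring_position(request_id))
        return self._points[i % len(self._points)][1]

ring = HashRing(["Server-1", "Server-2", "Server-3", "Server-4"])
print(ring.server_for("User-1"))
```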
Now let's see what happens when we remove a server. In the diagram below, you can see that the requests become skewed: Server 3 went down, and all the requests that Server 3 was handling are now redirected to Server 4, which ends up handling the majority of the load. Similarly, if we add a new server, Server 5, the diagram below shows that the distribution of requests is clearly skewed.
What went wrong? Everything was distributed at random, so this issue should not have arisen. The problem is the number of servers: there aren't enough points on the ring to maintain the randomness of the distribution. Does that mean we need to add more servers? Not really! We can fix it by creating a lot of virtual servers and placing them on the hash ring.
So how do we create these virtual entries on the hash ring? It's simple: change the hash function. For example, if we used a hash function h1 initially, use hash functions h2, h3, h4, ..., hk to create k virtual servers for a single real server. Do this for all the servers, and make sure all the servers use the same k hash functions. After placing all these virtual servers on the hash ring, adding or removing a server no longer places an unreasonable load on any one server, and the distribution is no longer skewed. Problem solved!
The reason consistent hashing works is that when a server is added, it appears at multiple points on the ring, and when a server is removed, it disappears from multiple points; either way, only the requests adjacent to those points move, so the change in the request-to-server mapping is small.
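Here is a hedged sketch of the virtual-server idea. Instead of literally defining k separate hash functions, a common equivalent trick (my assumption here, not something the article prescribes) is to hash the server id with a replica suffix, which behaves like a family of hash functions h1..hk:

```python
import bisect
import hashlib

RING_SIZE = 2**31

def ring_position(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

class VirtualNodeRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # (position, real server) pairs
        for server in servers:
            self._add_points(server)
        self._rebuild()

    def _add_points(self, server):
        # Hashing "Server-1#0", "Server-1#1", ... acts like applying
        # k different hash functions h1..hk to the same server id.
        for i in range(self.vnodes):
            self._points.append((ring_position(f"{server}#{i}"), server))

    def _rebuild(self):
        self._points.sort()
        self._positions = [pos for pos, _ in self._points]

    def add_server(self, server):
        self._add_points(server)
        self._rebuild()

    def server_for(self, request_id: str) -> str:
        i = bisect.bisect_right(self._positions, ring_position(request_id))
        return self._points[i % len(self._points)][1]

ring = VirtualNodeRing(["Server-1", "Server-2", "Server-3", "Server-4"])
print(ring.server_for("User-1"))
```

With on the order of 100 virtual nodes per server, a removed server's load is spread across all the remaining servers instead of being dumped on a single clockwise neighbor.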
Closing thoughts
Consistent hashing is used in many places. If you have a cluster of databases or cache servers and you want to scale them elastically up or down based on traffic load, load balancing with consistent hashing is a good fit.
Let me know in the comments about your thoughts or experiences related to load balancing and consistent hashing.
And that's a wrap! Hi, I am Gourav Dhar, a software developer and I write blogs on Backend Development and System Design. Subscribe to my Newsletter and learn something new every week - https://thegeekyminds.com/subscribe