
How Do Load Balancers Work? What Is Consistent Hashing? - System Design

Updated: Dec 9, 2022




Imagine that a system is horizontally scaled, i.e. it uses many servers to handle requests. There needs to be some mechanism to direct each incoming request to a destination server, and no single server should be flooded with requests while the others sit idle. Ideally, we want the requests to be distributed uniformly among the servers. All of these tasks are carried out by a separate entity called a Load Balancer.


Note: If you want to know more about system scaling and its types, Horizontal Scaling and Vertical Scaling, refer to my blog:


To formalize: load balancing is the process of distributing a set of incoming requests over a set of servers, with the aim of making their overall processing efficient. Load balancing can optimize response time and avoid uneven distribution, i.e. overloading some servers while other compute nodes sit idle.


So how do load balancers distribute these requests?


Modulo Hashing or Distributed Hashing

Modulo hashing is one way a load balancer can distribute incoming requests among the servers. It involves hashing a unique id of each request and taking that hashed number modulo the number of servers.


For example, suppose my system consists of 4 servers (Server 0, Server 1, Server 2, Server 3). A request comes in, I compute the hash of its request id, and it gives me 35467. Since I have 4 servers, I take 35467 modulo 4 and get a remainder of 3:


35467 mod 4 = 3.


This indicates that the Load Balancer should direct the request to Server 3 as per modulo hashing.

Since a good hash function distributes ids uniformly at random, taking the result modulo the number of servers spreads the requests roughly evenly across all the servers.
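
As a rough sketch of this scheme (the `stable_hash` helper and the ids here are illustrative assumptions, not any particular load balancer's API), modulo hashing fits in a few lines of Python:

```python
import hashlib

NUM_SERVERS = 4  # Server 0 .. Server 3

def stable_hash(request_id: str) -> int:
    # A deterministic hash; Python's built-in hash() is salted per
    # process, so a digest-based hash is used instead.
    return int(hashlib.md5(request_id.encode()).hexdigest(), 16)

def route(request_id: str) -> int:
    # Modulo hashing: the server index is the hash taken modulo
    # the number of servers.
    return stable_hash(request_id) % NUM_SERVERS

print(route("User-1"))  # the same id always maps to the same server index
```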


In practice, this request id is generally the user id or client id. The hash of the same user/client id always gives the same server id, which means all requests from the same client are directed to the same server. Hence it is common practice for these servers to cache the user/client information in their local memory.


Let's look at an example of how the requests are distributed among the servers:


| Request Id | Hash of the request id | Hash mod 4 (number of servers) | Server Id (with 4 servers) |
|---|---|---|---|
| User-1 | 126 | 2 | Server 2 |
| User-2 | 426 | 2 | Server 2 |
| User-3 | 404 | 0 | Server 0 |
| User-4 | 543 | 3 | Server 3 |
| User-5 | 320 | 0 | Server 0 |
| User-6 | 641 | 1 | Server 1 |



So far so good: the requests are spread out evenly among the servers, and each user is mapped to a particular server that holds that user's cached values.


Now suppose the number of requests this service gets starts to increase, and the admin of the system decides to add 2 new servers. Going by the above method, there are going to be changes in the way these requests are distributed. The total number of servers is now 6. Let's have a look at how the mapping changes.

| Request Id | Hash of the request id | Hash mod 6 (number of servers) | Server Id (with 6 servers) | Server Id (with 4 servers) | Did the server id change? |
|---|---|---|---|---|---|
| User-1 | 126 | 0 | Server 0 | Server 2 | Yes |
| User-2 | 426 | 0 | Server 0 | Server 2 | Yes |
| User-3 | 404 | 2 | Server 2 | Server 0 | Yes |
| User-4 | 543 | 3 | Server 3 | Server 3 | No |
| User-5 | 320 | 2 | Server 2 | Server 0 | Yes |
| User-6 | 641 | 5 | Server 5 | Server 1 | Yes |


As you can see, in almost all cases the new server id a request maps to is different from the previous one. This is bad, because it means the local information each server was caching becomes useless for almost every user and has to be fetched or recomputed and cached again on every server. Ideally, we would want to add a new server with minimal changes to the client-to-server mapping. So how do we solve this issue?
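
To get a feel for the scale of the problem, here is a small simulation (a sketch with made-up ids and an MD5-based hash, both my assumptions) that counts how many of 10,000 keys change servers when the pool grows from 4 to 6:

```python
import hashlib

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"User-{i}" for i in range(10_000)]

# Compare the server each key maps to before and after scaling 4 -> 6.
moved = sum(1 for k in keys if stable_hash(k) % 4 != stable_hash(k) % 6)
print(f"{moved / len(keys):.0%} of keys changed servers")  # about two thirds
```

A key stays put only when its hash gives the same remainder mod 4 and mod 6, which happens for just 4 of the 12 possible residues mod 12, so roughly two thirds of all cached mappings are invalidated by adding those two servers.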



We can use Consistent Hashing to solve this problem!


Consistent Hashing


Consistent hashing is used when we want to distribute the incoming requests evenly among the servers, while ensuring that adding or removing a server causes minimal changes in the distribution mapping. The concept of consistent hashing can be visualized using a hash ring: take a very large number (say 2^31) and wrap the range of values around to form a circle.




The idea is that we hash the server ids and take the result modulo this very large number (2^31), and we do the same for the request ids. All the blue points on the ring below represent the hashed requests, and all the red points represent the hashed servers. Each request is served by the nearest server in the clockwise direction. For example, in the diagram below, request r1 will be served by server 2.

Since the request ids are random and the hash function spreads them uniformly, the requests will be spread randomly across the servers.
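
A minimal hash-ring lookup could look like the following sketch (the server names and the MD5-based `ring_position` helper are illustrative assumptions): server and request ids are hashed onto [0, 2^31), and the first server position at or after the request's position, wrapping around the top, serves it.

```python
import bisect
import hashlib

RING_SIZE = 2**31

def ring_position(key: str) -> int:
    # Map any id onto the ring [0, 2^31).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

servers = ["server-1", "server-2", "server-3", "server-4"]

# Sorted (position, server) pairs form the hash ring.
ring = sorted((ring_position(s), s) for s in servers)
positions = [pos for pos, _ in ring]

def lookup(request_id: str) -> str:
    # Walk clockwise: the first server position at or after the
    # request's position serves it; wrap around past the top.
    pos = ring_position(request_id)
    idx = bisect.bisect_left(positions, pos) % len(ring)
    return ring[idx][1]

print(lookup("r1"))  # whichever server sits clockwise of r1's position
```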




Now let's see what happens when we remove a server. In the diagram below, you can see that the requests become skewed: Server 3 went down, and all the requests that Server 3 was handling are now redirected to Server 4, which ends up handling the majority of the load. Adding a new server, Server 5, shows the same problem: the distribution of the requests remains clearly skewed.




What went wrong? Everything is distributed at random, so this issue should not have arisen. The problem is the number of points on the ring: a handful of servers isn't enough to keep the arcs between them evenly sized, so the randomness doesn't average out. Do we need to add more physical servers to fix this? Not really! We can get the same effect by creating a lot of virtual servers and placing them on the hash ring.

So how do we create these virtual entries in the hash ring? It's simple: use additional hash functions. For example, if we initially used a hash function h1, we can use hash functions h2, h3, h4, ..., hk to create k virtual servers for a single real server. Do this for all the servers, with every server using the same k hash functions. With all these virtual servers placed on the hash ring, adding or removing a server no longer puts an undue load on any one server, and the distribution won't be skewed. Problem solved!
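
Here is a sketch of a ring with virtual servers. Instead of k separate hash functions, this version hashes the server id with a replica suffix like `server-1#7`, a common trick with the same effect; the names and the virtual-node count are illustrative assumptions.

```python
import bisect
import hashlib

RING_SIZE = 2**31
VNODES = 100  # virtual nodes per physical server

def ring_position(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, servers):
        self.ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Place VNODES virtual entries for this server on the ring.
        # Hashing "server#i" plays the role of the hash functions
        # h1..hk described above.
        for i in range(VNODES):
            bisect.insort(self.ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        # Drop every virtual entry belonging to this server.
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def lookup(self, request_id: str) -> str:
        # Nearest virtual server clockwise from the request's position.
        pos = ring_position(request_id)
        idx = bisect.bisect_left(self.ring, (pos, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["server-1", "server-2", "server-3", "server-4"])
print(ring.lookup("User-1"))
ring.add("server-5")          # only keys near server-5's virtual nodes move
print(ring.lookup("User-1"))  # most lookups are unchanged
```

With on the order of 100 virtual nodes per server, each physical server's share of the ring evens out, and removing a server sheds its load across many neighbors instead of dumping it all on a single successor.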


The reason consistent hashing works is that each server occupies multiple points on the ring: when a server is added, it takes over small arcs at multiple places, and when a server is removed, its load is shed at multiple places. Either way, the change in the request-to-server mapping is small.


Closing thoughts

Consistent hashing is used in many places. If you have a cluster of databases or cache servers and you want to elastically scale it up or down based on the traffic load, go for load balancing with consistent hashing.


Let me know your thoughts or experiences related to load balancing and consistent hashing in the comments.


And that's a wrap! Hi, I am Gourav Dhar, a software developer and I write blogs on Backend Development and System Design. Subscribe to my Newsletter and learn something new every week - https://thegeekyminds.com/subscribe



