Database Sharding vs Partitioning - What are the differences
Updated: Feb 14, 2023
Let's dive right in -
Database Sharding vs Partitioning
Partitioning literally means the process of dividing something into 2 or more pieces.
Imagine that you purchased a cake. You have a container to store it, but there's a problem: the container is small and the entire cake won't fit. So you purchase another container and decide to split the cake between the two.
So you cut the cake into 9 pieces. You put 5 pieces in container 1 and the remaining 4 in container 2. You may wonder - "So Gourav, what does this have to do with sharding and partitioning?"
Well, in terms of the database, each piece of the cake is a partition and the containers are called shards. The process of creating partitions is called partitioning and the process of creating shards is called sharding.
Let me elaborate. Suppose you own a company and use a SQL table to store all the employees' data. This SQL table sits on a server. Now imagine that your company hired a lot of employees. So many, that the SQL database server cannot handle the load of operations performed on the database.
Clearly, it's time to scale out your database, and sharding is the way to do it. You purchase another SQL server, so you now have a total of 2 shards. The big question is: how do you divide the data between these 2 shards (SQL servers)? For this, you need a strategy to partition your data. The simplest strategy, off the top of my head, is to partition based on name initials. By this, I mean all the employees whose names start with the letter 'A' go in one partition, all the employees whose names start with 'B' go in another, and so on. Since there are 26 letters in the alphabet, we would have 26 partitions. We can place the first 13 partitions (A to M) in shard 1 and the next 13 (N to Z) in shard 2.
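Here is a minimal Python sketch of that strategy. The function names (partition_for, shard_for) and the sample employee names are made up for illustration, and it assumes every name starts with an English letter. A real setup would route to actual database servers instead of returning a shard number.

```python
import string

NUM_SHARDS = 2  # the two SQL servers from the example above


def partition_for(name: str) -> str:
    """Return the partition key: the upper-cased first letter of the name."""
    cleaned = name.strip()
    if not cleaned or cleaned[0].upper() not in string.ascii_uppercase:
        raise ValueError(f"name must start with an English letter: {name!r}")
    return cleaned[0].upper()


def shard_for(partition: str) -> int:
    """Map the first 13 partitions (A-M) to shard 1 and the next 13 (N-Z) to shard 2."""
    index = string.ascii_uppercase.index(partition)  # 0 for 'A' ... 25 for 'Z'
    partitions_per_shard = len(string.ascii_uppercase) // NUM_SHARDS  # 13
    return index // partitions_per_shard + 1


# Hypothetical sample data
employees = ["Alice", "Bob", "Mallory", "Nina", "Zoe"]
placement = {
    name: (partition_for(name), shard_for(partition_for(name)))
    for name in employees
}

for name, (partition, shard) in placement.items():
    print(f"{name}: partition {partition}, shard {shard}")
```

Notice that the partition is decided by the data (the initial), while the shard is decided by where we choose to place that partition.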
As you can see, a shard can hold multiple partitions, but a partition can reside in only one shard. Also, if a database is partitioned, it does not imply that the database is sharded: all the partitions could still sit on a single server. But if a database is sharded, it implies that the database has definitely been partitioned.
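To make this relationship concrete, here is a small sketch. It models a placement as a Python dict from partition to server, which is my own assumption for illustration (the server names are hypothetical). Because a dict key can have only one value, each partition lives on exactly one server by construction.

```python
from collections import defaultdict


def partitions_by_shard(placement: dict[str, str]) -> dict[str, list[str]]:
    """Group partitions by the server they are placed on."""
    grouped = defaultdict(list)
    for partition, server in placement.items():
        grouped[server].append(partition)
    return dict(grouped)


def is_sharded(placement: dict[str, str]) -> bool:
    """A partitioned database counts as sharded only if its partitions span more than one server."""
    return len(set(placement.values())) > 1


# Partitioned, but not sharded: every partition is on the same server.
single_server = {"A": "server-1", "B": "server-1", "C": "server-1"}

# Partitioned and sharded: the partitions are spread over two servers.
two_servers = {"A": "server-1", "B": "server-1", "C": "server-2"}

print(is_sharded(single_server))
print(is_sharded(two_servers))
print(partitions_by_shard(two_servers))
```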
To sum it up
The process of dividing data in a database into logical parts is called partitioning.
Sharding is the process of distributing those partitions across different database servers in order to improve performance and scalability. Each server that holds a subset of the data is called a shard, and since each shard is hosted on a separate server, the load gets distributed among them. Sharding is not the only way of scaling databases, but it’s one of the most popular techniques because it lets a database grow beyond what a single server can handle.
Pros and Cons of Database Sharding
Database sharding splits the data held by one database server across multiple smaller data stores. This is done to improve the performance of the system and reduce the load on each individual database server. Sharding is often used with databases that have high read/write workloads and high throughput requirements. It also enables horizontal scaling, which increases storage capacity by adding more servers instead of upgrading a single server.
The Pros of Database Sharding
Scalability: Sharding enables horizontal scaling, which allows a database to grow beyond the capacity of a single server.
Improved Performance: Sharding can improve the performance of a database by distributing the load among multiple servers, reducing the risk of bottlenecks.
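High Availability: Because each shard holds only part of the data, a failed server affects only the data on that shard while the rest of the database keeps operating. Sharding is also often combined with replication, so that a shard's data stays available even if one of its servers fails.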
Cost Savings: By distributing the load, sharding can reduce the costs associated with running a large database on a single, high-end server.
Flexibility: Sharding can enable a database to be optimized for specific workloads, improving the performance of specific queries and operations.
The Cons of Database Sharding
Complexity: Sharding adds complexity to a database, requiring significant planning and coordination to ensure data is properly partitioned and distributed.
Increased Latency: Sharding can result in increased latency as queries may need to be executed on multiple servers, leading to increased response times.
Maintenance Overhead: Maintaining a sharded database can be more difficult and time-consuming than maintaining a single, non-sharded database.
Increased Cost: The cost of setting up, configuring, and maintaining a sharded database can be higher than the cost of maintaining a single, non-sharded database.
Data Consistency: Maintaining data consistency across multiple servers can be challenging, as an operation that updates data on more than one shard has to be coordinated across servers, and a partial failure can leave the shards out of sync.
Query Complexity: Queries can become more complex when working with a sharded database, as they may need to be executed on multiple servers.
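Downtime: Database sharding can lead to partial downtime. If one of the shards goes down, the data on that shard becomes unavailable and any query that needs that shard will fail, even though the other shards keep running.
Performance: Queries and updates that touch more than one shard (for example, joins or aggregates across shards) need extra work to route the requests and merge the results, which can cause performance issues.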
Pros and Cons of Partitioning
Partitioning is the process of splitting up a database into smaller, more manageable pieces. Partitioning has many benefits, but it also has notable drawbacks.
Pros of Partitioning
It allows databases to scale out and grow as needed without having to redesign the original database schema.
It allows for better utilization of hardware resources by spreading out workloads and reducing contention for system resources such as CPU time, memory, and disk.
Drawbacks of Partitioning
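It can cause performance degradation if not properly configured. For example, if you partition by name initials and a lot of names start with the same letter, that one partition ends up much bigger and busier than the rest.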
If a partition is lost or damaged, all of the data stored in that partition is lost or damaged as well.
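One way a partitioning scheme ends up "not properly configured" is skew, where a few partitions hold most of the data. Here is a quick sketch that counts partition sizes for the name-initial strategy. The list of names is invented sample data, so the numbers only illustrate the idea.

```python
from collections import Counter


def partition_sizes(names):
    """Count how many rows land in each name-initial partition."""
    return Counter(name[0].upper() for name in names)


# Hypothetical sample data where many names share the same initial
names = ["Sam", "Sara", "Steve", "Sunil", "Alice", "Xavier"]

sizes = partition_sizes(names)
largest, count = sizes.most_common(1)[0]

print(dict(sizes))
print(f"Partition {largest} holds {count} of {len(names)} rows")
```

If one partition keeps growing like this, it is usually a sign that the partitioning strategy needs a rethink.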
And that's a wrap! Hi, I am Gourav Dhar, a software developer and I write blogs on Backend Development and System Design. Subscribe to my Newsletter and learn something new every week - https://thegeekyminds.com/subscribe