Understanding Database Replication

Lurdhu Reddy Ponnapati
4 min read · Jul 6, 2021

Introduction:

Let’s say you are starting a company and choose “X” as the database for all your application data (e.g. users, products). Over time the company grows and gains a large user base across the world. As the user base grows, so does the amount of data you need to store and maintain. Let’s say your database holds 1 million records and 5,000 users want to retrieve their data concurrently.

Now database “X” has to handle and process all 5,000 requests. As the load on the single database server node increases, the system slows down. And if the database crashes for some reason, your entire system goes down. So we need to solve 2 problems here.

1. System slowness

2. Database crashes

The above 2 problems can be solved using Database Replication.

What is Database Replication?

Database Replication is a technique (or design principle) used to scale databases by maintaining copies of table data on multiple database server instances, as shown in the picture below.

There are 2 types of data replication with respect to the timing of the data transfer.

Let’s say we have 3 database servers in our distributed system.

Synchronous

In synchronous replication, the client sends data to Server A, and Server A replicates it to Server B and Server C. Only once the data has been replicated to all servers does Server A send an acknowledgement to the client.

Pros

This technique makes sure that the data is replicated to all database servers and is consistent across them.

Cons

  1. Server A takes extra time to respond to clients because it has to wait until the data is replicated to all the other servers.
  2. If any one of the servers doesn’t acknowledge the write, the transaction is rolled back on all the other servers.
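
The synchronous flow and its rollback behaviour can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database implements it: the Replica class, the healthy flag and the write_synchronously function are hypothetical names invented for this example.

```python
class Replica:
    """A hypothetical in-memory stand-in for a database server."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def apply(self, key, value):
        # Return True to acknowledge the write, False to signal failure.
        if not self.healthy:
            return False
        self.data[key] = value
        return True

    def rollback(self, key, previous):
        # Restore the value the key had before the failed write.
        if previous is None:
            self.data.pop(key, None)
        else:
            self.data[key] = previous


def write_synchronously(servers, key, value):
    """Acknowledge the client only if every server applied the write."""
    applied = []
    for server in servers:
        previous = server.data.get(key)
        if server.apply(key, value):
            applied.append((server, previous))
        else:
            # One server did not acknowledge: undo the write everywhere else.
            for done, old in applied:
                done.rollback(key, old)
            return False
    return True


server_a, server_b, server_c = Replica("A"), Replica("B"), Replica("C")
ok = write_synchronously([server_a, server_b, server_c], "user:1", "Alice")

server_c.healthy = False  # simulate Server C being unreachable
failed = write_synchronously([server_a, server_b, server_c], "user:2", "Bob")
```

The first write succeeds on all three servers. The second is rejected and undone on Server A and Server B because Server C never acknowledges it, which is the cost of keeping every copy consistent.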

Asynchronous

In asynchronous replication, the client sends data to Server A, and Server A sends an acknowledgement to the client right away. After the acknowledgement is sent, the data is replicated to Server B and Server C asynchronously.

Pros

Response is given to the client immediately.

Cons

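There is a chance that the data is not replicated to Server B and Server C, for example because of network issues. Until replication catches up, those servers can return stale data, and the write can be lost if Server A fails before replicating it.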
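
The asynchronous flow can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the AsyncPrimary class is a hypothetical name, plain dictionaries stand in for Server B and Server C, and replication is triggered by hand instead of running in the background.

```python
from collections import deque


class AsyncPrimary:
    """Hypothetical Server A that acknowledges before replicating."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas  # plain dicts standing in for Server B and C
        self.pending = deque()    # writes not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"  # the client gets its response right away

    def replicate_pending(self):
        # A real system would do this in the background; here it is manual.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


server_b, server_c = {}, {}
server_a = AsyncPrimary([server_b, server_c])

response = server_a.write("user:1", "Alice")
lagging = "user:1" not in server_b  # replicas have not seen the write yet

server_a.replicate_pending()
caught_up = server_b.get("user:1") == "Alice" and server_c.get("user:1") == "Alice"
```

Between write and replicate_pending the replicas lag behind Server A. If Server A were lost in that window, the queued write would be lost with it.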

Types of Replication Environment

Multi-Master Replication

As shown in the above picture, in the multi-master replication model each node acts as a master node and the client can write data to any node. Data written to one node is replicated to the other nodes either synchronously or asynchronously.

With asynchronous replication there is a chance of data conflicts, because the same row might be updated on multiple master nodes before the changes reach each other. Oracle Database, for example, provides conflict resolution methods (https://docs.oracle.com/database/121/REPLN/repconflicts.htm#REPLN005) to resolve such conflicts.
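
To make the idea of a conflict concrete, here is a minimal Python sketch of one common resolution strategy, "last writer wins", where the version with the later timestamp is kept. It is only an illustration of the concept, not a depiction of Oracle's implementation. The function name, the row layout and the updated_at field are assumptions made for this example.

```python
def resolve_last_writer_wins(version_a, version_b):
    """Pick the version with the later timestamp; ties go to version_a."""
    if version_a["updated_at"] >= version_b["updated_at"]:
        return version_a
    return version_b


# The same row updated on two masters before they exchanged changes.
on_master_1 = {"id": 42, "email": "old@example.com", "updated_at": 1000}
on_master_2 = {"id": 42, "email": "new@example.com", "updated_at": 1005}

winner = resolve_last_writer_wins(on_master_1, on_master_2)
```

Here the update made on the second master is kept because it is newer, and the other update is silently discarded. That trade-off is why conflict resolution rules need to be chosen per use case.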

Master-Slave Replication

As we see in the picture below, the client writes only to the master node. The master node replicates the data to the slaves either synchronously or asynchronously. Clients can read from all the nodes in the system.
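
The routing rule (writes go to the master, reads can go to any node) can be sketched in Python as below. The Cluster class, the node names and the round-robin read policy are assumptions made for this example, and replication to the slaves is done synchronously to keep the sketch short.

```python
from collections import Counter
from itertools import cycle


class Cluster:
    """Hypothetical router: one master for writes, every node for reads."""

    def __init__(self, slave_names):
        self.nodes = {"master": {}}
        for name in slave_names:
            self.nodes[name] = {}
        self._next_reader = cycle(list(self.nodes))  # rotate over node names
        self.reads_served = Counter()

    def write(self, key, value):
        # The master applies the write, then copies it to every slave.
        self.nodes["master"][key] = value
        for name, store in self.nodes.items():
            if name != "master":
                store[key] = value

    def read(self, key):
        # Spread reads evenly over the master and the slaves.
        name = next(self._next_reader)
        self.reads_served[name] += 1
        return self.nodes[name].get(key)


cluster = Cluster(["slave-1", "slave-2"])
cluster.write("user:1", "Alice")
results = [cluster.read("user:1") for _ in range(6)]
```

With three nodes, six reads are split two per node. Applied to the introduction's scenario, the same idea would spread the 5,000 concurrent reads over several servers instead of one.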

MongoDB

MongoDB uses the master-slave replication technique, as shown in the picture below. In MongoDB’s own terminology, the master is called the primary and the slaves are called secondaries.

If the master node goes down, one of the slaves is elected as the new master so that the system stays available to clients. If the slaves cannot communicate with the master for 10 seconds (the default election timeout), the master is considered down and a leader election takes place. With default settings, electing a new leader in MongoDB typically takes no more than about 12 seconds. These 12 seconds include the time required to mark the master node as unavailable and to hold the election.

During the election process, clients cannot execute writes because there is no master, but they can still perform read operations on the slaves if they are configured to read from them.
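
The failure-detection step can be illustrated with a small Python sketch. It only models the 10-second timeout described above. The election rule shown here (pick the slave that has applied the most recent write) is a deliberate simplification and an assumption for this example, not MongoDB's actual election protocol. All names are hypothetical.

```python
ELECTION_TIMEOUT_SECONDS = 10  # the 10-second window described above


def master_considered_down(last_heartbeat_at, now, timeout=ELECTION_TIMEOUT_SECONDS):
    """True once no heartbeat has arrived within the timeout."""
    return now - last_heartbeat_at > timeout


def elect_new_master(slaves):
    """Simplified rule: choose the slave with the most recent applied write."""
    return max(slaves, key=lambda node: node["last_applied"])


slaves = [
    {"name": "node-b", "last_applied": 118},
    {"name": "node-c", "last_applied": 120},
]

down_after_8s = master_considered_down(last_heartbeat_at=0, now=8)
down_after_11s = master_considered_down(last_heartbeat_at=0, now=11)
new_master = elect_new_master(slaves)["name"]
```

After 8 seconds of silence the master is still trusted. After 11 seconds it is treated as down and the most up-to-date slave in this sketch, node-c, takes over.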

Load Balancing Replicas:

In MongoDB, clients read data from the primary node by default. We can load balance read operations by routing them to secondary nodes. The MongoDB driver knows which node is the primary and which nodes are secondaries; we just need to tell it that we want to perform read operations on the secondary nodes. In the mongo shell, the command for changing the read preference is:

db.getMongo().setReadPref('secondary')
