Database replication involves copying, transferring, or integrating data from a database on one server or computer to a database on another, which creates a distributed database. After replication takes place, users on every node have access to the same information, which improves consistency, reliability, and performance.
How Does Database Replication Work?
Data replication is a technique that copies, transfers, or integrates part or all of a database into a receiving database. Replicating only part of a database is known as partial replication; replicating all of it is known as full replication.
Data replication can either happen once or it can be a continuous process. The result is one or more distributed databases, where users have access to the same information across all database nodes.
Data replication works like this:
- A distributed database management system (DDBMS) replicates and distributes (or "syncs") data from one database to one or more receiving databases.
- The DDBMS ensures that changes made to data in the original database are reflected in the replicated database(s).
- The DDBMS shares the replicated database(s) over one or more physical machines.
- The result is one or more distributed databases.
- Users access the same information from the distributed database as from the original database.
Note that, in a data replication context:
- The original database is called the "Publisher."
- The replicated database is called the "Subscriber."
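The steps and terminology above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DDBMS works: the `Publisher` and `Subscriber` classes and their methods are hypothetical names, and each database is modelled as an in-memory dictionary.

```python
class Subscriber:
    """A receiving database, modelled as an in-memory key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        # Apply one change forwarded by the publisher.
        self.data[key] = value


class Publisher:
    """The original database; forwards every write to its subscribers."""

    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, subscriber):
        # Full replication: copy the existing data first, then keep syncing.
        subscriber.data = dict(self.data)
        self.subscribers.append(subscriber)

    def write(self, key, value):
        # Change the original, then propagate the change to every replica.
        self.data[key] = value
        for subscriber in self.subscribers:
            subscriber.apply(key, value)


publisher = Publisher()
publisher.write("user:1", "Ada")

replica_a = Subscriber("replica-a")
replica_b = Subscriber("replica-b")
publisher.subscribe(replica_a)
publisher.subscribe(replica_b)

publisher.write("user:2", "Grace")

# Every node should now hold the same information.
in_sync = replica_a.data == replica_b.data == publisher.data
print(in_sync)
```

In this sketch the publisher pushes each write synchronously. Real systems often propagate changes asynchronously, which is where the syncing challenges discussed later come from.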
Change Data Capture (CDC), which typically takes place during data replication, identifies and captures changes made to a database. These changes can then be applied to another data repository or passed to a data integration process such as Extract, Transform, Load (ETL).
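As a rough illustration of the idea, the sketch below captures changes by comparing two versions of a table and then applies them to another repository. The snapshot-diff approach and the function names `capture_changes` and `apply_changes` are assumptions chosen for brevity; production CDC tools commonly read the database's transaction log instead.

```python
def capture_changes(before, after):
    """Return (operation, key, value) tuples describing how `after` differs from `before`."""
    changes = []
    for key, value in after.items():
        if key not in before:
            changes.append(("insert", key, value))
        elif before[key] != value:
            changes.append(("update", key, value))
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target, changes):
    """Apply captured changes to a target repository (a dict in this sketch)."""
    for operation, key, value in changes:
        if operation == "delete":
            target.pop(key, None)
        else:
            target[key] = value
    return target


before = {"a": 1, "b": 2, "c": 3}
after = {"a": 1, "b": 20, "d": 4}

changes = capture_changes(before, after)
target = apply_changes(dict(before), changes)
print(changes)
print(target == after)
```

Only the rows that changed are shipped to the target, which is the main advantage of CDC over recopying the whole database.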
Data Replication Types
Data replication has three types.
1) Transactional Replication
The DDBMS replicates changes (or "transactions") made to the original database on the receiving database, in the order they occurred and in near real time. Users of the replicated database(s) see changes made to the original database almost instantly.
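The ordering guarantee can be sketched as a log of numbered transactions that a replica applies in sequence. The `Transaction` and `Replica` classes below are hypothetical, and the sequence-number scheme is an assumption made for illustration rather than the mechanism of any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    sequence: int   # position in the original database's log
    key: str
    value: object


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    last_applied: int = 0

    def apply(self, log):
        """Apply, in order, every transaction this replica has not yet seen."""
        for tx in sorted(log, key=lambda t: t.sequence):
            if tx.sequence <= self.last_applied:
                continue  # already applied; skipping makes replays harmless
            self.data[tx.key] = tx.value
            self.last_applied = tx.sequence


log = [
    Transaction(1, "balance", 100),
    Transaction(2, "balance", 80),
    Transaction(3, "balance", 95),
]

replica = Replica()
replica.apply(log[:2])  # the first two transactions arrive
replica.apply(log)      # the full log is re-sent; only transaction 3 is new
print(replica.data, replica.last_applied)
```

Tracking the last applied sequence number lets the replica ignore transactions it has already processed, so resending part of the log does not corrupt its state.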
2) Snapshot Replication
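The DDBMS captures a "snapshot" of the data in the original database at a single point in time and overwrites the data on the receiving database with it. Changes made to the original database after the snapshot do not reach the receiving database until the next snapshot is taken.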
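A point-in-time copy can be demonstrated with Python's built-in `sqlite3` module, whose `Connection.backup()` method copies one database over another. The table and column names are made up for the example, and SQLite stands in here for whatever database an organization actually uses.

```python
import sqlite3

# The original database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)]
)
source.commit()

# The receiving database.
replica = sqlite3.connect(":memory:")

# Take the snapshot: copy the source, as it exists right now, over the replica.
source.backup(replica)

# A change made after the snapshot stays on the source only.
source.execute("INSERT INTO customers (name) VALUES (?)", ("Linus",))
source.commit()

source_count = source.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
replica_count = replica.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(source_count, replica_count)
```

The replica keeps the row count from the moment of the snapshot until `backup()` is run again, which is why snapshot replication suits data that changes infrequently.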
3) Merge Replication
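The DDBMS merges data from two or more databases into a single receiving database. Because the same record may have changed in more than one source, conflicting changes must be resolved during the merge.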
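The sketch below merges two databases into one. How conflicts are resolved varies by system; this example assumes a "last writer wins" rule based on a hypothetical `updated_at` timestamp stored with each value, which is only one of several possible policies.

```python
def merge(*databases):
    """Merge databases mapping key -> (value, updated_at); the latest timestamp wins."""
    merged = {}
    for database in databases:
        for key, (value, updated_at) in database.items():
            if key not in merged or updated_at > merged[key][1]:
                merged[key] = (value, updated_at)
    return merged


store_a = {"sku-1": ("blue", 10), "sku-2": ("small", 12)}
store_b = {"sku-1": ("green", 15), "sku-3": ("large", 11)}

combined = merge(store_a, store_b)
print(combined)
```

Both stores changed `sku-1`, so the merge keeps the more recent value, while records that exist in only one store are carried over unchanged.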
Data Replication Benefits and Challenges
Organizations use data replication to:
- Improve read performance
- Improve disaster recovery
- Make data available to employees in other teams or locations
- Make data more durable
- Make applications more reliable
Data replication can also make it easier to analyze data.
Challenges arise when syncing data from the original database to the replicated database. All replicated databases need to "agree" with the original database, so organizations require technologies that keep them in sync. Otherwise, data loss or data inconsistency can occur.
Partial replication — where organizations replicate selected database elements elsewhere — can cause "fragmentation," where data values don't sync correctly.
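One simple way to detect this kind of disagreement is to compare a checksum of the replicated rows with a checksum of the same rows in the original. The sketch below is an assumption about how such a check might look, using hypothetical helper names `checksum` and `out_of_sync`; it is not a feature of any particular replication product.

```python
import hashlib
import json


def checksum(rows):
    """Stable digest of a dict of rows; key order does not affect the result."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def out_of_sync(original, replica):
    """Return the keys held by the replica whose values differ from the original."""
    return sorted(key for key, value in replica.items() if original.get(key) != value)


original = {"order:1": "paid", "order:2": "shipped", "order:3": "pending"}

# Partial replica: only orders 1 and 2 were selected, and order 2 is stale.
replica = {"order:1": "paid", "order:2": "pending"}

# Compare only the rows that were meant to be replicated.
replicated_subset = {key: original.get(key) for key in replica}
in_sync = checksum(replicated_subset) == checksum(replica)
stale_keys = out_of_sync(original, replica)
print(in_sync, stale_keys)
```

The checksum gives a cheap yes-or-no answer for the whole subset, and the key-by-key comparison is only needed when the digests differ.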