Replicate Data for High Availability
ScalarDB Cluster can replicate its managed data to remote sites for high availability and workload distribution. The remote replication feature provides near-real-time replication of write operations from a primary site to one or more backup sites.
This feature ensures business continuity by enabling failover to a backup site in the event of disasters or critical failures affecting the primary site. Additionally, the backup sites can function as read replicas, helping to offload analytical queries, reporting, and business intelligence workloads.
What is remote replication in ScalarDB?​
Remote replication in ScalarDB uses a hybrid approach, combining synchronous and asynchronous replication. This ensures zero data loss (a recovery point objective, or RPO, of zero) while minimizing performance impact at the primary site. The recovery time objective (RTO) can be flexibly adjusted by controlling the amount of computing resources. This feature is built on top of ScalarDB Cluster, making it cloud-agnostic and database-agnostic. This allows replication from one database in one cloud vendor to another, possibly different kind of database in a different cloud vendor.
Key benefits​
Remote replication provides several key advantages:
- Guarantees zero data loss (RPO of 0) for all committed transactions.
- Minimizes performance impact through the combination of synchronous and asynchronous processing.
- Enables backup site deployment in different regions, availability zones, or data centers from the primary site.
- Supports replication between different cloud service providers and database types.
- Provides built-in crash tolerance and automatic recovery mechanisms.
Architecture overview​
The following diagram illustrates the remote replication architecture:
Remote replication consists of the components and tools listed in this section.
Primary site components​
The primary site comprises three components: primary site database, client applications, and ScalarDB Cluster nodes. Each runs as follows:
- A primary site database contains the application tables used by client applications via ScalarDB Cluster.
- Client applications perform database operations.
- ScalarDB Cluster nodes manage transaction states in the Coordinator database and use a module called LogWriter to capture transaction operations and write them to the replication database.
Shared components (between primary and backup sites)​
Two components span the primary and backup sites: the Coordinator database and the replication database. These databases can be hosted in single database instances. However, because they maintain transaction information, it is required to replicate them across several sites as follows:
- A Coordinator database manages transaction states across the sites in a highly available way.
- A replication database stores transaction groups containing write operations from the primary site in a highly available way.
Backup site components​
The backup site comprises two components: backup site database and ScalarDB Cluster nodes. Each runs as follows:
- A backup site database contains the same application tables as the primary site. It also contains replication record metadata tables, which are internal tables that track replication metadata and unapplied operations. These tables are located in the same namespaces as the application tables (with the suffix
__recordsby default). - ScalarDB Cluster nodes use a module called LogApplier to apply replicated data. Specifically, LogApplier checks the Coordinator database for transaction states, reads and removes write operations from the replication database, calculates dependencies by using the replication record metadata tables, and applies operations to the backup site tables.
Administrative tools​
Remote replication uses the following administrative tools: Schema Loader and Replication CLI. Each runs as follows:
- Schema Loader creates the replication tables in the replication database by using the
--replication-tablesoption via ScalarDB Cluster endpoints. - Replication CLI monitors and administers ScalarDB Cluster nodes that replicate data through remote replication.
How remote replication works​
Remote replication employs a hybrid approach that combines synchronous and asynchronous replication to ensure zero data loss (RPO = 0) with minimal impact on performance at the primary site. It comprises two phases, the synchronous phase and the asynchronous phase, as follows:
- In the synchronous phase, write operations are copied from the primary site to the replication database during transaction commit.
- In the asynchronous phase, these operations are processed from the replication database and applied to the backup site tables.
The replication process follows these steps:
- When a transaction commits on the primary site, the LogWriter captures all write operations and stores them in the replication database.
- The LogApplier on the backup site continuously scans the replication database for new transaction data.
- The LogApplier checks the Coordinator database to verify transaction completion.
- The LogApplier orders and applies write operations based on transaction dependencies at the record level by using the replication record metadata tables.
- The LogApplier applies the processed operations to the backup site tables, with the replication record metadata tables updated to track progress.
Limitations and characteristics​
This section describes the limitations and characteristics of remote replication.
Private preview limitations​
The current private preview version has the following limitations, but they are going to be relaxed when it becomes public preview or general availability (GA):
- The specification may be changed in future releases.
- Multiple backup sites are not supported.
- Starting remote replication with restored data is not supported. Both primary and backup sites need to start from the beginning.
- This feature does not work with the one-phase commit optimization. This optimization must be disabled for replication to function properly.
- Creating the replication tables via ScalarDB SQL is not supported.
- The combination of the encryption feature and remote replication is not officially supported because it has not been verified.
Architectural limitations​
Remote replication has the following architectural limitations, which are inherently challenging to relax due to the architecture:
- Only transactions in read-only mode with the read-committed isolation level are permitted on backup sites until failover.
- DDL operations are not replicated. Schema changes must be applied manually to both primary and backup sites.
- You cannot use the two-phase commit interface if this feature is enabled.
- There may be a slight performance impact on the primary site, depending on replication database specifications and configurations.
Failure handling​
Remote replication is designed to handle various failure scenarios gracefully, ensuring data consistency and system availability where possible.
Primary site failure​
When the primary site becomes unavailable, all committed transactions are safely stored in the replication database or the backup site databases, ensuring zero data loss. The backup site can be promoted to serve application traffic once the LogApplier has processed all pending write operations from the replication database.
Backup site failure​
If the backup site fails, the primary site continues operating normally without any disruption to application traffic. However, write operations will accumulate in the replication database since LogApplier cannot process them. During extended backup site downtime, there is a risk that the replication database may reach capacity limits, so monitoring is important. Once the backup site is restored and LogApplier resumes, replication will catch up with the accumulated operations.
Coordinator database failure​
The impact of Coordinator database failure depends on its scope. In non-critical failures (such as a single region failure in a multi-region setup), remote replication continues operating normally. However, if the Coordinator database becomes completely unavailable, both ScalarDB Cluster in the primary site and LogApplier are affected: ScalarDB Cluster in the primary site cannot commit new transactions, and LogApplier cannot verify transaction states to proceed with replication.
Replication database failure​
Similar to Coordinator database failures, the impact depends on the scope of the failure. Non-critical failures (like single region failures in multi-region deployments) don't affect the remote replication operation. However, complete replication database unavailability prevents ScalarDB Cluster in the primary site from committing new transactions and blocks LogApplier from reading write operations, effectively halting both primary operations and replication processing.
Configuration​
This section describes the configuration options for remote replication in ScalarDB Cluster.
Base replication configurations​
These configurations apply to the overall replication setup:
partition_count​
- Field:
scalar.db.replication.partition_count - Description: Number of partitions for the
transaction_groupstable. The tables in the replication database are partitioned for performance and scalability, and write operations are distributed evenly across partitions. This field must be identical between primary and backup sites. Changing the partition count requires restarting the ScalarDB Clusters in both sites. - Default value:
256
repl_db.namespace​
- Field:
scalar.db.replication.repl_db.namespace - Description: Namespace name of replication tables. This field must be identical between primary and backup sites.
- Default value:
replication
record_table_suffix​
- Field:
scalar.db.replication.record_table_suffix - Description: Suffix for replication record metadata tables.
- Default value:
__records
LogWriter configurations (primary site)​
LogWriter configurations control how write operations are captured and stored in the replication database during transaction commits.
log_writer.enabled​
- Field:
scalar.db.replication.log_writer.enabled - Description: Enable or disable LogWriter functionality.
- Default value:
false
log_writer.compression_type​
- Field:
scalar.db.replication.log_writer.compression_type - Description: Compression type for stored write operations in the replication database. Available values:
NONE,GZIP. - Default value:
GZIP
log_writer.group_commit.retention.time_millis​
- Field:
scalar.db.replication.log_writer.group_commit.retention.time_millis - Description: Maximum time to wait before committing a transaction group for the replication database.
- Default value:
100
log_writer.group_commit.retention.values​
- Field:
scalar.db.replication.log_writer.group_commit.retention.values - Description: Maximum number of transactions to batch together for the replication database.
- Default value:
32
log_writer.group_commit.timeout_check_interval_millis​
- Field:
scalar.db.replication.log_writer.group_commit.timeout_check_interval_millis - Description: Interval for checking group commit timeouts for the replication database.
- Default value:
20
log_writer.group_commit.max_thread_pool_size​
- Field:
scalar.db.replication.log_writer.group_commit.max_thread_pool_size - Description: Maximum thread pool size for group commit processing for the replication database.
- Default value:
4096
LogApplier configurations (backup site)​
LogApplier configurations control how replication data is processed and applied to the backup site tables.