Version: 3.9

How to Back Up and Restore Databases Used Through ScalarDB

Since ScalarDB provides transaction capabilities on top of non-transactional or transactional databases non-invasively, you need to take special care to back up and restore the databases in a transactionally consistent way.

This guide describes how to back up and restore the databases that ScalarDB supports.

Create a backup

How you create a backup depends on which database you're using and whether or not you're using multiple databases. The following decision tree shows which approach you should take.

Back up without explicit pausing

If you're using ScalarDB with a single database with support for transactions, you can create a backup of the database even while ScalarDB continues to accept transactions.

warning

Before creating a backup, you should consider the safest way to create a transactionally consistent backup of your databases and understand any risks that are associated with the backup process.

One requirement for creating a backup in ScalarDB is that backups for all the ScalarDB-managed tables (including the Coordinator table) need to be transactionally consistent or automatically recoverable to a transactionally consistent state. That means that you need to create a consistent backup by dumping all tables in a single transaction.

How you create a transactionally consistent backup depends on the type of database that you're using. Select a database to see how to create a transactionally consistent backup for ScalarDB.

note

The backup methods by database listed below are just examples of some of the databases that ScalarDB supports.

Amazon RDS or Azure Database for MySQL or PostgreSQL
MySQL
PostgreSQL
SQLite

You can restore to any point within the backup retention period by using the automated backup feature.

Use the mysqldump command with the --single-transaction option.

Use the pg_dump command.

Use the .backup command with the .timeout command as specified in Special commands to sqlite3 (dot-commands)

For an example, see BASH: SQLite3 .backup command.

Back up with explicit pausing

Another way to create a transactionally consistent backup is to create a backup while a cluster of ScalarDB instances does not have any outstanding transactions. Creating the backup depends on the following:

If the underlying database has a point-in-time snapshot or backup feature, you can create a backup during the period when no outstanding transactions exist.
If the underlying database has a point-in-time restore or recovery (PITR) feature, you can set a restore point to a time (preferably the mid-time) in the pause duration period when no outstanding transactions exist.

note

When using a PITR feature, you should minimize the clock drifts between clients and servers by using clock synchronization, such as NTP. Otherwise, the time you get as the paused duration might be too different from the time in which the pause was actually conducted, which could restore the backup to a point where ongoing transactions exist.

In addition, you should pause for a sufficient amount of time (for example, five seconds) and use the mid-time of the paused duration as a restore point since clock synchronization cannot perfectly synchronize clocks between nodes.

To make ScalarDB drain outstanding requests and stop accepting new requests so that a pause duration can be created, you should implement the Scalar Admin interface properly in your application that uses ScalarDB or use ScalarDB Cluster (redirects to the Enterprise docs site), which implements the Scalar Admin interface.

By using the Scalar Admin client tool, you can pause nodes, servers, or applications that implement the Scalar Admin interface without losing ongoing transactions.

How you create a transactionally consistent backup depends on the type of database that you're using. Select a database to see how to create a transactionally consistent backup for ScalarDB.

note

The backup methods by database listed below are just examples of some of the databases that ScalarDB supports.

DynamoDB
Cosmos DB for NoSQL
Cassandra

You must enable the PITR feature for DynamoDB tables. If you're using ScalarDB Schema Loader to create schemas, the tool enables the PITR feature for tables by default.

To specify a transactionally consistent restore point, pause your application that is using ScalarDB with DynamoDB as described in Back up with explicit pausing.

Cassandra has a built-in replication feature, so you do not always have to create a transactionally consistent backup. For example, if the replication factor is set to 3 and only the data of one of the nodes in a Cassandra cluster is lost, you won't need a transactionally consistent backup (snapshot) because the node can be recovered by using a normal, transactionally inconsistent backup (snapshot) and the repair feature.

However, if the quorum of cluster nodes loses their data, you will need a transactionally consistent backup (snapshot) to restore the cluster to a certain transactionally consistent point.

To create a transactionally consistent cluster-wide backup (snapshot), pause the application that is using ScalarDB or ScalarDB Cluster (redirects to the Enterprise docs site) and create backups (snapshots) of the nodes as described in Back up with explicit pausing or stop the Cassandra cluster, take copies of all the data in the nodes, and start the cluster.

Restore a backup

How you restore a transactionally consistent backup depends on the type of database that you're using. Select a database to see how to create a transactionally consistent backup for ScalarDB.

note

The restore methods by database listed below are just examples of some of the databases that ScalarDB supports.

Amazon RDS or Azure Database for MySQL or PostgreSQL
Cassandra
Cosmos DB for NoSQL
DynamoDB
MySQL
PostgreSQL
SQLite

You can restore to any point within the backup retention period by using the automated backup feature.

First, stop all the nodes of the Cassandra cluster. Then, clean the data, commitlog, and hints directories, and place the backups (snapshots) in each node.

After placing the backups (snapshots) in each node, start all the nodes of the Cassandra Cluster.

Follow the official Azure documentation for restore an account by using Azure portal. After restoring a backup, configure the default consistency level of the restored databases to STRONG. In addition, you should use the mid-time of the paused duration as the restore point as previously explained.

ScalarDB implements the Cosmos DB adapter by using its stored procedures, which are installed when creating schemas by using ScalarDB Schema Loader. However, the PITR feature of Cosmos DB doesn't restore stored procedures. Because of this, you need to re-install the required stored procedures for all tables after restoration. You can do this by using ScalarDB Schema Loader with the --repair-all option. For details, see Repair tables.

If you used mysqldump to create the backup file, use the mysql command to restore the backup as specified in Reloading SQL-Format Backups.

If you used pg_dump to create the backup file, use the psql command to restore the backup as specified in Restoring the Dump.

Use the .restore command as specified in Special commands to sqlite3 (dot-commands).

How to Back Up and Restore Databases Used Through ScalarDB

Create a backup​

Back up without explicit pausing​

Back up with explicit pausing​

Restore a backup​

Create a backup

Back up without explicit pausing

Back up with explicit pausing

Restore a backup