How to Back Up and Restore Databases Used Through ScalarDB
Since ScalarDB provides transaction capabilities on top of non-transactional or transactional databases non-invasively, you need to take special care to back up and restore the databases in a transactionally consistent way.
This guide describes how to back up and restore the databases that ScalarDB supports.
Create a backup
How you create a backup depends on which database you're using and whether or not you're using multiple databases. The following decision tree shows which approach you should take.
Back up without explicit pausing
If you're using ScalarDB with a single database with support for transactions, you can create a backup of the database even while ScalarDB continues to accept transactions.
Before creating a backup, you should consider the safest way to create a transactionally consistent backup of your databases and understand any risks that are associated with the backup process.
One requirement for creating a backup in ScalarDB is that backups for all the ScalarDB-managed tables (including the Coordinator table) need to be transactionally consistent or automatically recoverable to a transactionally consistent state. That means that you need to create a consistent backup by dumping all tables in a single transaction.
How you create a transactionally consistent backup depends on the type of database that you're using. Select a database to see how to create a transactionally consistent backup for ScalarDB.
The backup methods by database listed below are just examples of some of the databases that ScalarDB supports.
- Amazon RDS or Azure Database for MySQL or PostgreSQL
- MySQL
- PostgreSQL
- SQLite
You can restore to any point within the backup retention period by using the automated backup feature.
Use the mysqldump
command with the --single-transaction
option.
Use the pg_dump
command.
Use the .backup
command with the .timeout
command as specified in Special commands to sqlite3 (dot-commands)
For an example, see BASH: SQLite3 .backup command.
Back up with explicit pausing
Another way to create a transactionally consistent backup is to create a backup while a cluster of ScalarDB instances does not have any outstanding transactions. Creating the backup depends on the following:
- If the underlying database has a point-in-time snapshot or backup feature, you can create a backup during the period when no outstanding transactions exist.
- If the underlying database has a point-in-time restore or recovery (PITR) feature, you can set a restore point to a time (preferably the mid-time) in the pause duration period when no outstanding transactions exist.
When using a PITR feature, you should minimize the clock drifts between clients and servers by using clock synchronization, such as NTP. Otherwise, the time you get as the paused duration might be too different from the time in which the pause was actually conducted, which could restore the backup to a point where ongoing transactions exist.
In addition, you should pause for a sufficient amount of time (for example, five seconds) and use the mid-time of the paused duration as a restore point since clock synchronization cannot perfectly synchronize clocks between nodes.
To make ScalarDB drain outstanding requests and stop accepting new requests so that a pause duration can be created, you should implement the Scalar Admin interface properly in your application that uses ScalarDB or use ScalarDB Server, which implements the Scalar Admin interface.
By using the Scalar Admin client tool, you can pause nodes, servers, or applications that implement the Scalar Admin interface without losing ongoing transactions.
How you create a transactionally consistent backup depends on the type of database that you're using. Select a database to see how to create a transactionally consistent backup for ScalarDB.
The backup methods by database listed below are just examples of some of the databases that ScalarDB supports.
- Cassandra
- Cosmos DB for NoSQL
- DynamoDB
Cassandra has a built-in replication feature, so you do not always have to create a transactionally consistent backup. For example, if the replication factor is set to 3
and only the data of one of the nodes in a Cassandra cluster is lost, you won't need a transactionally consistent backup (snapshot) because the node can be recovered by using a normal, transactionally inconsistent backup (snapshot) and the repair feature.
However, if the quorum of cluster nodes loses their data, you will need a transactionally consistent backup (snapshot) to restore the cluster to a certain transactionally consistent point.
To create a transactionally consistent cluster-wide backup (snapshot), pause the application that is using ScalarDB or ScalarDB Server and create backups (snapshots) of the nodes as described in Back up with explicit pausing or stop the Cassandra cluster, take copies of all the data in the nodes, and start the cluster.
You must create a Cosmos DB for NoSQL account with a continuous backup policy that has the PITR feature enabled. After enabling the feature, backups are created continuously.
To specify a transactionally consistent restore point, pause your application that is using ScalarDB with Cosmos DB for NoSQL as described in Back up with explicit pausing.
You must enable the PITR feature for DynamoDB tables. If you're using ScalarDB Schema Loader to create schemas, the tool enables the PITR feature for tables by default.
To specify a transactionally consistent restore point, pause your application that is using ScalarDB with DynamoDB as described in Back up with explicit pausing.
Restore a backup
How you restore a transactionally consistent backup depends on the type of database that you're using. Select a database to see how to create a transactionally consistent backup for ScalarDB.
The restore methods by database listed below are just examples of some of the databases that ScalarDB supports.
- Amazon RDS or Azure Database for MySQL or PostgreSQL
- Cassandra
- Cosmos DB for NoSQL
- DynamoDB
- MySQL
- PostgreSQL
- SQLite
You can restore to any point within the backup retention period by using the automated backup feature.
First, stop all the nodes of the Cassandra cluster. Then, clean the data
, commitlog
, and hints
directories, and place the backups (snapshots) in each node.
After placing the backups (snapshots) in each node, start all the nodes of the Cassandra Cluster.
Follow the official Azure documentation for restore an account by using Azure portal. After restoring a backup, configure the default consistency level of the restored databases to STRONG
. In addition, you should use the mid-time of the paused duration as the restore point as previously explained.
ScalarDB implements the Cosmos DB adapter by using its stored procedures, which are installed when creating schemas by using ScalarDB Schema Loader. However, the PITR feature of Cosmos DB doesn't restore stored procedures. Because of this, you need to re-install the required stored procedures for all tables after restoration. You can do this by using ScalarDB Schema Loader with the --repair-all
option. For details, see Repair tables.
Follow the official AWS documentation for restoring a DynamoDB table to a point in time, but keep in mind that a table can only be restored with an alias. Because of this, you will need to restore the table with an alias, delete the original table, and rename the alias to the original name to restore the tables with the same name.
To do this procedure:
- Create a backup.
- Select the mid-time of the paused duration as the restore point.
- Restore by using the PITR of table A to table B.
- Create a backup of the restored table B (assuming that the backup is named backup B).
- Remove table B.
- Restore the backup.
- Remove table A.
- Create a table named A by using backup B.
- You must do the steps mentioned above for each table because tables can only be restored one at a time.
- Configurations such as PITR and auto-scaling policies are reset to the default values for restored tables, so you must manually configure the required settings. For details, see the official AWS documentation for How to restore DynamoDB tables with DynamoDB.
If you used mysqldump
to create the backup file, use the mysql
command to restore the backup as specified in Reloading SQL-Format Backups.
If you used pg_dump
to create the backup file, use the psql
command to restore the backup as specified in Restoring the Dump.
Use the .restore
command as specified in Special commands to sqlite3 (dot-commands).