Requirements and Recommendations for the Underlying Databases of ScalarDB
This document explains the requirements and recommendations for the underlying databases of ScalarDB so that ScalarDB applications work correctly and efficiently.
Requirements
ScalarDB requires each underlying database to provide certain capabilities to run transactions and analytics on the databases. This document explains the general requirements and how to configure each database to achieve the requirements.
General requirements
Transactions
ScalarDB requires each underlying database to provide at least the following capabilities to run transactions on the databases:
- Linearizable read and conditional mutations (write and delete) on a single database record.
- Durability of written database records.
- Ability to store arbitrary data besides application data in each database record.
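The first and third capabilities above can be illustrated with a toy in-memory store. This is only a sketch of the idea, not ScalarDB's implementation: the `Record` and `SingleRecordStore` classes and the `version` metadata key are hypothetical names chosen for illustration, and durability is not modeled at all.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Record:
    # Application data plus arbitrary extra data stored in the same record.
    # The metadata layout here is hypothetical, not ScalarDB's actual columns.
    data: dict
    metadata: dict = field(default_factory=dict)


class SingleRecordStore:
    """Toy store with linearizable reads and conditional writes per record."""

    def __init__(self):
        # One lock serializes every operation, so each read and each
        # conditional write appears to take effect at a single point in time.
        self._lock = threading.Lock()
        self._records = {}

    def get(self, key):
        with self._lock:
            return self._records.get(key)

    def put_if(self, key, expected_version, new_record):
        """Write new_record only if the stored version matches.

        expected_version=None means "only if the record does not exist".
        Returns True if the write was applied, False otherwise.
        """
        with self._lock:
            current = self._records.get(key)
            if current is None:
                matches = expected_version is None
            else:
                matches = (
                    expected_version is not None
                    and current.metadata.get("version") == expected_version
                )
            if not matches:
                return False
            self._records[key] = new_record
            return True
```

A second writer that still holds a stale version is rejected, which is the behavior a conditional mutation must provide on the real database.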
Analytics
ScalarDB requires each underlying database to provide the following capability to run analytics on the databases:
- Ability to return only committed records.
You need database accounts with sufficient privileges to access the databases through ScalarDB, because ScalarDB not only performs CRUD operations on the underlying databases but also creates and alters schemas, tables, and indexes. In general, ScalarDB requires a fully privileged account to access the underlying databases.
How to configure databases to achieve the general requirements
The following describes, for each database, how to configure it to achieve the general requirements.
- JDBC databases
- DynamoDB
- Cosmos DB for NoSQL
- Cassandra
JDBC databases
Transactions
- Use a single primary server or synchronized multi-primary servers for all operations (no read operations on read replicas that are asynchronously replicated from a primary database).
- Use read-committed or stricter isolation levels.
Analytics
- Use read-committed or stricter isolation levels.
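As a rough illustration of the isolation-level requirement above, the following sketch checks whether a reported isolation level is read committed or stricter. The function name is hypothetical, and only the four standard SQL levels are covered; database-specific levels (for example, snapshot isolation variants) are rejected as unknown and need to be judged separately.

```python
# Standard SQL isolation levels, ordered from weakest to strictest.
ISOLATION_ORDER = [
    "READ UNCOMMITTED",
    "READ COMMITTED",
    "REPEATABLE READ",
    "SERIALIZABLE",
]


def meets_isolation_requirement(level: str) -> bool:
    """Return True if `level` is read committed or stricter.

    `level` is the string reported by the database, for example the result
    of `SHOW transaction_isolation` in PostgreSQL ("read committed") or
    `SELECT @@transaction_isolation` in MySQL ("READ-COMMITTED").
    """
    # Normalize separators and case so both spellings compare equal.
    normalized = " ".join(
        level.replace("_", " ").replace("-", " ").upper().split()
    )
    if normalized not in ISOLATION_ORDER:
        raise ValueError(f"unknown isolation level: {level!r}")
    minimum = ISOLATION_ORDER.index("READ COMMITTED")
    return ISOLATION_ORDER.index(normalized) >= minimum
```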
DynamoDB
Transactions
- Use a single primary region for all operations (no read or write operations on global tables in non-primary regions).
- DynamoDB has no concept of a primary region, so you must designate one yourself.
Analytics
- Not applicable. DynamoDB always returns committed records, so there are no DynamoDB-specific requirements.
Cosmos DB for NoSQL
Transactions
- Use a single primary region for all operations with `Strong` or `Bounded Staleness` consistency.
Analytics
- Not applicable. Cosmos DB always returns committed records, so there are no Cosmos DB–specific requirements.
Cassandra
Transactions
- Use a single primary cluster for all operations (no read or write operations in non-primary clusters).
- Use `batch` or `group` for `commitlog_sync`.
- If you're using Cassandra-compatible databases, those databases must properly support lightweight transactions (LWT).
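The `commitlog_sync` requirement above can be checked against the contents of `cassandra.yaml` with something like the following sketch. Both function names are hypothetical, and the parser is deliberately naive: it reads only top-level `key: value` lines and is not a YAML implementation, so a real YAML library is the better choice in practice.

```python
ALLOWED_COMMITLOG_SYNC = {"batch", "group"}


def read_top_level_scalars(yaml_text: str) -> dict:
    """Collect top-level `key: value` lines, skipping comments and nesting."""
    settings = {}
    for line in yaml_text.splitlines():
        stripped = line.split("#", 1)[0].rstrip()  # drop trailing comments
        # Skip blank lines, indented (nested) lines, and list items.
        if not stripped or line[0] in " \t-" or ":" not in stripped:
            continue
        key, value = stripped.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_commitlog_sync(settings: dict) -> bool:
    """Return True if commitlog_sync is set to batch or group."""
    return settings.get("commitlog_sync") in ALLOWED_COMMITLOG_SYNC
```

Cassandra's default mode, `periodic`, acknowledges writes before the commit log is synced to disk, which is why it does not satisfy the durability requirement.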
Analytics
- Not applicable. Cassandra always returns committed records, so there are no Cassandra-specific requirements.
Recommendations
We recommend configuring each underlying database of ScalarDB for high performance and high availability. The following recommendations list some knobs and configurations to adjust.
Because ScalarDB runs as an application on top of the underlying databases, you may also want to try other knobs and configurations that are commonly used to improve database efficiency.
- JDBC databases
- DynamoDB
- Cosmos DB for NoSQL
- Cassandra
JDBC databases
- Use read-committed isolation for better performance.
- Follow the performance optimization best practices for each database. For example, increasing the buffer size (for example, `shared_buffers` in PostgreSQL) and increasing the number of connections (for example, `max_connections` in PostgreSQL) are usually recommended for better performance.
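For PostgreSQL, the two settings mentioned above can be changed with `ALTER SYSTEM SET`. The sketch below only builds the statements; the helper name and the values `4GB` and `200` are placeholders for illustration, not recommended values, and appropriate sizes depend on the available memory and workload.

```python
def alter_system_statements(settings: dict) -> list:
    """Build one `ALTER SYSTEM SET` statement per setting.

    Integer values are emitted as-is; everything else is emitted as a
    quoted string literal (for example, '4GB').
    """
    statements = []
    for name, value in settings.items():
        # Allow only plain identifiers as parameter names.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected parameter name: {name!r}")
        if isinstance(value, int) and not isinstance(value, bool):
            literal = str(value)
        else:
            literal = "'" + str(value).replace("'", "''") + "'"
        statements.append(f"ALTER SYSTEM SET {name} = {literal};")
    return statements


# Placeholder values for illustration only.
example = alter_system_statements(
    {"shared_buffers": "4GB", "max_connections": 200}
)
```

Both `shared_buffers` and `max_connections` take effect only after the PostgreSQL server is restarted.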
DynamoDB
- Increase the number of read capacity units (RCUs) and write capacity units (WCUs) for high throughput.
- Enable point-in-time recovery (PITR).
Since DynamoDB stores data in multiple availability zones by default, you don’t need to adjust any configurations to improve availability.
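As a sketch of how the two DynamoDB recommendations above might be applied programmatically, the following helpers build request parameters in the shape accepted by the boto3 DynamoDB client methods `update_table` and `update_continuous_backups`. The helper names and the table name are hypothetical, the capacity numbers are placeholders, and the capacity request applies only to tables in provisioned-capacity mode.

```python
def provisioned_throughput_request(table_name: str, rcu: int, wcu: int) -> dict:
    """Parameters for raising RCUs and WCUs on a provisioned-mode table."""
    if rcu < 1 or wcu < 1:
        raise ValueError("capacity units must be at least 1")
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        },
    }


def pitr_request(table_name: str) -> dict:
    """Parameters for enabling point-in-time recovery (PITR) on a table."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {
            "PointInTimeRecoveryEnabled": True,
        },
    }


# With boto3 installed and credentials configured, these could be sent as:
#   client = boto3.client("dynamodb")
#   client.update_table(**provisioned_throughput_request("my_table", 100, 100))
#   client.update_continuous_backups(**pitr_request("my_table"))
```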
Cosmos DB for NoSQL
- Increase the number of Request Units (RUs) for high throughput.
- Enable point-in-time restore (PITR).
- Enable availability zones.
Cassandra
- Increase `concurrent_reads` and `concurrent_writes` for high throughput. For details, see the official Cassandra documentation about `concurrent_writes`.
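As a starting point for these two settings, the comments in the default `cassandra.yaml` suggest a rule of thumb of 16 concurrent reads per data drive and 8 concurrent writes per CPU core. The sketch below simply applies that arithmetic; the function name is hypothetical, and the result is an assumption to verify against the documentation for your Cassandra version and against measurements on your workload.

```python
def suggested_concurrency(num_data_drives: int, num_cores: int) -> dict:
    """Apply the cassandra.yaml rule of thumb for read/write concurrency."""
    if num_data_drives < 1 or num_cores < 1:
        raise ValueError("drive and core counts must be at least 1")
    return {
        "concurrent_reads": 16 * num_data_drives,
        "concurrent_writes": 8 * num_cores,
    }
```

For example, a node with 2 data drives and 8 cores would start from 32 concurrent reads and 64 concurrent writes.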