ScalarDB Benchmarking Tools
This tutorial describes how to run benchmarking tools for ScalarDB. Database benchmarking is helpful for evaluating how databases perform against a set of standards.
Benchmark workloads
- TPC-C
- YCSB (Workloads A, C, and F)
- Multi-storage YCSB (Workloads C and F)
  - This YCSB variant is for a multi-storage environment that uses ScalarDB.
  - Workers in a multi-storage YCSB execute the same number of read and write operations in two namespaces: ycsb_primary and ycsb_secondary.
Prerequisites
- One of the following Java Development Kits (JDKs):
  - Oracle JDK LTS version 8
  - OpenJDK LTS version 8
- Gradle
- Kelpie
  - Kelpie is a framework for performing end-to-end testing, such as system benchmarking and verification. Get the latest version from Kelpie Releases, and unzip the archive file.
- A client to run the benchmarking tools
- A target database
  - For a list of databases that ScalarDB supports, see Databases.
Currently, only JDK 8 can be used when running the benchmarking tools.
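To confirm that the client is using JDK 8, you can check the active Java version and make sure it reports 1.8:
java -version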
Set up the benchmarking tools
The following sections describe how to set up the benchmarking tools.
Clone the ScalarDB benchmarks repository
Open Terminal, then clone the ScalarDB benchmarks repository by running the following command:
git clone https://github.com/scalar-labs/scalardb-benchmarks
Then, go to the directory that contains the benchmarking files by running the following command:
cd scalardb-benchmarks
Build the tools
To build the benchmarking tools, run the following command:
./gradlew shadowJar
Load the schema
Before loading the initial data, the tables must be defined by using the ScalarDB Schema Loader. You can download the ScalarDB Schema Loader from the ScalarDB Releases page. Select the Schema Loader based on how you access ScalarDB:
- Using the ScalarDB Core library (Community edition)? Choose scalardb-schema-loader-<VERSION>.jar for the version of ScalarDB that you're using. Then, save the .jar file in the scalardb-benchmarks root directory.
- Using ScalarDB Cluster (Enterprise edition)? Choose scalardb-cluster-schema-loader-<VERSION>-all.jar for the version of ScalarDB Cluster that you're using. Then, save the .jar file in the scalardb-benchmarks root directory.
In addition, you need a properties file for accessing ScalarDB via the Java CRUD interface. For details about configuring the ScalarDB properties file, see ScalarDB Configurations or ScalarDB Cluster Client Configurations.
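As a reference, the following is a minimal sketch of such a properties file (saved, for example, as database.properties; the file name is up to you) for the Community edition with Cassandra as the backend. The values shown are placeholders for a local setup, so adjust them to your environment and see the configuration documents linked above for the full set of options.
# Storage and connection settings for the underlying database.
scalar.db.storage=cassandra
scalar.db.contact_points=localhost
scalar.db.contact_port=9042
scalar.db.username=cassandra
scalar.db.password=cassandra
# Consensus Commit is the default transaction manager in ScalarDB.
scalar.db.transaction_manager=consensus-commit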
After downloading the Schema Loader and configuring the properties file, select a benchmark and follow the instructions below to create the tables.
To create tables for TPC-C benchmarking (tpcc-schema.json), run the following command, replacing the contents in the angle brackets as described:
java -jar scalardb-schema-loader-<VERSION>.jar --config <PATH_TO_SCALARDB_PROPERTIES_FILE> -f tpcc-schema.json --coordinator
If you are using ScalarDB Cluster, run the following command instead:
java -jar scalardb-cluster-schema-loader-<VERSION>-all.jar --config <PATH_TO_SCALARDB_PROPERTIES_FILE> -f tpcc-schema.json --coordinator
To create tables for YCSB benchmarking (ycsb-schema.json), run the following command, replacing the contents in the angle brackets as described:
java -jar scalardb-schema-loader-<VERSION>.jar --config <PATH_TO_SCALARDB_PROPERTIES_FILE> -f ycsb-schema.json --coordinator
If you are using ScalarDB Cluster, run the following command instead:
java -jar scalardb-cluster-schema-loader-<VERSION>-all.jar --config <PATH_TO_SCALARDB_PROPERTIES_FILE> -f ycsb-schema.json --coordinator
To create tables for multi-storage YCSB benchmarking (ycsb-multi-storage-schema.json), run the following command, replacing the contents in the angle brackets as described:
java -jar scalardb-schema-loader-<VERSION>.jar --config <PATH_TO_SCALARDB_PROPERTIES_FILE> -f ycsb-multi-storage-schema.json --coordinator
If you are using ScalarDB Cluster, run the following command instead:
java -jar scalardb-cluster-schema-loader-<VERSION>-all.jar --config <PATH_TO_SCALARDB_PROPERTIES_FILE> -f ycsb-multi-storage-schema.json --coordinator
Prepare a benchmarking configuration file
To run a benchmark, you must first prepare a benchmarking configuration file. The configuration file requires at least the locations of the workload modules to run and the database configuration.
The following is an example configuration for running the TPC-C benchmark. The ScalarDB properties file specified for config_file should be the properties file that you created as one of the steps in Load the schema.
Alternatively, instead of using the ScalarDB properties file, you can specify each database configuration item in the .toml file. If config_file is specified, all other configurations under [database_config] will be ignored even if they are uncommented.
[modules]
[modules.preprocessor]
name = "com.scalar.db.benchmarks.tpcc.TpccLoader"
path = "./build/libs/scalardb-benchmarks-all.jar"
[modules.processor]
name = "com.scalar.db.benchmarks.tpcc.TpccBench"
path = "./build/libs/scalardb-benchmarks-all.jar"
[modules.postprocessor]
name = "com.scalar.db.benchmarks.tpcc.TpccReporter"
path = "./build/libs/scalardb-benchmarks-all.jar"
[database_config]
config_file = "<PATH_TO_SCALARDB_PROPERTIES_FILE>"
#contact_points = "localhost"
#contact_port = 9042
#username = "cassandra"
#password = "cassandra"
#storage = "cassandra"
You can define parameters to pass to modules in the configuration file. For details, see the sample configuration files below and available parameters in Common parameters:
- TPC-C: tpcc-benchmark-config.toml
- YCSB: ycsb-benchmark-config.toml
- Multi-storage YCSB: ycsb-multi-storage-benchmark-config.toml
Run a benchmark
Select a benchmark, and follow the instructions to run the benchmark.
To run the TPC-C benchmark, run the following command, replacing <PATH_TO_KELPIE> with the path to the Kelpie directory:
/<PATH_TO_KELPIE>/bin/kelpie --config tpcc-benchmark-config.toml
To run the YCSB benchmark, run the following command, replacing <PATH_TO_KELPIE> with the path to the Kelpie directory:
/<PATH_TO_KELPIE>/bin/kelpie --config ycsb-benchmark-config.toml
To run the multi-storage YCSB benchmark, run the following command, replacing <PATH_TO_KELPIE> with the path to the Kelpie directory:
/<PATH_TO_KELPIE>/bin/kelpie --config ycsb-multi-storage-benchmark-config.toml
In addition, the following options are available:
- --only-pre. Only loads the data.
- --only-process. Only runs the benchmark.
- --except-pre. Runs a job without loading the data.
- --except-process. Runs a job without running the benchmark.
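For example, you can load the initial data once with the preprocessor and then rerun only the benchmark phase while tuning parameters, without reloading the data each time:
# Load the initial data only.
/<PATH_TO_KELPIE>/bin/kelpie --config tpcc-benchmark-config.toml --only-pre
# Run only the benchmark against the already-loaded data.
/<PATH_TO_KELPIE>/bin/kelpie --config tpcc-benchmark-config.toml --only-process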
Common parameters
The following parameters are common to all workloads.
concurrency
- Description: Number of worker threads that concurrently execute benchmark transactions against the database. This parameter controls the level of parallelism during the actual benchmark execution phase. Increasing this value simulates more concurrent client accesses and higher workload intensity.
- Default value: 1
run_for_sec
- Description: Duration of the benchmark execution phase (in seconds). This parameter defines how long the benchmark will run and submit transactions to the database.
- Default value: 60
ramp_for_sec
- Description: Duration of the ramp-up period before the benchmark measurement phase begins (in seconds). During this warm-up period, the system executes transactions but does not record performance metrics. This allows the system to reach a steady state before collecting benchmark results.
- Default value: 0
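The following is a minimal sketch of how these common parameters might be set in the benchmarking configuration file. It assumes the parameter table is named [common], as in the sample configuration files; verify the exact table name against the sample file for your workload.
[common]
# Sixteen worker threads submit transactions concurrently.
concurrency = 16
# Measure for five minutes after a one-minute warm-up.
run_for_sec = 300
ramp_for_sec = 60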
Workload-specific parameters
The following parameters are available for each benchmark workload.
TPC-C
num_warehouses
- Description: Number of warehouses to create for the TPC-C benchmark workload. This value is the scale factor that determines the dataset size. Increasing this value creates a larger working set and enables enterprise-scale testing.
- Default value: 1
load_concurrency
- Description: Number of parallel threads used to load initial benchmark data into the database. This parameter controls how fast the data-loading phase completes. Increasing this value can significantly reduce data-loading time, especially for larger numbers of warehouses. This is separate from the concurrency parameter used during benchmark execution.
- Default value: 1
load_start_warehouse
- Description: Start ID of the warehouse range to load. This option can be useful with skip_item_load when loading large-scale data with multiple clients or adding additional warehouses.
- Default value: 1
load_end_warehouse
- Description: End ID of the warehouse range to load. You can use either num_warehouses or load_end_warehouse to specify the number of warehouses to load.
- Default value: 1
skip_item_load
- Description: If set to true, skips loading the item table. This can be useful when loading data with multiple clients, as the item table is shared across all warehouses and only needs to be loaded once.
- Default value: false
use_table_index
- Description: If set to true, uses a table-based secondary index instead of the secondary index of ScalarDB. The table-based secondary index builds its index using regular ScalarDB tables, allowing efficient lookups of specific customers or orders via partition keys. In contrast, the ScalarDB secondary index leverages the native secondary index feature of the underlying database, and thus its behavior depends on its implementation. Since secondary indexes in NoSQL databases such as Cassandra often require careful handling from a performance perspective, this option allows you to observe the performance characteristics of your target database when running workloads that involve secondary indexes.
- Default value: false
np_only
- Description: If set to true, runs the benchmark with only New-Order and Payment transactions (50% each), excluding other TPC-C transaction types. This setting can be useful for focused performance testing under a write-heavy workload without having long-running reads.
- Default value: false
rate_new_order
- Description: Percentage of New-Order transactions in the transaction mix. If you set this parameter, you must also set all other rate parameters, and their values must total 100%.
- Default value: N/A
rate_payment
- Description: Percentage of Payment transactions in the transaction mix. If you set this parameter, you must also set all other rate parameters, and their values must total 100%.
- Default value: N/A
rate_order_status
- Description: Percentage of Order-Status transactions in the transaction mix. If you set this parameter, you must also set all other rate parameters, and their values must total 100%.
- Default value: N/A
rate_delivery
- Description: Percentage of Delivery transactions in the transaction mix. If you set this parameter, you must also set all other rate parameters, and their values must total 100%.
- Default value: N/A
rate_stock_level
- Description: Percentage of Stock-Level transactions in the transaction mix. If you set this parameter, you must also set all other rate parameters, and their values must total 100%.
- Default value: N/A
backoff
- Description: Sleep time in milliseconds inserted after a transaction is aborted due to a conflict. This parameter can help reduce contention by introducing a delay before retrying failed transactions.
- Default value: 0
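As an illustration, the following sketch shows how some of these TPC-C parameters might be set in the benchmarking configuration file. It assumes the workload table is named [tpcc_config], as in the sample tpcc-benchmark-config.toml; verify the exact table and parameter names against that file.
[tpcc_config]
# Scale factor: create 10 warehouses and load them with 8 threads.
num_warehouses = 10
load_concurrency = 8
# The rate parameters must be specified together and total 100.
rate_new_order = 45
rate_payment = 43
rate_order_status = 4
rate_delivery = 4
rate_stock_level = 4
# Wait 100 ms before retrying a transaction aborted due to a conflict.
backoff = 100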
YCSB and multi-storage YCSB
load_concurrency
- Description: Number of parallel threads used to load initial benchmark data into the database. This parameter controls how fast the initial data-loading phase completes. Increasing this value can significantly reduce data-loading time for large datasets. This is separate from the concurrency parameter used during benchmark execution.
- Default value: 1
load_batch_size
- Description: Number of records to insert within a single transaction during the initial data-loading phase. Larger batch sizes can improve loading performance by reducing the number of transactions, but may increase the execution time of each transaction.
- Default value: 1
load_overwrite
- Description: If set to true, overwrites existing records when loading data: the initial data-loading phase will update existing records instead of failing on conflicts.
- Default value: false
ops_per_tx
- Description: Number of read or write operations to execute within a single transaction. This parameter affects transaction size and execution time. Higher values create longer-running transactions.
- Default value: 2 (Workloads A and C), 1 (Workload F)
record_count
- Description: Number of records to create for the YCSB benchmark workload. This parameter determines the size of the dataset and affects the working-set size during benchmark execution.
- Default value: 1000
use_read_modify_write
- Description: If set to true, uses read-modify-write operations instead of blind writes in Workload A. The default value is true because ScalarDB doesn't allow a blind write for an existing record when you're using the default transaction manager, Consensus Commit. Note that the original Workload A assumes false.
- Default value: true
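Similarly, the following sketch shows how the YCSB parameters might be set. It assumes the workload table is named [ycsb_config], as in the sample ycsb-benchmark-config.toml; verify the exact table and parameter names against that file.
[ycsb_config]
# Load 100,000 records with 8 threads, 1,000 records per loading transaction.
record_count = 100000
load_concurrency = 8
load_batch_size = 1000
# Each benchmark transaction executes 4 operations.
ops_per_tx = 4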