Getting Started with ScalarDB Analytics
This tutorial explains how to set up ScalarDB Analytics and run federated queries across different databases, including PostgreSQL, MySQL, and Cassandra. For an overview of ScalarDB Analytics and its key benefits, refer to the ScalarDB Overview and ScalarDB Design pages.
What you'll build
In this tutorial, you'll set up a sample e-commerce analytics environment where:
- Customer data resides in PostgreSQL
- Order data is managed by ScalarDB in MySQL
- Line item details are stored in Cassandra and updated through ScalarDB transactions
You'll run analytical queries that join data across all three databases to gain business insights. The source code is available at https://github.com/scalar-labs/scalardb-samples/tree/main/scalardb-analytics-sample.
Prerequisites
- Docker 20.10 or later with Docker Compose V2 or later
You also need a license key (trial or commercial) to use ScalarDB Analytics. If you don't have a license key, please contact us.
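To confirm the Docker prerequisite before continuing, a check along the following lines may help. This is a minimal sketch, not part of the sample repository: `version_ge` is a hypothetical helper name, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Assumes GNU sort, which provides -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the CLI is installed.
if command -v docker >/dev/null 2>&1; then
  docker_version="$(docker version --format '{{.Client.Version}}')"
  if version_ge "$docker_version" "20.10"; then
    echo "Docker $docker_version: OK"
  else
    echo "Docker $docker_version is older than 20.10"
  fi
  # Prints the Compose version; this fails if Docker Compose V2 is not installed.
  docker compose version
else
  echo "docker not found on PATH"
fi
```

If the Compose line reports a version starting with 2 or higher, both prerequisites are met.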
Step 1: Set up the environment
This section describes how to set up a ScalarDB Analytics environment.
Clone the repository
Open a terminal and clone the ScalarDB samples repository:
git clone https://github.com/scalar-labs/scalardb-samples
cd scalardb-samples/scalardb-analytics-sample
Configure your license
To add your ScalarDB Analytics license, open config/scalardb-analytics-server.properties. Then, uncomment and update the license configuration lines, replacing <YOUR_LICENSE_KEY> and <YOUR_LICENSE_CERT_PEM> with your actual license information:
# License configuration (required for production)
scalar.db.analytics.server.licensing.license_key=<YOUR_LICENSE_KEY>
scalar.db.analytics.server.licensing.license_check_cert_pem=<YOUR_LICENSE_CERT_PEM>
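Because a still-commented line or a leftover placeholder is easy to miss, you might sanity-check the file before starting the services. The sketch below is an illustration only: `check_license_config` is a hypothetical helper, and it checks nothing beyond the two property names shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if both license properties are uncommented,
# non-empty, and no longer contain a <YOUR_...> placeholder.
check_license_config() {
  local file="$1" key line
  for key in license_key license_check_cert_pem; do
    # Match an uncommented property line that has a value after the equals sign.
    line="$(grep -E "^scalar\.db\.analytics\.server\.licensing\.${key}=.+" "$file")" || {
      echo "missing or commented out: ${key}"
      return 1
    }
    case "$line" in
      *"<YOUR_"*) echo "placeholder not replaced: ${key}"; return 1 ;;
    esac
  done
  echo "license configuration looks complete"
}

# Run the check only when the properties file is present in the current directory.
if [ -f config/scalardb-analytics-server.properties ]; then
  check_license_config config/scalardb-analytics-server.properties
fi
```

The check only looks at the text of the file; it does not validate the license itself.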
Step 2: Set up the sample databases
To set up the sample databases, run the following command:
docker compose up -d --wait
This command starts the following services locally:
- ScalarDB Analytics components:
  - ScalarDB Analytics server: Manages metadata about all data sources and provides a unified interface for querying.
- Sample databases:
  - PostgreSQL: Used as a non-ScalarDB-managed database (accessed directly)
  - Cassandra and MySQL: Used as ScalarDB-managed databases (accessed through ScalarDB's transaction layer)
In this guide, a non-ScalarDB-managed database is one that ScalarDB transactions do not manage (PostgreSQL), and a ScalarDB-managed database is one that ScalarDB transactions do manage (Cassandra and MySQL).
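To see whether the services listed above actually came up, you can inspect the Compose project from the sample directory. The following is a rough sketch: `stack_status` is a hypothetical wrapper name, and the service names it prints are whatever the sample's Compose file defines, which this guide does not list.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: shows the state of the Compose project in the current directory.
stack_status() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    return 127
  fi
  docker compose ps              # one row per container, including its current status
  docker compose ps --services   # service names as defined in the Compose file
}

stack_status || echo "stack status unavailable"
```

Each service name printed by the second command can be passed to `docker compose logs` if a container is not running as expected.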
The sample data is automatically loaded into all databases during the initial setup. After completing the setup, the following tables should be available:
- In PostgreSQL:
  sample_ns.customer
- In ScalarDB (backed by Cassandra):
  cassandrans.lineitem
- In ScalarDB (backed by MySQL):
  mysqlns.orders
As the list above shows, the ScalarDB namespaces cassandrans and mysqlns map to Cassandra and MySQL, respectively.
For details about the table schema, including column definitions and data types, refer to Schema details. Before continuing, confirm that the sample data has been loaded into these tables.
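One way to confirm the load for the PostgreSQL table is to count its rows with psql inside the container. Treat the following as a sketch under stated assumptions: the Compose service name, database user, and database name default to `postgres` here purely as placeholders, and the real values come from the sample's Compose file. `count_query` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a row-count query for one table.
count_query() {
  printf 'SELECT COUNT(*) FROM %s;\n' "$1"
}

# ASSUMPTIONS: these defaults are placeholders, not values taken from the sample.
# Override them to match the PostgreSQL service in the Compose file.
PG_SERVICE="${PG_SERVICE:-postgres}"
PG_USER="${PG_USER:-postgres}"
PG_DB="${PG_DB:-postgres}"

if command -v docker >/dev/null 2>&1; then
  # -T disables TTY allocation so the query can be piped in;
  # psql -At prints only the bare count.
  count_query "sample_ns.customer" |
    docker compose exec -T "$PG_SERVICE" psql -U "$PG_USER" -d "$PG_DB" -At ||
    echo "query failed; check the assumed service name and credentials" >&2
fi
```

A nonzero count suggests the customer data is present. The ScalarDB-managed tables would need a similar check through a client for their own backing database.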