Create a ScalarDB Analytics Catalog
This guide explains how to create a ScalarDB Analytics catalog. The ScalarDB Analytics catalog serves as the central hub that organizes information from various underlying data sources, including database schemas and contact points, enabling you to run analytical queries across these data sources through a unified interface. This information is referred to as catalog information.
You need to have a license key (trial license or commercial license) to use ScalarDB Analytics. If you don't have a license key, please contact us.
Set up ScalarDB Analytics server
Catalog information is managed by a component called the ScalarDB Analytics server, so you first need to set one up. The ScalarDB Analytics server also performs several other tasks, such as collecting usage metering information and storing it in a file system or cloud blob storage.
Prerequisites
The ScalarDB Analytics server requires a database to store catalog information. We refer to this database as the metadata database throughout this documentation. ScalarDB Analytics supports the following databases for the metadata database:
- PostgreSQL
- MySQL
- SQL Server
- Oracle
Create a database and user with appropriate privileges before starting the ScalarDB Analytics server. The specific commands vary by database type.
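As an illustration for PostgreSQL, the following sketch writes the SQL for a dedicated database and user to a file that can then be applied with psql. The database name, user name, and password mirror the placeholders used in the configuration example in this guide. Making the user the owner of the database is an assumption about what counts as appropriate privileges, so adjust it to your own security policy.

```shell
# Write the SQL for the metadata database to a file (PostgreSQL syntax).
# Names and password are placeholders taken from this guide's configuration example.
cat > create-metadata-db.sql <<'SQL'
CREATE USER analytics_user WITH PASSWORD 'your_secure_password';
-- Assumption: owning the database gives the server the privileges it needs.
CREATE DATABASE scalardb_analytics OWNER analytics_user;
SQL

# Apply the file as a PostgreSQL superuser, for example:
#   psql -h localhost -U postgres -f create-metadata-db.sql
```

Other supported databases need their own equivalent statements.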
Configure the ScalarDB Analytics server
Create a ScalarDB Analytics server configuration file (for example, scalardb-analytics-server.properties). The following example uses PostgreSQL as the metadata database:
# Metadata database configuration (required)
scalar.db.analytics.server.db.url=jdbc:postgresql://localhost:5432/scalardb_analytics
scalar.db.analytics.server.db.username=analytics_user
scalar.db.analytics.server.db.password=your_secure_password
# gRPC server configuration (optional; the values shown are the defaults)
scalar.db.analytics.server.catalog.port=11051
scalar.db.analytics.server.metering.port=11052
# TLS configuration (optional but recommended for production)
scalar.db.analytics.server.tls.enabled=true
scalar.db.analytics.server.tls.cert_chain_path=/path/to/server.crt
scalar.db.analytics.server.tls.private_key_path=/path/to/server.key
# License configuration (required)
scalar.db.analytics.server.licensing.license_key=<YOUR_LICENSE_KEY>
scalar.db.analytics.server.licensing.license_check_cert_pem=<YOUR_LICENSE_CERT_PEM>
# Metering storage configuration (required)
scalar.db.analytics.server.metering.storage.provider=filesystem
scalar.db.analytics.server.metering.storage.path=/var/scalardb-analytics/metering
For production deployments, configure metering storage to use object storage (for example, Amazon S3, Google Cloud Storage, or Azure Blob Storage) instead of the local filesystem.
For detailed configuration options, see the Configuration reference.
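Before starting the server, it can help to confirm that every key marked as required in the example is present. The following is a minimal sketch, not part of the product: check_server_config is a hypothetical helper name, and the check assumes the key=value style used in the example, with no spaces around the equals sign.

```shell
# check_server_config FILE
# Report every required key that is absent from the properties file.
# Returns 0 when all keys are present, 1 otherwise.
check_server_config() {
  config_file="$1"
  missing=0
  for key in \
    scalar.db.analytics.server.db.url \
    scalar.db.analytics.server.db.username \
    scalar.db.analytics.server.db.password \
    scalar.db.analytics.server.licensing.license_key \
    scalar.db.analytics.server.licensing.license_check_cert_pem \
    scalar.db.analytics.server.metering.storage.provider \
    scalar.db.analytics.server.metering.storage.path
  do
    # Require a line that starts with the key followed by "=".
    # Dots in the key act as regex wildcards, which is harmless here.
    if ! grep -q "^${key}=" "$config_file"; then
      echo "missing required key: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_server_config scalardb-analytics-server.properties prints one line per missing key and returns a non-zero status if any are missing.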
Start the ScalarDB Analytics server
Start the ScalarDB Analytics server with your configuration:
docker run -d \
  --name scalardb-analytics-server \
  -p 11051:11051 \
  -p 11052:11052 \
  -v /path/to/scalardb-analytics-server.properties:/scalardb-analytics-server/server.properties \
  ghcr.io/scalar-labs/scalardb-analytics-server:<VERSION>
Replace <VERSION> with the ScalarDB Analytics version you want to use. You can find available versions at the container registry page.
The container uses the configuration file at /scalardb-analytics-server/server.properties by default.
The ScalarDB Analytics server will perform the following during startup:
- Validate the license
- Connect to the metadata database
- Start gRPC services on the configured ports
- Begin accepting client connections
Make a note of your server's hostname and ports. You will need this information later when configuring Spark applications to connect to your catalog.
Check server health (optional)
If you want to verify the server is running properly, you can use grpc-health-probe (included in the container image):
# Check catalog service health
docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051
# Check metering service health
docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11052
# For TLS-enabled servers
docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051 -tls -tls-ca-cert=/path/to/ca.crt
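In scripts, the server may not be ready immediately after the container starts, so a health check is often wrapped in a retry loop. The following is a generic sketch under that assumption; wait_until is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, or 1 after ATTEMPTS failures.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Discard the command's output; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Avoid sleeping after the final failed attempt.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not run here): try the catalog service up to 30 times, 2 seconds apart.
#   wait_until 30 2 docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051
```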
Set up ScalarDB Analytics CLI
ScalarDB Analytics CLI is a command-line tool that communicates with the ScalarDB Analytics server to manage catalogs, register data sources, and perform administrative tasks.
Install the CLI
The scalardb-analytics-cli tool is available as a container image:
# Pull the CLI image
docker pull ghcr.io/scalar-labs/scalardb-analytics-cli:<VERSION>
Replace <VERSION> with the ScalarDB Analytics version you want to use. Available versions can be found at the container registry page.
To run CLI commands, mount your client configuration file (see "Configure the client") into the container:
# Example: List catalogs
docker run --rm \
  -v "$(pwd)/client.properties:/config/client.properties:ro" \
  ghcr.io/scalar-labs/scalardb-analytics-cli:<VERSION> \
  -c /config/client.properties \
  catalog list
Configure the client
Create a configuration file named client.properties in your current directory:
# Server connection
scalar.db.analytics.client.server.host=localhost
scalar.db.analytics.client.server.catalog.port=11051
scalar.db.analytics.client.server.metering.port=11052
# TLS/SSL configuration (if enabled on server)
scalar.db.analytics.client.server.tls.enabled=true
scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/ca.crt
scalar.db.analytics.client.server.tls.override_authority=analytics.example.com
For detailed configuration options, see the Configuration reference.
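If you maintain several environments, the client file can be generated rather than edited by hand. The following is a minimal sketch that uses only the keys shown in the example above. write_client_properties is a hypothetical helper, the ports are the defaults from this guide, and setting tls.enabled=false for a server without TLS is an assumption about accepted values.

```shell
# write_client_properties HOST [CA_CERT_PATH]
# Write client.properties in the current directory.
# TLS is enabled only when a CA certificate path is given.
write_client_properties() {
  host="$1"
  ca_cert="${2:-}"
  {
    echo "scalar.db.analytics.client.server.host=${host}"
    echo "scalar.db.analytics.client.server.catalog.port=11051"
    echo "scalar.db.analytics.client.server.metering.port=11052"
    if [ -n "$ca_cert" ]; then
      echo "scalar.db.analytics.client.server.tls.enabled=true"
      echo "scalar.db.analytics.client.server.tls.ca_root_cert_path=${ca_cert}"
    else
      # Assumption: "false" is the accepted value for a server without TLS.
      echo "scalar.db.analytics.client.server.tls.enabled=false"
    fi
  } > client.properties
}
```

For example, write_client_properties localhost produces a file for a local server without TLS, while passing a second argument such as /path/to/ca.crt enables TLS with that CA certificate.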
Set up an alias (optional)
For convenience, you can create an alias to avoid typing the long Docker command each time:
alias scalardb-analytics-cli='docker run --rm -v "$(pwd)/client.properties:/config/client.properties:ro" ghcr.io/scalar-labs/scalardb-analytics-cli:<VERSION> -c /config/client.properties'
With this alias, you can run commands more simply:
scalardb-analytics-cli catalog list
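Bash does not expand aliases in non-interactive shells unless expand_aliases is set, so an alias like the one above is mainly useful at an interactive prompt. For scripts, a shell function is one alternative. The following is a sketch only: the function name scalardb_analytics_cli and the SCALARDB_ANALYTICS_VERSION variable are hypothetical and not part of the product.

```shell
# Wrapper function equivalent to the alias, usable inside scripts.
# SCALARDB_ANALYTICS_VERSION is a hypothetical variable holding the image tag;
# the function aborts with a message if it is unset.
scalardb_analytics_cli() {
  docker run --rm \
    -v "$(pwd)/client.properties:/config/client.properties:ro" \
    "ghcr.io/scalar-labs/scalardb-analytics-cli:${SCALARDB_ANALYTICS_VERSION:?set SCALARDB_ANALYTICS_VERSION to the image tag}" \
    -c /config/client.properties \
    "$@"
}
```

With the variable exported, scalardb_analytics_cli catalog list runs the same command as the alias.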
Create your catalog
This section describes how to create a catalog container, add data sources to your catalog, and verify your catalog.
Create a catalog container
A catalog serves as a logical container for organizing data sources. Create your first catalog:
scalardb-analytics-cli catalog create production
Remember the catalog name you choose here (for example, production). You will need to use this exact same name when configuring your Spark applications to connect to this catalog.
Verify the catalog was created:
scalardb-analytics-cli catalog list
Add data sources to your catalog
Create a data source registration file for your database. The following example, saved as postgres-datasource.json, registers a PostgreSQL database:
{
  "catalog": "production",
  "name": "postgres_customers",
  "type": "postgres",
  "provider": {
    "host": "postgres.example.com",
    "port": 5432,
    "username": "analytics_user",
    "password": "secure_password",
    "database": "customers"
  }
}
For detailed configuration options and examples for other database types (MySQL, ScalarDB, Oracle, SQL Server, DynamoDB), see the Data Source Reference.
Register the data source:
scalardb-analytics-cli data-source register --data-source-json postgres-datasource.json
Verify your catalog
List all data sources in your catalog:
scalardb-analytics-cli data-source list --catalog production
List namespaces in your catalog:
scalardb-analytics-cli namespace list --catalog production
List tables in your catalog:
scalardb-analytics-cli table list --catalog production
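The three listing commands above can be combined into one check for use in scripts. The following is a minimal sketch: verify_catalog is a hypothetical helper, and its first argument must be the name of a function or executable that runs the CLI, because an alias is not expanded when the command name comes from a variable.

```shell
# verify_catalog CLI_COMMAND CATALOG
# Print data sources, namespaces, and tables for CATALOG.
# Stops and returns 1 as soon as one of the list commands fails.
verify_catalog() {
  cli="$1"
  catalog="$2"
  for resource in data-source namespace table; do
    echo "== ${resource} list =="
    "$cli" "$resource" list --catalog "$catalog" || return 1
  done
}
```

For example, if my_cli is a wrapper function or script that invokes the CLI container, verify_catalog my_cli production prints all three listings for the production catalog.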
Next steps
You now have a fully functional ScalarDB Analytics catalog with registered data sources.
To develop analytical applications using this catalog:
- Run analytical queries: See Run Analytical Queries Through ScalarDB Analytics
- Add more data sources: See Data Source Reference
- Deploy in public clouds: See Deploy ScalarDB Analytics in Public Cloud Environments
- Explore configuration details: See ScalarDB Analytics Configurations