Version: 3.15

Run Analytical Queries Through ScalarDB Analytics

This guide explains how to develop ScalarDB Analytics applications. For details on the architecture and design, see ScalarDB Analytics Design.

ScalarDB Analytics currently uses Spark as its execution engine and provides a custom Spark catalog plugin that exposes both ScalarDB-managed and non-ScalarDB-managed data sources as Spark tables in a unified view. This allows you to execute arbitrary Spark SQL queries seamlessly.

Preparation

This section describes the prerequisites, how to set up ScalarDB Analytics in the Spark configuration, and how to add the ScalarDB Analytics dependency.

Prerequisites

ScalarDB Analytics works with Apache Spark 3.4 or later. If you don't have Spark installed yet, please download the Spark distribution from Apache's website.

Note

Apache Spark is built with either Scala 2.12 or Scala 2.13, and ScalarDB Analytics supports both versions. Make sure you know which Scala version you are using so that you can select the correct version of ScalarDB Analytics later. For more details, refer to Version Compatibility.

Set up ScalarDB Analytics in the Spark configuration

The following sections describe all available configuration options for ScalarDB Analytics. These configurations control:

  • How ScalarDB Analytics integrates with Spark
  • How data sources are connected and accessed
  • How license information is provided

For example configurations in a practical scenario, see the sample application configuration.

Spark plugin configurations

| Configuration Key | Required | Description |
|---|---|---|
| spark.jars.packages | No | A comma-separated list of Maven coordinates for the required dependencies. You need to include the ScalarDB Analytics package you are using; otherwise, specify it as a command-line argument when running the Spark application. For details about the Maven coordinates of ScalarDB Analytics, refer to Add the ScalarDB Analytics dependency. |
| spark.sql.extensions | Yes | Must be set to com.scalar.db.analytics.spark.Extensions. |
| spark.sql.catalog.<CATALOG_NAME> | Yes | Must be set to com.scalar.db.analytics.spark.ScalarCatalog. |

You can specify any name for <CATALOG_NAME>. Be sure to use the same catalog name throughout your configuration.
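
These settings are normally placed in the Spark configuration (for example, in spark-defaults.conf or via spark-submit --conf), but Spark also lets you set them programmatically when building the SparkSession. Below is a minimal, hypothetical sketch assuming a catalog named scalardb; the class name is illustrative, and the license and data source configurations described in the following sections are still required:

import org.apache.spark.sql.SparkSession;

public class ScalarDbAnalyticsConfigExample {
  public static void main(String[] args) {
    // Set the ScalarDB Analytics plugin configurations programmatically (sketch only).
    // The license and data source configurations described below are still required.
    SparkSession spark = SparkSession.builder()
        .appName("ScalarDbAnalyticsConfigExample")
        .config("spark.sql.extensions", "com.scalar.db.analytics.spark.Extensions")
        .config("spark.sql.catalog.scalardb", "com.scalar.db.analytics.spark.ScalarCatalog")
        .getOrCreate();

    spark.stop();
  }
}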

License configurations

| Configuration Key | Required | Description |
|---|---|---|
| spark.sql.catalog.<CATALOG_NAME>.license.key | Yes | A JSON string of the license key for ScalarDB Analytics. |
| spark.sql.catalog.<CATALOG_NAME>.license.cert_pem | Yes | A string of the PEM-encoded certificate of the ScalarDB Analytics license. Either cert_pem or cert_path must be set. |
| spark.sql.catalog.<CATALOG_NAME>.license.cert_path | Yes | A path to the PEM-encoded certificate of the ScalarDB Analytics license. Either cert_pem or cert_path must be set. |

Data source configurations

ScalarDB Analytics supports multiple types of data sources. Each type requires specific configuration parameters:

Note

ScalarDB Analytics supports ScalarDB as a data source. This table describes how to configure ScalarDB as a data source.

| Configuration Key | Required | Description |
|---|---|---|
| spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.type | Yes | Always set to scalardb. |
| spark.sql.catalog.<CATALOG_NAME>.data_source.<DATA_SOURCE_NAME>.config_path | Yes | The path to the configuration file for ScalarDB. |

Tip

You can use an arbitrary name for <DATA_SOURCE_NAME>.
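
The file referenced by config_path is a standard ScalarDB configuration file. Below is a minimal, hypothetical sketch assuming a JDBC-backed ScalarDB deployment; the actual properties depend on your ScalarDB setup:

# Hypothetical example of /path/to/scalardb.properties for a JDBC backend
scalar.db.storage=jdbc
scalar.db.contact_points=jdbc:postgresql://localhost:5432/scalardb
scalar.db.username=scalardb_user
scalar.db.password=scalardb_password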

Example configuration

Below is an example configuration for ScalarDB Analytics that demonstrates how to set up a catalog named scalardb with multiple data sources:

# Spark plugin configurations
spark.jars.packages com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>
spark.sql.extensions com.scalar.db.analytics.spark.Extensions
spark.sql.catalog.scalardb com.scalar.db.analytics.spark.ScalarCatalog

# License configurations
spark.sql.catalog.scalardb.license.key <LICENSE_KEY>
spark.sql.catalog.scalardb.license.cert_pem <LICENSE_PEM_ENCODED_CERTIFICATE>

# Data source configurations
spark.sql.catalog.scalardb.data_source.scalardb.type scalardb
spark.sql.catalog.scalardb.data_source.scalardb.config_path /path/to/scalardb.properties

spark.sql.catalog.scalardb.data_source.mysql_source.type mysql
spark.sql.catalog.scalardb.data_source.mysql_source.host localhost
spark.sql.catalog.scalardb.data_source.mysql_source.port 3306
spark.sql.catalog.scalardb.data_source.mysql_source.username root
spark.sql.catalog.scalardb.data_source.mysql_source.password password
spark.sql.catalog.scalardb.data_source.mysql_source.database mydb

Replace the content in the angle brackets as follows:

  • <LICENSE_KEY>: The license key for ScalarDB Analytics
  • <LICENSE_PEM_ENCODED_CERTIFICATE>: The PEM-encoded certificate of ScalarDB Analytics license
  • <SPARK_VERSION>: The major and minor version of Spark you are using (such as 3.4)
  • <SCALA_VERSION>: The major and minor version of Scala that matches your Spark installation (such as 2.12 or 2.13)
  • <SCALARDB_ANALYTICS_VERSION>: The version of ScalarDB Analytics

Add the ScalarDB Analytics dependency

ScalarDB Analytics is hosted in the Maven Central Repository. The name of the package is scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>, where:

  • <SPARK_VERSION>: The major and minor version of Spark you are using (such as 3.4)
  • <SCALA_VERSION>: The major and minor version of Scala that matches your Spark installation (such as 2.12 or 2.13)
  • <SCALARDB_ANALYTICS_VERSION>: The version of ScalarDB Analytics

For details about version compatibility, refer to Version Compatibility.

You can add this dependency by configuring the build settings of your project. For example, if you are using Gradle, you can add the following to your build.gradle file:

dependencies {
    implementation 'com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>'
}
Note

If you want to bundle your application into a single fat JAR file by using plugins like the Gradle Shadow plugin or the Maven Shade plugin, you need to exclude ScalarDB Analytics from the fat JAR file by choosing the appropriate configuration, such as provided or shadow, depending on the plugin you are using.
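
For example, with the Gradle Shadow plugin, one way to keep ScalarDB Analytics out of the fat JAR is to declare it as a compileOnly dependency, since compileOnly dependencies are not bundled by the shadowJar task. This is only a sketch; the plugin version is illustrative, and your build may require a different approach:

plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '8.1.1' // illustrative version
}

dependencies {
    // Compile against ScalarDB Analytics, but keep it out of the fat JAR
    // because it is already provided to Spark via spark.jars.packages.
    compileOnly 'com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>'
}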

Develop a Spark application

In this section, you will learn how to develop a Spark application that uses ScalarDB Analytics in Java.

There are three ways to develop Spark applications with ScalarDB Analytics:

  1. Spark driver application: A traditional Spark application that runs within the cluster
  2. Spark Connect application: A remote application that uses the Spark Connect protocol
  3. JDBC application: A remote application that uses the JDBC interface
Note

Depending on your environment, you may not be able to use all the methods mentioned above. For details about supported features and deployment options, refer to Supported managed Spark services and their application types.

With all these methods, you can refer to tables in ScalarDB Analytics using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, refer to Catalog information mappings by data source.

You can use the standard SparkSession class with ScalarDB Analytics. Additionally, you can use any type of cluster deployment that Spark supports, such as YARN, Kubernetes, standalone, or local mode.

To read data from tables in ScalarDB Analytics, you can use the spark.sql or spark.read.table function in the same way as when reading a normal Spark table.

First, you need to set up your Java project. For example, if you are using Gradle, you can add the following to your build.gradle file:

dependencies {
    implementation 'com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>'
}

Below is an example of a Spark Driver application:

import org.apache.spark.sql.SparkSession;

public class MyApp {
  public static void main(String[] args) {
    // Create a SparkSession
    try (SparkSession spark = SparkSession.builder().getOrCreate()) {
      // Read data from a table in ScalarDB Analytics
      spark.sql("SELECT * FROM my_catalog.my_data_source.my_namespace.my_table").show();
    }
  }
}
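
As noted above, you can also read the same table through the DataFrame API with spark.read.table instead of SQL. The following is a minimal sketch using the same hypothetical table name:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MyDataFrameApp {
  public static void main(String[] args) {
    try (SparkSession spark = SparkSession.builder().getOrCreate()) {
      // Read data from a table in ScalarDB Analytics by using the DataFrame API
      Dataset<Row> df = spark.read().table("my_catalog.my_data_source.my_namespace.my_table");
      df.show();
    }
  }
}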

Then, you can build and run your application by using the spark-submit command.

Note

You may need to build a fat JAR file for your application, as is usual for normal Spark applications.

spark-submit --class MyApp --master local[*] my-spark-application-all.jar
Tip

You can also use other CLI tools that Spark provides, such as spark-sql and spark-shell, to interact with ScalarDB Analytics tables.

Catalog information mapping

ScalarDB Analytics manages its own catalog, containing data sources, namespaces, tables, and columns. That information is automatically mapped to the Spark catalog. In this section, you will learn how ScalarDB Analytics maps its catalog information to the Spark catalog.

For details about how information in the raw data sources is mapped to the ScalarDB Analytics catalog, refer to Catalog information mappings by data source.

Catalog level mapping

Each catalog level object in the ScalarDB Analytics catalog is mapped to a Spark catalog. The following table shows how the catalog levels are mapped:

| ScalarDB Analytics Catalog Level | Spark Catalog Object |
|---|---|
| Catalog | Catalog |
| Data source | Part of the namespace |
| Namespace | Part of the namespace |
| Table | Table |

Data source tables

Tables from data sources in the ScalarDB Analytics catalog are mapped to Spark tables. The following format is used to represent the identity of the Spark tables that correspond to ScalarDB Analytics tables:

<CATALOG_NAME>.<DATA_SOURCE_NAME>.<NAMESPACE_NAMES>.<TABLE_NAME>

Replace the content in the angle brackets as follows:

  • <CATALOG_NAME>: The name of the catalog.
  • <DATA_SOURCE_NAME>: The name of the data source.
  • <NAMESPACE_NAMES>: The names of the namespaces. If the namespace names are multi-level, they are concatenated with a dot (.) as the separator.
  • <TABLE_NAME>: The name of the table.

For example, if you have a ScalarDB catalog named my_catalog that contains a data source named my_data_source and a schema named my_schema, you can refer to the table named my_table in that schema as my_catalog.my_data_source.my_schema.my_table.
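
Because all data sources are exposed under the same Spark catalog, a single query can also join tables from different data sources. The following sketch assumes the driver application above (a SparkSession named spark) and hypothetical table and column names in a catalog named my_catalog with data sources named my_data_source and mysql_source:

// Inside the try block of the driver application above (hypothetical table and column names)
spark.sql(
    "SELECT o.order_id, c.customer_name " +
    "FROM my_catalog.my_data_source.my_namespace.orders o " +
    "JOIN my_catalog.mysql_source.mydb.customers c ON o.customer_id = c.customer_id"
).show();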

Views

Views in ScalarDB Analytics are provided as tables in the Spark catalog, not views. The following format is used to represent the identity of the Spark tables that correspond to ScalarDB Analytics views:

<CATALOG_NAME>.view.<VIEW_NAMESPACE_NAMES>.<VIEW_NAME>

Replace the content in the angle brackets as follows:

  • <CATALOG_NAME>: The name of the catalog.
  • <VIEW_NAMESPACE_NAMES>: The names of the view namespaces. If the view namespace names are multi-level, they are concatenated with a dot (.) as the separator.
  • <VIEW_NAME>: The name of the view.

For example, if you have a ScalarDB catalog named my_catalog and a view namespace named my_view_namespace, you can refer to the view named my_view in that namespace as my_catalog.view.my_view_namespace.my_view.

Note

The view prefix is used to avoid conflicts with data source table identifiers.

WAL-interpreted views

As explained in ScalarDB Analytics Design, ScalarDB Analytics provides a functionality called WAL-interpreted views, which are a special type of view. These views are automatically created for the tables of ScalarDB data sources to provide a user-friendly view of the data by interpreting the WAL metadata in the tables.

Since the data source name and the namespace names of the original ScalarDB tables are used as the view namespace names for WAL-interpreted views, if you have a ScalarDB table named my_table in a namespace named my_namespace of a data source named my_data_source, you can refer to the WAL-interpreted view of the table as my_catalog.view.my_data_source.my_namespace.my_table.
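
Querying a WAL-interpreted view works the same way as querying any other table. The following sketch assumes the hypothetical names above and a SparkSession named spark:

// Read the WAL-interpreted view of a ScalarDB table instead of the raw table
spark.sql("SELECT * FROM my_catalog.view.my_data_source.my_namespace.my_table").show();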

Data-type mapping

ScalarDB Analytics maps data types in its catalog to Spark data types. The following table shows how the data types are mapped:

| ScalarDB Data Type | Spark Data Type |
|---|---|
| BYTE | Byte |
| SMALLINT | Short |
| INT | Integer |
| BIGINT | Long |
| FLOAT | Float |
| DOUBLE | Double |
| DECIMAL | Decimal |
| TEXT | String |
| BLOB | Binary |
| BOOLEAN | Boolean |
| DATE | Date |
| TIME | TimestampNTZ |
| DATETIME | TimestampNTZ |
| TIMESTAMP | Timestamp |
| DURATION | CalendarInterval |
| INTERVAL | CalendarInterval |
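
To check how the columns of a specific table were mapped, you can inspect the schema that Spark reports for the table. The following sketch assumes the hypothetical table name used earlier and a SparkSession named spark:

// Print the Spark schema of a ScalarDB Analytics table to see the mapped column types
spark.table("my_catalog.my_data_source.my_namespace.my_table").printSchema();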

Version compatibility

Since Spark and Scala may be incompatible among different minor versions, ScalarDB Analytics offers different artifacts for various Spark and Scala versions, named in the format scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>. Make sure that you select the artifact matching the Spark and Scala versions you're using. For example, if you're using Spark 3.5 with Scala 2.13, you must specify scalardb-analytics-spark-all-3.5_2.13.

Regarding the Java version, ScalarDB Analytics supports Java 8 or later.

The following is a list of Spark and Scala versions supported by each version of ScalarDB Analytics.

| ScalarDB Analytics Version | ScalarDB Version | Spark Versions Supported | Scala Versions Supported | Minimum Java Version |
|---|---|---|---|---|
| 3.15 | 3.15 | 3.5, 3.4 | 2.13, 2.12 | 8 |
| 3.14 | 3.14 | 3.5, 3.4 | 2.13, 2.12 | 8 |
| 3.12 | 3.12 | 3.5, 3.4 | 2.13, 2.12 | 8 |