Version: 3.17

Data Source Reference

This reference guide provides detailed information about data source configuration formats, provider-specific settings, and data type mappings for ScalarDB Analytics.

warning

You need to have a license key (trial license or commercial license) to use ScalarDB Analytics. If you don't have a license key, please contact us.

Data source registration file format​

Data sources are registered to catalogs by using the CLI with provider configuration files. These files define the connection settings for each data source type. For CLI command details, see CLI command reference.

The provider configuration file has the following structure:

{
  "type": "<database-type>", // Database type: postgresql, mysql, scalardb, sqlserver, oracle, dynamodb, databricks, snowflake
  // Type-specific connection configuration
  // Configuration varies by database type
}

The catalog name and data source name are specified as the --catalog and --data-source CLI options.

File reference syntax

You can use the ${file:path} syntax to load configuration values from an external file. This is useful for reusing existing configuration files or separating sensitive information.

Supported file formats:

  • .properties files: Loaded and converted to a JSON object with string values
  • .json files: Loaded as-is (any valid JSON structure)

For example:

{
  "type": "scalardb",
  "configs": "${file:/path/to/scalardb.properties}"
}
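To make the .properties-to-JSON conversion concrete, the following Python sketch parses Java-style properties text into a flat JSON object with string values, as described above. The function name and the simplified parsing rules (no line continuations or escapes) are illustrative, not part of ScalarDB Analytics:

```python
import json

def properties_to_json(text: str) -> str:
    """Convert Java-style .properties text into a JSON object of string values.

    Simplified sketch: skips blank lines and '#'/'!' comments, and splits
    each remaining line on the first '=' only.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return json.dumps(values)

props = """
# ScalarDB connection settings
scalar.db.contact_points=localhost
scalar.db.storage=jdbc
"""
print(properties_to_json(props))
```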

Provider configuration by type​

The following sections show the provider configuration for each supported database type:

Configurations

The following configuration is for ScalarDB.

configs

  • Field: configs
  • Description: A map of ScalarDB configuration properties. These are the same properties that would be specified in a ScalarDB configuration file. You can specify the configuration inline as a JSON object or use a file reference with the ${file:path} syntax.

Example

Inline configuration:

{
  "type": "scalardb",
  "configs": {
    "scalar.db.contact_points": "localhost",
    "scalar.db.username": "admin",
    "scalar.db.password": "admin",
    "scalar.db.storage": "jdbc"
  }
}

Using file reference with a properties file:

{
  "type": "scalardb",
  "configs": "${file:/path/to/scalardb.properties}"
}

Data access method

ScalarDB Analytics reads data from ScalarDB by using the ScalarDB Core library directly, not through ScalarDB Cluster. As a result, features that are available only in ScalarDB Cluster (such as encryption) cannot be used with the ScalarDB data source.

Scan behavior

Internally, the ScalarDB data source uses the Scan operation with all() to read data. This operation requires cross-partition scan to be enabled. Filtering and ordering are not applied at the ScalarDB level. The relevant settings are as follows:

  • scalar.db.cross_partition_scan.enabled must be true (the default is true).
  • scalar.db.cross_partition_scan.filtering.enabled has no effect.
  • scalar.db.cross_partition_scan.ordering.enabled has no effect.

note

Filter push-down and other optimizations may be supported in future releases.
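The scan-related settings above can be collected into a scalardb.properties fragment. This is a sketch showing only the cross-partition scan keys; the comments summarize the behavior described in this section:

```properties
# Required for the Scan-with-all() read path (true is the default)
scalar.db.cross_partition_scan.enabled=true

# The following two settings have no effect for a ScalarDB data source:
# filtering and ordering are not applied at the ScalarDB level.
scalar.db.cross_partition_scan.filtering.enabled=false
scalar.db.cross_partition_scan.ordering.enabled=false
```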

ScalarDB Core configuration overrides

ScalarDB Core configuration properties are generally respected when ScalarDB is used as a data source. However, the following properties are overridden by ScalarDB Analytics:

  • scalar.db.scan_fetch_size: If not explicitly set by the user, defaults to 4096 instead of the ScalarDB Core default of 10.
  • scalar.db.consensus_commit.isolation_level: Always overridden to READ_COMMITTED, regardless of the user-specified value.
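As a sketch, a scalardb.properties fragment illustrating both override behaviors (the values shown are illustrative):

```properties
# Explicitly set by the user, so ScalarDB Analytics keeps this value
# instead of applying its 4096 default
scalar.db.scan_fetch_size=10

# Has no effect: ScalarDB Analytics always overrides this to READ_COMMITTED
scalar.db.consensus_commit.isolation_level=SERIALIZABLE
```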

Catalog information reference​

This section describes catalog structure mappings by data source and data type mappings.

Catalog structure mappings by data source​

When registering a data source to ScalarDB Analytics, the catalog structure of the data source, that is, its namespaces, tables, and columns, is resolved and registered to the universal data catalog. To resolve the catalog structure, particular objects on the data source side are mapped to universal data catalog objects.

Catalog-level mappings​

The catalog-level mappings are the mappings of namespace names, table names, and column names from the data source to the universal data catalog. The mappings for each supported data source are described below.

The catalog structure of ScalarDB is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:

  • The ScalarDB namespace is mapped to the namespace. Therefore, the namespace of the ScalarDB data source is always single level, consisting only of the namespace name.
  • The ScalarDB table is mapped to the table.
  • The ScalarDB column is mapped to the column.

Data type mappings​

The following sections show how native types from each data source are mapped to ScalarDB Analytics types:

warning

Columns with data types that are not included in the mapping tables below will be ignored during data source registration. These columns will not appear in the ScalarDB Analytics catalog and cannot be queried. Information about ignored columns is logged in the ScalarDB Analytics server logs.

| ScalarDB Data Type | ScalarDB Analytics Data Type |
|---|---|
| BOOLEAN | BOOLEAN |
| INT | INT |
| BIGINT | BIGINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| TEXT | TEXT |
| BLOB | BLOB |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMPTZ | TIMESTAMPTZ |