Data Source Reference
This reference guide provides detailed information about data source configuration formats, provider-specific settings, and data type mappings for ScalarDB Analytics.
You need to have a license key (trial license or commercial license) to use ScalarDB Analytics. If you don't have a license key, please contact us.
Data source registration file format
Data sources are registered in catalogs by using the CLI with data source registration files. These files have the following structure. For CLI command details, see the CLI command reference.
{
"catalog": "<catalog-name>", // The catalog to register the data source in
"name": "<data-source-name>", // A unique name for this data source
"type": "<database-type>", // Database type: postgres, mysql, scalardb, sqlserver, oracle, dynamodb, databricks, snowflake
"provider": {
// Type-specific connection configuration
// Configuration varies by database type
}
}
The provider section contains data source-specific connection settings that vary based on the type field.
Provider configuration by type
The following sections show the provider configuration for each supported database type:
- ScalarDB
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- Databricks
- Snowflake
- DynamoDB
Configuration
The following configuration is for ScalarDB.
configPath
- Field: configPath
- Description: Path to the ScalarDB configuration file.
Example
{
"catalog": "production",
"name": "scalardb_source",
"type": "scalardb",
"provider": {
"configPath": "/path/to/scalardb.properties"
}
}
Configuration
The following configurations are for PostgreSQL.
host
- Field: host
- Description: PostgreSQL server hostname.
port
- Field: port
- Description: Port number.
username
- Field: username
- Description: Database user.
password
- Field: password
- Description: Database password.
database
- Field: database
- Description: Database name to connect to.
Example
{
"catalog": "production",
"name": "postgres_customers",
"type": "postgres",
"provider": {
"host": "postgres.example.com",
"port": 5432,
"username": "analytics_user",
"password": "secure_password",
"database": "customers"
}
}
Configuration
The following configurations are for MySQL.
host
- Field: host
- Description: MySQL server hostname.
port
- Field: port
- Description: Port number.
username
- Field: username
- Description: Database user.
password
- Field: password
- Description: Database password.
database
- Field: database
- Description: Specific database to import. If omitted, all databases will be imported.
- Default value: None (imports all databases)
Example
{
"catalog": "production",
"name": "mysql_orders",
"type": "mysql",
"provider": {
"host": "mysql.example.com",
"port": 3306,
"username": "analytics_user",
"password": "secure_password",
"database": "orders" // Optional - if omitted, all databases will be imported
}
}
Configuration
The following configurations are for Oracle.
host
- Field: host
- Description: Oracle server hostname.
port
- Field: port
- Description: Port number.
username
- Field: username
- Description: Database user.
password
- Field: password
- Description: Database password.
serviceName
- Field: serviceName
- Description: Oracle service name.
Example
{
"catalog": "production",
"name": "oracle_warehouse",
"type": "oracle",
"provider": {
"host": "oracle.example.com",
"port": 1521,
"username": "analytics_user",
"password": "secure_password",
"serviceName": "ORCL"
}
}
Configuration
The following configurations are for SQL Server.
host
- Field: host
- Description: SQL Server hostname.
port
- Field: port
- Description: Port number.
username
- Field: username
- Description: Database user.
password
- Field: password
- Description: Database password.
database
- Field: database
- Description: Specific database to connect to.
- Default value: None (connects to the default database)
secure
- Field: secure
- Description: Enable encryption.
- Default value: false
Example
{
"catalog": "production",
"name": "sqlserver_analytics",
"type": "sqlserver",
"provider": {
"host": "sqlserver.example.com",
"port": 1433,
"username": "sa",
"password": "secure_password",
"database": "analytics", // Optional - if specified, only this database will be imported
"secure": true // Optional - enable encryption
}
}
Configuration
The following configurations are for Databricks (Databricks SQL/JDBC).
host
- Field: host
- Description: Databricks workspace hostname (for example, adb-1234567890123.4.azuredatabricks.net).
port
- Field: port
- Description: Port number. (Optional)
- Default value: Driver default.
httpPath
- Field: httpPath
- Description: HTTP path of your SQL warehouse or cluster (for example, /sql/1.0/warehouses/xxxxxxxxxxxxxx).
oAuthClientId
- Field: oAuthClientId
- Description: OAuth machine-to-machine (M2M) service principal's UUID or Application ID for Databricks SQL/JDBC authentication.
oAuthSecret
- Field: oAuthSecret
- Description: OAuth M2M service principal's secret for Databricks SQL/JDBC authentication.
catalog
- Field: catalog
- Description: Default catalog to use. (Optional)
Example
{
"catalog": "production",
"name": "databricks_analytics",
"type": "databricks",
"provider": {
"host": "adb-1234567890123.4.azuredatabricks.net",
"port": 443,
"httpPath": "/sql/1.0/warehouses/xxxxxxxxxxxxxx",
"oAuthClientId": "YOUR_CLIENT_ID",
"oAuthSecret": "YOUR_CLIENT_SECRET",
"catalog": "main"
}
}
Configuration
The following configurations are for Snowflake.
account
- Field: account
- Description: Snowflake account identifier (for example, xy12345.ap-northeast-1).
username
- Field: username
- Description: A Snowflake user.
password
- Field: password
- Description: The Snowflake user's programmatic access token.
database
- Field: database
- Description: Default database to resolve/import. (Optional)
Example
{
"catalog": "production",
"name": "snowflake_dwh",
"type": "snowflake",
"provider": {
"account": "YOUR-ACCOUNT",
"username": "analytics_user",
"password": "secure_password",
"database": "ANALYTICS"
}
}
Configuration
The following configurations are for DynamoDB.
DynamoDB authentication uses the standard AWS SDK credential provider chain. Credentials can be configured through:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- AWS credentials file (~/.aws/credentials)
- IAM roles (when running on EC2, ECS, or Lambda)
- AWS SSO or other credential providers supported by the AWS SDK
For more information, see the AWS SDK documentation on credential providers.
region
- Field: region
- Description: AWS region (for example, us-east-1). Either region or endpoint must be specified (not both).
endpoint
- Field: endpoint
- Description: Custom endpoint URL. Either region or endpoint must be specified (not both). An endpoint-based example is shown after the main example below.
schema
- Field: schema
- Description: Complete schema definition. Since DynamoDB is schema-less, you must provide the schema explicitly.
Schema structure
The schema field must contain the following structure:
.schema.namespaces[]
- Field: .schema.namespaces[]
- Description: Array of namespace definitions.
.schema.namespaces[].names[]
- Field: .schema.namespaces[].names[]
- Description: Array of namespace names (strings).
.schema.namespaces[].tables[]
- Field: .schema.namespaces[].tables[]
- Description: Array of table definitions.
.schema.namespaces[].tables[].name
- Field: .schema.namespaces[].tables[].name
- Description: Table name.
.schema.namespaces[].tables[].columns[]
- Field: .schema.namespaces[].tables[].columns[]
- Description: Array of column definitions.
.schema.namespaces[].tables[].columns[].name
- Field: .schema.namespaces[].tables[].columns[].name
- Description: Column name.
.schema.namespaces[].tables[].columns[].type
- Field: .schema.namespaces[].tables[].columns[].type
- Description: Data type.
.schema.namespaces[].tables[].columns[].nullable
- Field: .schema.namespaces[].tables[].columns[].nullable
- Description: Whether the column can contain null values.
- Default value: true
Example
{
"catalog": "production",
"name": "dynamodb_events",
"type": "dynamodb",
"provider": {
"region": "us-east-1",
"schema": {
"namespaces": [
{
"names": ["production"],
"tables": [
{
"name": "user_events",
"columns": [
{ "name": "user_id", "type": "TEXT", "nullable": false },
{
"name": "event_time",
"type": "TIMESTAMP",
"nullable": false
},
{ "name": "event_type", "type": "TEXT" },
{ "name": "event_data", "type": "TEXT" }
]
}
]
}
]
}
}
}
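Because either region or endpoint must be specified (but not both), a data source that targets a custom endpoint omits region. The following is a minimal sketch of such a registration file; the endpoint URL (for example, a locally running DynamoDB-compatible service) and the schema contents are illustrative assumptions, not part of this reference.
{
  "catalog": "development",
  "name": "dynamodb_local",
  "type": "dynamodb",
  "provider": {
    "endpoint": "http://localhost:8000", // Illustrative endpoint; specify either region or endpoint, not both
    "schema": {
      "namespaces": [
        {
          "names": ["dev"],
          "tables": [
            {
              "name": "sessions",
              "columns": [
                { "name": "session_id", "type": "TEXT", "nullable": false },
                { "name": "payload", "type": "TEXT" }
              ]
            }
          ]
        }
      ]
    }
  }
}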
Catalog information reference
This section describes catalog structure mappings by data source and data type mappings.
Catalog structure mappings by data source
When registering a data source to ScalarDB Analytics, the catalog structure of the data source, that is, its namespaces, tables, and columns, is resolved and registered in the universal data catalog. To resolve the catalog structure of a data source, particular objects on the data source side are mapped to universal data catalog objects.
Catalog-level mappings
The catalog-level mappings are the mappings of namespace names, table names, and column names from the data sources to the universal data catalog. The catalog-level mappings for each supported data source are described below:
- ScalarDB
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- Databricks
- Snowflake
- DynamoDB
The catalog structure of ScalarDB is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The ScalarDB namespace is mapped to the namespace. Therefore, the namespace of the ScalarDB data source is always single level, consisting of only the namespace name.
- The ScalarDB table is mapped to the table.
- The ScalarDB column is mapped to the column.
The catalog structure of PostgreSQL is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The PostgreSQL schema is mapped to the namespace. Therefore, the namespace of the PostgreSQL data source is always single level, consisting of only the schema name.
- Only user-defined schemas are mapped to namespaces. The following system schemas are ignored: information_schema, pg_catalog.
- The PostgreSQL table is mapped to the table.
- The PostgreSQL column is mapped to the column.
The catalog structure of MySQL is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The MySQL database is mapped to the namespace. Therefore, the namespace of the MySQL data source is always single level, consisting of only the database name.
- Only user-defined databases are mapped to namespaces. The following system databases are ignored: mysql, sys, information_schema, performance_schema.
- The MySQL table is mapped to the table.
- The MySQL column is mapped to the column.
The catalog structure of Oracle is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The Oracle schema is mapped to the namespace. Therefore, the namespace of the Oracle data source is always single level, consisting of only the schema name.
- Only user-defined schemas are mapped to namespaces. The following system schemas are ignored: ANONYMOUS, APPQOSSYS, AUDSYS, CTXSYS, DBSNMP, DGPDB_INT, DBSFWUSER, DVF, DVSYS, GGSYS, GSMADMIN_INTERNAL, GSMCATUSER, GSMROOTUSER, GSMUSER, LBACSYS, MDSYS, OJVMSYS, ORDDATA, ORDPLUGINS, ORDSYS, OUTLN, REMOTE_SCHEDULER_AGENT, SI_INFORMTN_SCHEMA, SYS, SYS$UMF, SYSBACKUP, SYSDG, SYSKM, SYSRAC, SYSTEM, WMSYS, XDB, DIP, MDDATA, ORACLE_OCM, XS$NULL.
- The Oracle table is mapped to the table.
- The Oracle column is mapped to the column.
The catalog structure of SQL Server is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- Each SQL Server database-schema pair is mapped to a namespace in ScalarDB Analytics. Therefore, the namespace of the SQL Server data source is always two-level, consisting of the database name and the schema name.
- Only user-defined databases are mapped to namespaces. The following system databases are ignored: master, model, msdb, tempdb.
- Only user-defined schemas are mapped to namespaces. The following system schemas are ignored: sys, guest, INFORMATION_SCHEMA, db_accessadmin, db_backupoperator, db_datareader, db_datawriter, db_ddladmin, db_denydatareader, db_denydatawriter, db_owner, db_securityadmin.
- The SQL Server table is mapped to the table.
- The SQL Server column is mapped to the column.
The catalog structure of Databricks is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- Each Databricks catalog-schema pair is mapped to a namespace in ScalarDB Analytics. Therefore, the namespace of the Databricks data source always has two levels, consisting of the catalog name and the schema name.
- The following system catalogs and schemas are ignored:
  - Catalogs: system
  - Schemas: information_schema, global_temp, sys, routines
- The Databricks table is mapped to the table.
- The Databricks column is mapped to the column.
The catalog structure of Snowflake is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- Each Snowflake database-schema pair is mapped to a namespace in ScalarDB Analytics. Therefore, the namespace of the Snowflake data source always has two levels, consisting of the database name and the schema name.
- The following system databases and schemas are ignored:
  - Databases: SNOWFLAKE
  - Schemas: INFORMATION_SCHEMA
- The Snowflake table is mapped to the table.
- The Snowflake column is mapped to the column.
Since DynamoDB is schema-less, you need to specify the catalog structure explicitly when registering a DynamoDB data source, using JSON in the following format:
{
"namespaces": [
{
"name": "<NAMESPACE_NAME>",
"tables": [
{
"name": "<TABLE_NAME>",
"columns": [
{
"name": "<COLUMN_NAME>",
"type": "<COLUMN_TYPE>"
},
...
]
},
...
]
},
...
]
}
In the specified JSON, you can use arbitrary namespace names, but the table names must match the table names in DynamoDB, and the column names and types must match the attribute names and types in DynamoDB.
Data type mappings
The following sections show how native types from each data source are mapped to ScalarDB Analytics types:
Columns with data types that are not included in the mapping tables below will be ignored during data source registration. These columns will not appear in the ScalarDB Analytics catalog and cannot be queried. Information about ignored columns is logged in the ScalarDB Analytics server logs.
- ScalarDB
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- Databricks
- Snowflake
- DynamoDB
| ScalarDB Data Type | ScalarDB Analytics Data Type |
|---|---|
| BOOLEAN | BOOLEAN |
| INT | INT |
| BIGINT | BIGINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| TEXT | TEXT |
| BLOB | BLOB |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMPTZ | TIMESTAMPTZ |
| PostgreSQL Data Type | ScalarDB Analytics Data Type |
|---|---|
| integer | INT |
| bigint | BIGINT |
| real | FLOAT |
| double precision | DOUBLE |
| smallserial | SMALLINT |
| serial | INT |
| bigserial | BIGINT |
| char | TEXT |
| varchar | TEXT |
| text | TEXT |
| bpchar | TEXT |
| boolean | BOOLEAN |
| bytea | BLOB |
| date | DATE |
| time | TIME |
| time with time zone | TIME |
| time without time zone | TIME |
| timestamp | TIMESTAMP |
| timestamp with time zone | TIMESTAMPTZ |
| timestamp without time zone | TIMESTAMP |
| MySQL Data Type | ScalarDB Analytics Data Type |
|---|---|
| bit | BOOLEAN |
| bit(1) | BOOLEAN |
| bit(x) if x ≥ 2 | BLOB |
| tinyint | SMALLINT |
| tinyint(1) | BOOLEAN |
| boolean | BOOLEAN |
| smallint | SMALLINT |
| smallint unsigned | INT |
| mediumint | INT |
| mediumint unsigned | INT |
| int | INT |
| int unsigned | BIGINT |
| bigint | BIGINT |
| float | FLOAT |
| double | DOUBLE |
| real | DOUBLE |
| char | TEXT |
| varchar | TEXT |
| text | TEXT |
| binary | BLOB |
| varbinary | BLOB |
| blob | BLOB |
| date | DATE |
| time | TIME |
| datetime | TIMESTAMP |
| timestamp | TIMESTAMPTZ |
| Oracle Data Type | ScalarDB Analytics Data Type |
|---|---|
| NUMBER if scale = 0 | BIGINT |
| NUMBER if scale > 0 | DOUBLE |
| FLOAT if precision ≤ 53 | DOUBLE |
| BINARY_FLOAT | FLOAT |
| BINARY_DOUBLE | DOUBLE |
| CHAR | TEXT |
| NCHAR | TEXT |
| VARCHAR2 | TEXT |
| NVARCHAR2 | TEXT |
| CLOB | TEXT |
| NCLOB | TEXT |
| BLOB | BLOB |
| BOOLEAN | BOOLEAN |
| DATE | DATE |
| TIMESTAMP | TIMESTAMPTZ |
| TIMESTAMP WITH TIME ZONE | TIMESTAMPTZ |
| TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP |
| RAW | BLOB |
| SQL Server Data Type | ScalarDB Analytics Data Type |
|---|---|
| bit | BOOLEAN |
| tinyint | SMALLINT |
| smallint | SMALLINT |
| int | INT |
| bigint | BIGINT |
| real | FLOAT |
| float | DOUBLE |
| float(n) if n ≤ 24 | FLOAT |
| float(n) if n ≥ 25 | DOUBLE |
| binary | BLOB |
| varbinary | BLOB |
| char | TEXT |
| varchar | TEXT |
| nchar | TEXT |
| nvarchar | TEXT |
| ntext | TEXT |
| text | TEXT |
| date | DATE |
| time | TIME |
| datetime | TIMESTAMP |
| datetime2 | TIMESTAMP |
| smalldatetime | TIMESTAMP |
| datetimeoffset | TIMESTAMPTZ |
| Databricks SQL Data Type | ScalarDB Analytics Data Type |
|---|---|
| TINYINT | SMALLINT |
| SMALLINT | SMALLINT |
| INT / INTEGER | INT |
| BIGINT | BIGINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DECIMAL(p,0) | BYTE (p ≤ 2), SMALLINT (3–4), INT (5–9), BIGINT (10–18), DECIMAL (p > 18) |
| STRING / VARCHAR | TEXT |
| BINARY | BLOB |
| BOOLEAN | BOOLEAN |
| DATE | DATE |
| TIMESTAMP | TIMESTAMPTZ |
| TIMESTAMP_NTZ | TIMESTAMP |
| Snowflake Data Type | ScalarDB Analytics Data Type |
|---|---|
| NUMBER(p,0) | BYTE (p ≤ 2), SMALLINT (3–4), INT (5–9), BIGINT (10–18), DECIMAL (p > 18) |
| NUMBER / NUMERIC | DECIMAL |
| INT / INTEGER / BIGINT / SMALLINT / TINYINT / BYTEINT | DECIMAL |
| FLOAT / FLOAT4 / FLOAT8 / DOUBLE / DOUBLE PRECISION / REAL | DOUBLE |
| VARCHAR / STRING / TEXT / NVARCHAR / NVARCHAR2 / CHAR VARYING / NCHAR VARYING / CHAR / CHARACTER / NCHAR | TEXT |
| BINARY / VARBINARY | BLOB |
| BOOLEAN | BOOLEAN |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP_NTZ / DATETIME | TIMESTAMP |
| TIMESTAMP_LTZ | TIMESTAMPTZ |
| TIMESTAMP_TZ | TIMESTAMPTZ |
| DynamoDB Data Type | ScalarDB Analytics Data Type |
|---|---|
| String | TEXT |
| Number | DOUBLE |
| Binary | BLOB |
| Boolean | BOOLEAN |
| Null | NULL |
| String Set | TEXT |
| Number Set | TEXT |
| Binary Set | TEXT |
| List | TEXT |
| Map | TEXT |
DynamoDB complex data types (String Set, Number Set, Binary Set, List, Map) are mapped to TEXT for compatibility. The actual values are serialized as JSON strings in ScalarDB Analytics queries.
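As a concrete illustration, a table with a DynamoDB Map or List attribute can declare the corresponding column as TEXT in the registration schema; its values then surface as JSON strings in query results. The following is only a sketch of a single table entry under provider.schema.namespaces[].tables[], with hypothetical table and attribute names:
{
  "name": "orders",
  "columns": [
    { "name": "order_id", "type": "TEXT", "nullable": false },
    { "name": "items", "type": "TEXT" }, // DynamoDB List attribute, surfaced as a JSON string
    { "name": "metadata", "type": "TEXT" } // DynamoDB Map attribute, surfaced as a JSON string
  ]
}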