Data Source Reference
This reference guide provides detailed information about data source configuration formats, provider-specific settings, and data type mappings for ScalarDB Analytics.
You need to have a license key (trial license or commercial license) to use ScalarDB Analytics. If you don't have a license key, please contact us.
Data source registration file format
Data sources are registered to catalogs using the CLI with data source registration files. These files have the following structure. For CLI command details, see CLI command reference.
```json
{
  "catalog": "<catalog-name>", // The catalog to register the data source in
  "name": "<data-source-name>", // A unique name for this data source
  "type": "<database-type>", // Database type: postgres, mysql, scalardb, sqlserver, oracle, dynamodb
  "provider": {
    // Type-specific connection configuration
    // Configuration varies by database type
  }
}
```
The `provider` section contains data source-specific connection settings that vary based on the `type` field.
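As an illustration of this structure, the following Python sketch checks that a registration document contains the required top-level fields and a `provider` object. It is illustrative only; `validate_registration` is not part of ScalarDB Analytics or its CLI.

```python
import json

# Database types listed in the registration file format above.
SUPPORTED_TYPES = {"postgres", "mysql", "scalardb", "sqlserver", "oracle", "dynamodb"}

def validate_registration(doc: dict) -> list[str]:
    """Return a list of problems found in a registration document."""
    problems = []
    for field in ("catalog", "name", "type", "provider"):
        if field not in doc:
            problems.append(f"missing required field: {field}")
    if doc.get("type") not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {doc.get('type')!r}")
    if not isinstance(doc.get("provider"), dict):
        problems.append("provider must be an object")
    return problems

doc = json.loads("""
{
  "catalog": "production",
  "name": "postgres_customers",
  "type": "postgres",
  "provider": {"host": "postgres.example.com", "port": 5432}
}
""")
print(validate_registration(doc))  # → []
```

A check like this can catch structural mistakes before the file is passed to the CLI; the type-specific `provider` fields are described in the sections below.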
Provider configuration by type
The following sections show the provider configuration for each supported database type:
- ScalarDB
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- DynamoDB
ScalarDB configuration

The following configuration is for ScalarDB.

- `configPath`: Path to the ScalarDB configuration file.

Example

```json
{
  "catalog": "production",
  "name": "scalardb_source",
  "type": "scalardb",
  "provider": {
    "configPath": "/path/to/scalardb.properties"
  }
}
```
PostgreSQL configuration

The following configurations are for PostgreSQL.

- `host`: PostgreSQL server hostname.
- `port`: Port number.
- `username`: Database user.
- `password`: Database password.
- `database`: Database name to connect to.

Example

```json
{
  "catalog": "production",
  "name": "postgres_customers",
  "type": "postgres",
  "provider": {
    "host": "postgres.example.com",
    "port": 5432,
    "username": "analytics_user",
    "password": "secure_password",
    "database": "customers"
  }
}
```
MySQL configuration

The following configurations are for MySQL.

- `host`: MySQL server hostname.
- `port`: Port number.
- `username`: Database user.
- `password`: Database password.
- `database`: Specific database to import. If omitted, all databases will be imported. Default: none (imports all databases).

Example

```json
{
  "catalog": "production",
  "name": "mysql_orders",
  "type": "mysql",
  "provider": {
    "host": "mysql.example.com",
    "port": 3306,
    "username": "analytics_user",
    "password": "secure_password",
    "database": "orders" // Optional - if omitted, all databases will be imported
  }
}
```
Oracle configuration

The following configurations are for Oracle.

- `host`: Oracle server hostname.
- `port`: Port number.
- `username`: Database user.
- `password`: Database password.
- `serviceName`: Oracle service name.

Example

```json
{
  "catalog": "production",
  "name": "oracle_warehouse",
  "type": "oracle",
  "provider": {
    "host": "oracle.example.com",
    "port": 1521,
    "username": "analytics_user",
    "password": "secure_password",
    "serviceName": "ORCL"
  }
}
```
SQL Server configuration

The following configurations are for SQL Server.

- `host`: SQL Server hostname.
- `port`: Port number.
- `username`: Database user.
- `password`: Database password.
- `database`: Specific database to connect to. Default: none (connects to the default database).
- `secure`: Enable encryption. Default: `false`.

Example

```json
{
  "catalog": "production",
  "name": "sqlserver_analytics",
  "type": "sqlserver",
  "provider": {
    "host": "sqlserver.example.com",
    "port": 1433,
    "username": "sa",
    "password": "secure_password",
    "database": "analytics", // Optional - if specified, only this database will be imported
    "secure": true // Optional - enable encryption
  }
}
```
DynamoDB configuration

The following configurations are for DynamoDB.
DynamoDB authentication uses the standard AWS SDK credential provider chain. Credentials can be configured through:

- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- AWS credentials file (`~/.aws/credentials`)
- IAM roles (when running on EC2, ECS, or Lambda)
- AWS SSO or other credential providers supported by the AWS SDK
For more information, see the AWS SDK documentation on credential providers.
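For example, static credentials can be supplied through environment variables before running the CLI. The variable names are the standard AWS SDK names; the values below are placeholders:

```shell
# Standard AWS SDK environment variables; replace the placeholder values.
export AWS_ACCESS_KEY_ID="AKIA-example-key-id"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_REGION="us-east-1"  # optional; the region can also come from the provider config
```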
- `region`: AWS region (e.g., `us-east-1`). Either `region` or `endpoint` must be specified (not both).
- `endpoint`: Custom endpoint URL. Either `region` or `endpoint` must be specified (not both).
- `schema`: Complete schema definition. Since DynamoDB is schema-less, you must provide a complete schema definition.
Schema structure

The schema field must contain the following structure:

- `.schema.namespaces[]`: Array of namespace definitions.
- `.schema.namespaces[].names[]`: Array of namespace names (strings).
- `.schema.namespaces[].tables[]`: Array of table definitions.
- `.schema.namespaces[].tables[].name`: Table name.
- `.schema.namespaces[].tables[].columns[]`: Array of column definitions.
- `.schema.namespaces[].tables[].columns[].name`: Column name.
- `.schema.namespaces[].tables[].columns[].type`: Data type.
- `.schema.namespaces[].tables[].columns[].nullable`: Whether the column can contain null values. Default: `true`.
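Because the schema definition can get verbose, it may be convenient to build it programmatically and serialize it into the registration file. A minimal Python sketch; the `column` helper is ours, not part of any ScalarDB Analytics API:

```python
import json

def column(name, type_, nullable=True):
    """Build one column definition; nullable defaults to true, as in the format above."""
    return {"name": name, "type": type_, "nullable": nullable}

# Mirrors the .schema structure described above.
schema = {
    "namespaces": [
        {
            "names": ["production"],
            "tables": [
                {
                    "name": "user_events",
                    "columns": [
                        column("user_id", "TEXT", nullable=False),
                        column("event_type", "TEXT"),
                    ],
                }
            ],
        }
    ]
}

print(json.dumps(schema, indent=2))
```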
Example
```json
{
  "catalog": "production",
  "name": "dynamodb_events",
  "type": "dynamodb",
  "provider": {
    "region": "us-east-1",
    "schema": {
      "namespaces": [
        {
          "names": ["production"],
          "tables": [
            {
              "name": "user_events",
              "columns": [
                { "name": "user_id", "type": "TEXT", "nullable": false },
                { "name": "event_time", "type": "TIMESTAMP", "nullable": false },
                { "name": "event_type", "type": "TEXT" },
                { "name": "event_data", "type": "TEXT" }
              ]
            }
          ]
        }
      ]
    }
  }
}
```
Catalog information reference
This section describes catalog structure mappings by data source and data type mappings.
Catalog structure mappings by data source
When a data source is registered to ScalarDB Analytics, its catalog structure (namespaces, tables, and columns) is resolved and registered in the universal data catalog. To resolve that structure, particular objects on the data source side are mapped to universal data catalog objects.
Catalog-level mappings
The catalog-level mappings are the mappings of the namespace names, table names, and column names from the data sources to the universal data catalog. To see the catalog-level mappings in each data source, select a data source.
- ScalarDB
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- DynamoDB
The catalog structure of ScalarDB is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The ScalarDB namespace is mapped to the namespace. Therefore, the namespace of the ScalarDB data source is always single level, consisting of only the namespace name.
- The ScalarDB table is mapped to the table.
- The ScalarDB column is mapped to the column.
The catalog structure of PostgreSQL is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The PostgreSQL schema is mapped to the namespace. Therefore, the namespace of the PostgreSQL data source is always single level, consisting of only the schema name.
- Only user-defined schemas are mapped to namespaces. The following system schemas are ignored:
  - information_schema
  - pg_catalog
- The PostgreSQL table is mapped to the table.
- The PostgreSQL column is mapped to the column.
The catalog structure of MySQL is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The MySQL database is mapped to the namespace. Therefore, the namespace of the MySQL data source is always single level, consisting of only the database name.
- Only user-defined databases are mapped to namespaces. The following system databases are ignored:
  - mysql
  - sys
  - information_schema
  - performance_schema
- The MySQL table is mapped to the table.
- The MySQL column is mapped to the column.
The catalog structure of Oracle is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The Oracle schema is mapped to the namespace. Therefore, the namespace of the Oracle data source is always single level, consisting of only schema name.
- Only user-defined schemas are mapped to namespaces. The following system schemas are ignored:
  - ANONYMOUS
  - APPQOSSYS
  - AUDSYS
  - CTXSYS
  - DBSNMP
  - DGPDB_INT
  - DBSFWUSER
  - DVF
  - DVSYS
  - GGSYS
  - GSMADMIN_INTERNAL
  - GSMCATUSER
  - GSMROOTUSER
  - GSMUSER
  - LBACSYS
  - MDSYS
  - OJVMSYS
  - ORDDATA
  - ORDPLUGINS
  - ORDSYS
  - OUTLN
  - REMOTE_SCHEDULER_AGENT
  - SI_INFORMTN_SCHEMA
  - SYS
  - SYS$UMF
  - SYSBACKUP
  - SYSDG
  - SYSKM
  - SYSRAC
  - SYSTEM
  - WMSYS
  - XDB
  - DIP
  - MDDATA
  - ORACLE_OCM
  - XS$NULL
- The Oracle table is mapped to the table.
- The Oracle column is mapped to the column.
The catalog structure of SQL Server is automatically resolved by ScalarDB Analytics. The catalog-level objects are mapped as follows:
- The SQL Server database and schema are mapped to the namespace together. Therefore, the namespace of the SQL Server data source is always two-level, consisting of the database name and the schema name.
- Only user-defined databases are mapped to namespaces. The following system databases are ignored:
  - master
  - model
  - msdb
  - tempdb
- Only user-defined schemas are mapped to namespaces. The following system schemas are ignored:
  - sys
  - guest
  - INFORMATION_SCHEMA
  - db_accessadmin
  - db_backupoperator
  - db_datareader
  - db_datawriter
  - db_ddladmin
  - db_denydatareader
  - db_denydatawriter
  - db_owner
  - db_securityadmin
- The SQL Server table is mapped to the table.
- The SQL Server column is mapped to the column.
Since DynamoDB is schema-less, you need to specify the catalog structure explicitly when registering a DynamoDB data source, using JSON in the following format:
```json
{
  "namespaces": [
    {
      "name": "<NAMESPACE_NAME>",
      "tables": [
        {
          "name": "<TABLE_NAME>",
          "columns": [
            {
              "name": "<COLUMN_NAME>",
              "type": "<COLUMN_TYPE>"
            },
            ...
          ]
        },
        ...
      ]
    },
    ...
  ]
}
```
In the specified JSON, you can use arbitrary namespace names, but the table names must match the table names in DynamoDB, and the column names and types must match the field names and types in DynamoDB.
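One way to catch mismatches early is to compare the declared structure against the actual DynamoDB table names. The sketch below assumes you have already fetched those names (for example, with the AWS SDK's ListTables operation); `actual_tables` is a hypothetical stand-in for that result:

```python
# Hypothetical set of table names fetched from DynamoDB (e.g., via ListTables).
actual_tables = {"user_events", "sessions"}

# A declared catalog structure in the format shown above.
declared = {
    "namespaces": [
        {
            "name": "production",
            "tables": [
                {"name": "user_events", "columns": []},
                {"name": "orders", "columns": []},
            ],
        }
    ]
}

# Collect declared tables that do not exist in DynamoDB.
missing = [
    t["name"]
    for ns in declared["namespaces"]
    for t in ns["tables"]
    if t["name"] not in actual_tables
]
print(missing)  # → ['orders']
```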
Data type mappings
The following sections show how native types from each data source are mapped to ScalarDB Analytics types:
Columns with data types that are not included in the mapping tables below will be ignored during data source registration. These columns will not appear in the ScalarDB Analytics catalog and cannot be queried. Information about ignored columns is logged in the ScalarDB Analytics server logs.
- ScalarDB
- PostgreSQL
- MySQL
- Oracle
- SQL Server
- DynamoDB
| ScalarDB Data Type | ScalarDB Analytics Data Type |
| --- | --- |
| BOOLEAN | BOOLEAN |
| INT | INT |
| BIGINT | BIGINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| TEXT | TEXT |
| BLOB | BLOB |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMPTZ | TIMESTAMPTZ |
| PostgreSQL Data Type | ScalarDB Analytics Data Type |
| --- | --- |
| integer | INT |
| bigint | BIGINT |
| real | FLOAT |
| double precision | DOUBLE |
| smallserial | SMALLINT |
| serial | INT |
| bigserial | BIGINT |
| char | TEXT |
| varchar | TEXT |
| text | TEXT |
| bpchar | TEXT |
| boolean | BOOLEAN |
| bytea | BLOB |
| date | DATE |
| time | TIME |
| time with time zone | TIME |
| time without time zone | TIME |
| timestamp | TIMESTAMP |
| timestamp with time zone | TIMESTAMPTZ |
| timestamp without time zone | TIMESTAMP |
| MySQL Data Type | ScalarDB Analytics Data Type |
| --- | --- |
| bit | BOOLEAN |
| bit(1) | BOOLEAN |
| bit(x) if x >= 2 | BLOB |
| tinyint | SMALLINT |
| tinyint(1) | BOOLEAN |
| boolean | BOOLEAN |
| smallint | SMALLINT |
| smallint unsigned | INT |
| mediumint | INT |
| mediumint unsigned | INT |
| int | INT |
| int unsigned | BIGINT |
| bigint | BIGINT |
| float | FLOAT |
| double | DOUBLE |
| real | DOUBLE |
| char | TEXT |
| varchar | TEXT |
| text | TEXT |
| binary | BLOB |
| varbinary | BLOB |
| blob | BLOB |
| date | DATE |
| time | TIME |
| datetime | TIMESTAMP |
| timestamp | TIMESTAMPTZ |
| Oracle Data Type | ScalarDB Analytics Data Type |
| --- | --- |
| NUMBER if scale = 0 | BIGINT |
| NUMBER if scale > 0 | DOUBLE |
| FLOAT if precision ≤ 53 | DOUBLE |
| BINARY_FLOAT | FLOAT |
| BINARY_DOUBLE | DOUBLE |
| CHAR | TEXT |
| NCHAR | TEXT |
| VARCHAR2 | TEXT |
| NVARCHAR2 | TEXT |
| CLOB | TEXT |
| NCLOB | TEXT |
| BLOB | BLOB |
| BOOLEAN | BOOLEAN |
| DATE | DATE |
| TIMESTAMP | TIMESTAMPTZ |
| TIMESTAMP WITH TIME ZONE | TIMESTAMPTZ |
| TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP |
| RAW | BLOB |
| SQL Server Data Type | ScalarDB Analytics Data Type |
| --- | --- |
| bit | BOOLEAN |
| tinyint | SMALLINT |
| smallint | SMALLINT |
| int | INT |
| bigint | BIGINT |
| real | FLOAT |
| float | DOUBLE |
| float(n) if n ≤ 24 | FLOAT |
| float(n) if n ≥ 25 | DOUBLE |
| binary | BLOB |
| varbinary | BLOB |
| char | TEXT |
| varchar | TEXT |
| nchar | TEXT |
| nvarchar | TEXT |
| ntext | TEXT |
| text | TEXT |
| date | DATE |
| time | TIME |
| datetime | TIMESTAMP |
| datetime2 | TIMESTAMP |
| smalldatetime | TIMESTAMP |
| datetimeoffset | TIMESTAMPTZ |
| DynamoDB Data Type | ScalarDB Analytics Data Type |
| --- | --- |
| String | TEXT |
| Number | DOUBLE |
| Binary | BLOB |
| Boolean | BOOLEAN |
| Null | NULL |
| String Set | TEXT |
| Number Set | TEXT |
| Binary Set | TEXT |
| List | TEXT |
| Map | TEXT |
DynamoDB complex data types (String Set, Number Set, Binary Set, List, Map) are mapped to `TEXT` for compatibility. The actual values are serialized as JSON strings in ScalarDB Analytics queries.
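The DynamoDB table above can be expressed as a lookup, which also illustrates the behavior noted earlier: columns whose native type has no entry in the mapping are skipped during registration. The function below is an illustration, not ScalarDB Analytics code:

```python
# DynamoDB -> ScalarDB Analytics type mapping, transcribed from the table above.
DYNAMODB_TYPE_MAP = {
    "String": "TEXT",
    "Number": "DOUBLE",
    "Binary": "BLOB",
    "Boolean": "BOOLEAN",
    "Null": "NULL",
    "String Set": "TEXT",
    "Number Set": "TEXT",
    "Binary Set": "TEXT",
    "List": "TEXT",
    "Map": "TEXT",
}

def mapped_columns(columns):
    """Map native types to Analytics types; columns with unmapped types are dropped."""
    return {name: DYNAMODB_TYPE_MAP[t] for name, t in columns.items() if t in DYNAMODB_TYPE_MAP}

print(mapped_columns({"id": "String", "score": "Number", "custom": "SomethingElse"}))
# → {'id': 'TEXT', 'score': 'DOUBLE'}
```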