Version: 3.10 (unsupported)

Getting Started with ScalarDB Analytics with PostgreSQL

This document explains how to get started with ScalarDB Analytics with PostgreSQL. We assume that you have already installed ScalarDB Analytics with PostgreSQL and that all required services are running. If you don't have such an environment, please follow the instructions in How to Install ScalarDB Analytics with PostgreSQL in Your Local Environment by Using Docker. Because ScalarDB Analytics with PostgreSQL executes queries via PostgreSQL, we also assume that you already have a psql client or another PostgreSQL client to send queries to PostgreSQL.

What is ScalarDB Analytics with PostgreSQL?

ScalarDB, as a universal transaction manager, targets mainly transactional workloads and therefore supports limited subsets of relational queries.

ScalarDB Analytics with PostgreSQL extends the functionality of ScalarDB to process analytical queries on ScalarDB-managed data by using PostgreSQL and its foreign data wrapper (FDW) extension.

ScalarDB Analytics with PostgreSQL mainly consists of two components: PostgreSQL and Schema Importer.

PostgreSQL runs as a service, accepting queries from users to process. FDW extensions are used to read data from the back-end storages that ScalarDB manages. Schema Importer is a tool to import the schema of the ScalarDB database into PostgreSQL so that users can see tables on the PostgreSQL side, which are identical to the tables on the ScalarDB side.

Set up a ScalarDB database

First, you need one or more ScalarDB databases to run analytical queries with ScalarDB Analytics with PostgreSQL. If you have your own ScalarDB database, you can skip this section and use your database instead. If you use the scalardb-samples/scalardb-analytics-postgresql-sample project, you can set up a sample database by running the following command in the project directory.

docker compose run --rm schema-loader \
  -c /etc/scalardb.properties \
  --schema-file /etc/schema.json \
  --coordinator \
  --no-backup \
  --no-scaling

This command sets up multiple storage instances that consist of DynamoDB, PostgreSQL, and Cassandra. Then, the command creates namespaces for dynamons, postgresns, and cassandrans that are mapped to those storages, creates tables for dynamons.customer, postgresns.orders, and cassandrans.lineitem by using ScalarDB Schema Loader.

Multi-storage overview

You can load sample data into the created tables by running the following command.

docker compose run --rm sample-data-loader

Import the schemas from ScalarDB into PostgreSQL

Next, let's import the schemas of the ScalarDB databases into PostgreSQL that processes analytical queries. ScalarDB Analytics with PostgreSQL provides a tool, Schema Importer, for this purpose. It'll get everything in place to run analytical queries for you.

docker compose run --rm schema-importer \
  import \
  --config /etc/scalardb.properties \
  --host analytics \
  --port 5432 \
  --database test \
  --user postgres \
  --password postgres \
  --namespace cassandrans \
  --namespace postgresns \
  --namespace dynamons \
  --config-on-postgres-host /etc/scalardb.properties

If you use your own ScalarDB database, you must replace the --config and --config-on-postgres-host options with your ScalarDB configuration file and the --namespace options with your ScalarDB namespaces to import.

This creates tables (in precise, views) with the same names as the tables in the ScalarDB databases. In this example, the tables of dynamons.customer, postgresns.orders, and cassandrans.lineitem are created. The column definitions are also identical to the ScalarDB databases. These tables are foreign tables connected to the underlying storage of the ScalarDB databases using FDW. Therefore, you can equate those tables in PostgreSQL with the tables in the ScalarDB databases.

Imported schema

Run analytical queries

Now, you have all tables to read the same data in the ScalarDB databases and can run any arbitrary analytical queries supported by PostgreSQL. To run queries, please connect to PostgreSQL with psql or other client.

psql -U postgres -h localhost test
Password for user postgres:

> select c_mktsegment, count(*) from dynamons.customer group by c_mktsegment;
 c_mktsegment | count
--------------+-------
 AUTOMOBILE   |     4
 BUILDING     |     2
 FURNITURE    |     1
 HOUSEHOLD    |     2
 MACHINERY    |     1
(5 rows)

For details about the sample data and additional practical work, see the sample application page.

Caveats

Isolation level

ScalarDB Analytics with PostgreSQL reads data with the Read Committed isolation level set. This isolation level ensures that the data you read has been committed in the past but does not guarantee that you can read consistent data at a particular point in time.

Write operations are not supported

ScalarDB Analytics with PostgreSQL only supports read-only queries. INSERT, UPDATE, and other write operations are not supported.

What is ScalarDB Analytics with PostgreSQL?​

Set up a ScalarDB database​

Import the schemas from ScalarDB into PostgreSQL​

Run analytical queries​

Caveats​

Isolation level​

Write operations are not supported​