# Run Analytical Queries on Sample Data by Using ScalarDB Analytics with Spark
This tutorial describes how to run analytical queries on sample data by using ScalarDB Analytics with Spark. The source code is available at https://github.com/scalar-labs/scalardb-samples/tree/main/scalardb-analytics-spark-sample.
## What you can do in this sample application
This sample tutorial shows how you can run interactive analysis in the Spark shell by using ScalarDB Analytics with Spark. In particular, you'll learn how to run the following two types of queries:
- Read data and calculate summaries.
- Join tables that span multiple storages.
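To give a feel for what these two query types look like, here is a sketch of a Spark shell session. The table and column names (`test_catalog.mysqlns.orders`, `test_catalog.cassandrans.lineitem`, and their columns) are illustrative placeholders, not necessarily the names used by the sample data you set up later.

```scala
// Sketch only: table names below are assumed for illustration and may not
// match the sample schema. Run inside spark-shell, where `spark` is provided.

// 1. Read data and calculate a summary (row count per group).
spark.sql("""
  SELECT o_orderstatus, COUNT(*) AS order_count
  FROM test_catalog.mysqlns.orders
  GROUP BY o_orderstatus
""").show()

// 2. Join tables that live in different underlying storages
//    (here, a MySQL-backed table and a Cassandra-backed table).
spark.sql("""
  SELECT o.o_orderkey, l.l_quantity
  FROM test_catalog.mysqlns.orders o
  JOIN test_catalog.cassandrans.lineitem l
    ON o.o_orderkey = l.l_orderkey
""").show()
```

Because ScalarDB Analytics exposes every underlying database through one Spark catalog, the cross-storage join above is written exactly like an ordinary single-database join.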
## Prerequisites
- Docker 20.10 or later with Docker Compose V2 or later
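If you want to confirm that your Docker installation meets the version requirement, a small shell sketch like the following can help. The `version_ge` helper is a hypothetical utility written for this check, not part of Docker.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2,
# using sort -V for field-by-field numeric comparison.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed Docker server version (falls back to 0 if unavailable).
installed="$(docker version --format '{{.Server.Version}}' 2>/dev/null || echo 0)"

if version_ge "$installed" "20.10"; then
  echo "Docker $installed satisfies the 20.10 requirement"
else
  echo "Docker $installed is older than 20.10 (or Docker is not running)" >&2
fi
```

You can check Docker Compose the same way with `docker compose version`.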
You need to have a license key (trial license or commercial license) to use ScalarDB Analytics with Spark. If you don't have a license key, please contact us.
## Set up ScalarDB Analytics with Spark
### Clone the ScalarDB samples repository
Open a terminal window, then clone the ScalarDB samples repository by running the following command:
```console
git clone https://github.com/scalar-labs/scalardb-samples
```
Then, go to the directory that contains the sample application by running the following command:
```console
cd scalardb-samples/scalardb-analytics-spark-sample
```
### Add your license certificate to the sample directory
Copy your license certificate (`cert.pem`) to the sample directory by running the following command, replacing `<PATH_TO_YOUR_LICENSE>` with the path to your license:
```console
cp /<PATH_TO_YOUR_LICENSE>/cert.pem cert.pem
```
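As a quick sanity check after copying, you can confirm that the file is present and starts with a PEM header. This is a generic PEM check, not a validation of the license itself:

```shell
# Generic sanity check: the copied file should exist and begin with a
# "-----BEGIN ...-----" PEM header. Run from the sample directory.
if [ -f cert.pem ] && head -n 1 cert.pem | grep -q "BEGIN"; then
  echo "cert.pem looks like a PEM file"
else
  echo "cert.pem is missing or does not look like a PEM file" >&2
fi
```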