ScalarDB Design Document
Introduction
ScalarDB is a library that makes non-ACID databases/storages ACID-compliant. This design document briefly explains its background, design, and implementation.
Background and Objectives
Distributed storage is widely adopted in real-world applications and recent open-source distributed storages such as Cassandra and HBase have accelerated the trend. They are particularly used by large and sometimes mission-critical applications because of their high performance, high availability and high scalability. However, they often lack transaction capability, which is particularly important in mission-critical applications. Transaction capability can be added to HBase via third-party libraries, but they tend to sacrifice some availability property due to the master-slave architecture. Some companies have ended up creating yet another distributed transactional databases from scratch (such as CockroachDB and TiDB) to overcome such problem.
ScalarDB is a simple and practical solution to solve the above-mentioned problem in a different way. It provides a simple storage abstraction layer on the existing storage implementations and a storage-agnostic universal transaction manager on top of the storage abstraction[1]. So, it can fully utilize, not only battle-tested existing implementations, operational/management tools and good properties of storages but also the eco-system, the best practices, and the community which have grown for a long time.
Design Goals
The primary design goal of ScalarDB is achieving ACID transaction capability without losing high availability, horizontal scalability, and strong consistency of underlying storage. The performance of the ScalarDB is highly dependent on the underlying storage performance and is usually slower than other scratch-built distributed databases since it adds a storage abstraction layer and storage-oblivious transaction layer, which makes various optimizations hard.
High-level Architecture
ScalarDB is composed of universal transaction manager, storage abstraction, and storage adapters. Universal transaction manager and storage abstraction are storage-agnostic. On the other hand, storage adapters are storage-specific so there is an adapter for each storage implementation.

Data Model
The data model of ScalarDB is a multi-dimensional map based on the key-value data model. A logical record is composed of partition-key, clustering-key and a set of values. The value is uniquely mapped by a primary key composed of partition-key, clustering-key and value-name as described in the following scheme.
(partition-key, clustering-key, value-name) -> value-content

Physical Data Model
ScalarDB is a multi-dimensional map distributed to multiple nodes by key-based hash partitioning. Records are assumed to be hash-partitioned by partition-key (even though an underlying implementation may support range partitioning). Records with the same partition-key define a partition. A partition is clustered (sorted) by the clustering-key. It is similar to Google BigTable [2] but it differs in clustering-key structure and partitioning scheme.
Limitation of the data model
Since records in ScalarDB are assumed to be hash-partitioned, global range scan is not supported. Range scan is only supported for clustering-key access within the same partition.