SenseiDB - Data Guarantees

Data guarantees

What we promise about how we manage your data

ACID-ity

Atomicity & Isolation

Each Sensei cluster consumes from a single data stream, and the properties of that stream guarantee that events are versioned and ordered. Atomicity and isolation concerns are therefore left to the data event producers when they write into the data stream.
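The sketch below makes the producer's responsibility concrete. It assumes a hypothetical in-memory stream (`VersionedStream`, `DataEvent` and `producer_update` are illustrative names, not Sensei or stream-client APIs). The stream assigns strictly increasing versions. The producer gets atomicity by writing all changes of one logical update as a single event, so a consumer sees either all of them or none.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataEvent:
    """One event in the stream; `version` is assigned by the stream, not the producer."""
    version: int
    doc_id: int
    fields: dict


@dataclass
class VersionedStream:
    """Hypothetical append-only stream that assigns strictly increasing versions."""
    events: list = field(default_factory=list)

    def append(self, doc_id: int, fields: dict) -> int:
        version = len(self.events) + 1
        self.events.append(DataEvent(version, doc_id, dict(fields)))
        return version

    def read_from(self, after_version: int):
        """Yield events with version > after_version, in version order."""
        return (e for e in self.events if e.version > after_version)


def producer_update(stream: VersionedStream, doc_id: int, changes: dict) -> int:
    # Atomicity is the producer's job: every change belonging to one logical
    # update goes into a single event, so consumers never see half an update.
    return stream.append(doc_id, changes)


stream = VersionedStream()
producer_update(stream, 42, {"title": "running shoes", "price": 59})
producer_update(stream, 42, {"price": 49})
versions = [e.version for e in stream.read_from(0)]
```

If a producer split one logical update across two events, a consumer could index the first without the second. Avoiding that is exactly the isolation concern that stays with the producer.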

Durability

Sensei provides a high level of durability by maintaining N replicas of each shard, which also provides availability and fault tolerance: every piece of your data is stored in N copies.
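To show what N copies buys, here is a minimal sketch that assumes a hypothetical round-robin placement. `place_replicas` and `survives` are illustrative helpers, not Sensei's actual assignment logic. Each shard is placed on N distinct nodes, and the code then checks which node failures still leave every shard readable.

```python
from itertools import combinations


def place_replicas(num_shards: int, nodes: list, n: int) -> dict:
    """Assign each shard to n distinct nodes, round-robin (hypothetical placement)."""
    if n > len(nodes):
        raise ValueError("need at least n nodes to hold n distinct replicas")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(n)]
        for shard in range(num_shards)
    }


def survives(placement: dict, failed: set) -> bool:
    """True if every shard still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(num_shards=4, nodes=nodes, n=2)

# With N = 2, every single-node failure leaves all shards readable...
single_failures_ok = all(survives(placement, {f}) for f in nodes)
# ...but some pair of failures removes both copies of a shard.
all_pairs_ok = all(survives(placement, set(p)) for p in combinations(nodes, 2))
```

In general, any N - 1 node failures are tolerated as long as the N copies of each shard sit on distinct nodes.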

Consistency

Each Sensei node consumes from the data stream independently, without a quorum, to maintain a high level of indexing performance. As a trade-off, consistency between replicas is relaxed, but eventual consistency is still guaranteed because the data stream is versioned and ordered: every replica applies the same events in the same order.
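The following sketch illustrates eventual consistency under simplifying assumptions. The hypothetical `Replica` class stands in for a Sensei node, and a plain list of `(version, doc_id, fields)` tuples stands in for the ordered stream. Two replicas consume at different speeds and temporarily disagree. Once both reach the same version, they hold identical state.

```python
class Replica:
    """Hypothetical replica that consumes an ordered, versioned event list on its own."""

    def __init__(self, name: str):
        self.name = name
        self.last_version = 0   # highest version applied so far
        self.docs = {}          # doc_id -> fields from the latest event for that doc

    def poll(self, stream: list, max_events: int) -> None:
        """Apply up to max_events events newer than last_version, in version order."""
        pending = [e for e in stream if e[0] > self.last_version][:max_events]
        for version, doc_id, fields in pending:
            self.docs[doc_id] = fields      # later versions overwrite earlier ones
            self.last_version = version


# (version, doc_id, fields); versions strictly increase, as the stream guarantees
stream = [(1, 7, {"price": 59}), (2, 8, {"price": 20}), (3, 7, {"price": 49})]

fast, slow = Replica("r1"), Replica("r2")
fast.poll(stream, max_events=3)
slow.poll(stream, max_events=1)
# At this point the replicas disagree: r1 has price 49 for doc 7, r2 still has 59.
slow.poll(stream, max_events=10)
# Having consumed up to the same version, both replicas now hold identical state.
```

No coordination between the replicas is needed. Agreement follows from both applying the same ordered events.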

To prevent subsequent queries in one search session (e.g. result paging) from producing inconsistent results, a routing parameter identifying the search session can be set on the request. The request routing logic then uses consistent hashing to send every request carrying that parameter to the same replica, which guarantees consistent results within the session.
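As an illustration of session routing, here is a minimal consistent-hash ring with virtual nodes. It is a generic sketch of the technique, not Sensei's routing code. `ConsistentHashRing`, the replica names and the session string are all hypothetical. Any request that carries the same routing parameter hashes to the same point on the ring, and therefore reaches the same replica.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, replicas, vnodes: int = 100):
        self._ring = sorted((_hash(f"{r}#{i}"), r) for r in replicas for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def route(self, routing_param: str) -> str:
        """Return the replica owning the first ring point clockwise of the key's hash."""
        idx = bisect.bisect(self._keys, _hash(routing_param)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["replica-1", "replica-2", "replica-3"])
session = "session-8f3a"
# Every page request in the session carries the same routing parameter,
# so every one of them lands on the same replica and sees the same index state.
pages = [ring.route(session) for _ in range(5)]
```

Consistent hashing is preferred here over a plain `hash(key) % num_replicas`. When a replica leaves, only the sessions routed to it move, and every other session stays pinned where it was.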

Elasticity

Elasticity is a measure of scalability: how easily a data system grows to accommodate increasing data volume.

In single-shard systems, growing request load is typically handled by adding replicas to the cluster.

As data size grows, there comes a point where the data must be sharded. Adding shards to a running system is a challenging problem and a typical concern for data elasticity.

With some sharding strategies, such as sharding by recency on time intervals or by contiguous primary id ranges, a new shard can simply be added to the Sensei cluster. The resulting Zookeeper events are propagated to the brokers, and each broker's partition-to-node-list map is updated dynamically. The scatter-gather logic immediately starts routing to the new shards, and the cluster expands seamlessly.
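Here is a sketch of that flow, assuming time-interval sharding. `Broker.on_partition_added` is a hypothetical stand-in for a broker reacting to a cluster-membership notification, which Sensei receives through Zookeeper. It is not a real Sensei or ZooKeeper API, and `TimeRangePartitioner` is likewise illustrative. Adding a shard for a new interval leaves existing documents where they are, and the broker's next scatter-gather already includes the new shard.

```python
import bisect


class Broker:
    """Hypothetical broker view: partition id -> list of nodes serving it."""

    def __init__(self):
        self.partition_nodes = {}

    def on_partition_added(self, partition: int, nodes: list) -> None:
        # Stand-in for handling a membership notification (Zookeeper, in Sensei).
        self.partition_nodes[partition] = list(nodes)

    def scatter_targets(self) -> list:
        # Scatter-gather fans a query out to one node per known partition.
        return [(p, nodes[0]) for p, nodes in sorted(self.partition_nodes.items())]


class TimeRangePartitioner:
    """Shard i holds documents with starts[i] <= timestamp < starts[i + 1]."""

    def __init__(self):
        self.starts = []

    def add_shard(self, start_ts: int) -> int:
        if self.starts and start_ts <= self.starts[-1]:
            raise ValueError("a new shard must cover a later interval")
        self.starts.append(start_ts)
        return len(self.starts) - 1

    def shard_for(self, ts: int) -> int:
        return bisect.bisect_right(self.starts, ts) - 1


partitioner, broker = TimeRangePartitioner(), Broker()
for start, node in [(0, "node-a"), (1000, "node-b")]:
    broker.on_partition_added(partitioner.add_shard(start), [node])

old_doc_shard = partitioner.shard_for(500)
new_shard = partitioner.add_shard(2000)        # a new time interval begins
broker.on_partition_added(new_shard, ["node-c"])
# Existing documents keep their shard, and only later timestamps go to the new one.
```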

This scenario works because the sharding strategy requires no data rebalancing, which is not true for all applications. When data must be rebalanced as new shards are introduced (e.g. when sharding by an id or key modulo the number of shards), a rebalancing step has to be applied to the entire cluster. This is not supported in the current version of Sensei, but the functionality will be available in future releases.
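The difference between the two strategies can be quantified with a small sketch. It counts how many existing ids change shard when a cluster grows from 3 to 4 shards. `RANGE_SIZE` and the helper functions are hypothetical, chosen only for illustration.

```python
RANGE_SIZE = 3_334  # hypothetical: each range shard holds 3,334 consecutive ids


def mod_shard(key: int, num_shards: int) -> int:
    return key % num_shards


def range_shard(key: int, num_shards: int) -> int:
    # The shard depends only on the key's range; num_shards only limits how many
    # ranges exist, so adding a shard never moves an existing key.
    shard = key // RANGE_SIZE
    if shard >= num_shards:
        raise ValueError(f"id {key} is beyond the last shard; add a shard first")
    return shard


def moved_fraction(keys, shard_of, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the shard count goes old_n -> new_n."""
    keys = list(keys)
    moved = sum(1 for k in keys if shard_of(k, old_n) != shard_of(k, new_n))
    return moved / len(keys)


ids = range(10_000)
mod_moved = moved_fraction(ids, mod_shard, 3, 4)      # roughly 0.75 of ids change shard
range_moved = moved_fraction(ids, range_shard, 3, 4)  # 0.0: no existing id moves
```

Under modulo sharding, an id keeps its shard only when `k % 3 == k % 4`, which holds for just 3 of every 12 consecutive ids. Most of the data would therefore have to be copied to a different shard. That is the rebalancing step described above.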

Some comparisons with a traditional RDBMS

RDBMS:

scales vertically
strong ACID guarantees
relational support
performance cost with full-text integration
high query latency on large datasets, e.g. Group By
indexes need to be built offline for every possible sort order

Sensei:

scales horizontally
relaxed consistency with high durability guarantees
data is streamed in, so atomicity and isolation are handled by the data producer
deep full-text integration
low query latency on arbitrarily large datasets
dynamic sorting: indexes are already built for all sortable fields and their combinations
