Data guarantees
What we promise about how we manage your data
ACID-ity
Atomicity & Isolation
Each Sensei cluster consumes from a single data stream. The data stream guarantees versioning and ordering; atomicity and isolation are left to the data event producers that write into the stream.
Durability
Sensei provides a high level of durability by maintaining N replicas of each shard, which also gives the cluster a level of availability and fault tolerance. In other words, your data exists in N copies.
Consistency
Each Sensei node consumes from the data stream independently, without a quorum, to maintain a high level of indexing performance. As a trade-off, consistency between replicas is relaxed, but eventual consistency is still guaranteed because the data stream is versioned and ordered.
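The convergence argument above can be illustrated with a minimal sketch. It is not Sensei code: the Event and Replica classes and the consume method are hypothetical names, and the stream is modelled as an in-memory list. It only assumes what the text states, namely that events carry an increasing version and arrive in order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    version: int  # position in the stream; strictly increasing (assumption)
    doc_id: str
    value: str


@dataclass
class Replica:
    version: int = 0  # last stream version this replica has applied
    index: dict = field(default_factory=dict)

    def consume(self, stream, max_events=None):
        """Apply events newer than self.version, in stream order."""
        applied = 0
        for event in stream:
            if event.version <= self.version:
                continue  # already applied, so replaying the stream is harmless
            if max_events is not None and applied >= max_events:
                break
            self.index[event.doc_id] = event.value
            self.version = event.version
            applied += 1


stream = [
    Event(1, "a", "red"),
    Event(2, "b", "blue"),
    Event(3, "a", "green"),
]

fast, slow = Replica(), Replica()
fast.consume(stream)                # fully caught up
slow.consume(stream, max_events=1)  # lagging behind, no coordination with fast
lagging_differs = fast.index != slow.index

slow.consume(stream)                # catches up later
converged = fast.index == slow.index and fast.version == slow.version
```

While the slow replica lags, the two indexes differ; once both have consumed up to the same version they are identical, without the replicas ever talking to each other.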
To prevent subsequent queries in one search session (e.g. result paging) from producing inconsistent results, a routing parameter identifying the search session can be set on the request. The request routing logic then routes every request carrying that parameter to the same replica via consistent hashing, which guarantees consistent results within a given search session.
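As a rough illustration of session-sticky routing, here is a small consistent-hash ring. It is a sketch under assumptions, not Sensei's routing code: the ReplicaRing class, the replica names, the use of MD5, and the number of ring points per replica are all chosen for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ReplicaRing:
    """Hypothetical consistent-hash ring over the replicas of one shard."""

    def __init__(self, replicas, points_per_replica=100):
        # Each replica owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(points_per_replica)
        )
        self._hashes = [h for h, _ in self._ring]

    def route(self, routing_param: str) -> str:
        # Pick the first ring point after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(routing_param))
        return self._ring[idx % len(self._ring)][1]


ring = ReplicaRing(["replica-1", "replica-2", "replica-3"])
page_1 = ring.route("session-42")
page_2 = ring.route("session-42")  # same routing parameter, same replica

# If replica-3 leaves, sessions that were on other replicas stay where they were.
smaller = ReplicaRing(["replica-1", "replica-2"])
sessions = [f"session-{i}" for i in range(200)]
unaffected = [s for s in sessions if ring.route(s) != "replica-3"]
still_sticky = all(ring.route(s) == smaller.route(s) for s in unaffected)
```

The second half shows why consistent hashing is preferred over a plain modulo here: a membership change only remaps the sessions that were on the departed replica.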
Elasticity
Elasticity is a scalability measurement of how easily a data system grows to accommodate data volume.
In single-shard systems, growing request load is typically handled by adding replicas to the cluster.
As data size grows, there comes a point where the data must be sharded. Adding shards to a running system is a challenging problem and a typical concern for data elasticity.
With some sharding strategies, such as sharding by recency on time intervals or on continuous primary id ranges, a new shard can simply be added to the Sensei cluster. ZooKeeper events are propagated, and the partition-to-node-list maps in all brokers are updated dynamically. The scatter-gather logic immediately routes to the new shards, and the cluster expands in a seamless fashion.
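The flow described above can be sketched as follows. This is a toy model, not the Sensei broker: the Broker class, the on_cluster_event callback, the fetch function, and the node names are hypothetical, and the cluster notification is simulated by calling the callback directly instead of receiving a real ZooKeeper event.

```python
class Broker:
    """Hypothetical broker holding a partition -> node list map."""

    def __init__(self):
        self.partition_nodes = {}

    def on_cluster_event(self, partition_nodes):
        # Stand-in for the handler a real cluster-change notification would trigger.
        self.partition_nodes = {p: list(n) for p, n in partition_nodes.items()}

    def search(self, query, fetch):
        """Scatter the query to one node per partition, then gather the hits."""
        hits = []
        for partition, nodes in sorted(self.partition_nodes.items()):
            if not nodes:
                continue  # no live node for this partition
            hits.extend(fetch(nodes[0], partition, query))
        return sorted(hits)


# Fake per-partition data, sharded by continuous id range.
DATA = {0: ["doc-001", "doc-002"], 1: ["doc-101"], 2: ["doc-201"]}


def fetch(node, partition, query):
    # A real node would run the query against its index; here we filter a list.
    return [doc for doc in DATA[partition] if query in doc]


broker = Broker()
broker.on_cluster_event({0: ["node-a"], 1: ["node-b"]})
before = broker.search("doc", fetch)

# A new shard joins; the map is refreshed and the next query reaches it.
broker.on_cluster_event({0: ["node-a"], 1: ["node-b"], 2: ["node-c"]})
after = broker.search("doc", fetch)
```

Existing partitions are untouched by the update, which is what makes this kind of expansion cheap.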
The above scenario is possible because the sharding strategy assumes no data re-balancing is needed, which is not necessarily true for all applications. In situations where data needs to be re-balanced when new shards are introduced, e.g. sharding by an id or key modulo the number of shards, a rebalancing step needs to be applied to the entire cluster. This is not supported in this version of Sensei, but the functionality will be available in future releases.
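To make the difference concrete, the following sketch counts how many existing documents change shard when a fifth shard is added under each strategy. The function names, the id range, and the shard boundaries are assumptions made for the example.

```python
import bisect


def range_shard(doc_id, upper_bounds):
    """Shard i holds ids below upper_bounds[i]; the last shard is open-ended."""
    return bisect.bisect_right(upper_bounds, doc_id)


def mod_shard(doc_id, num_shards):
    return doc_id % num_shards


ids = range(1000)  # documents already indexed

# Range sharding: going from 4 to 5 shards only closes the last range at 1000.
old_bounds = [250, 500, 750]
new_bounds = [250, 500, 750, 1000]
moved_range = sum(
    range_shard(i, old_bounds) != range_shard(i, new_bounds) for i in ids
)

# Mod sharding: going from 4 to 5 shards changes the divisor for every id.
moved_mod = sum(mod_shard(i, 4) != mod_shard(i, 5) for i in ids)

# New documents land on the new shard under range sharding.
new_doc_shard = range_shard(1000, new_bounds)
```

With these numbers no existing document moves under range sharding, while 800 of the 1000 documents move under mod sharding (only ids whose remainder modulo 20 is 0 to 3 keep their shard), which is why the latter needs a cluster-wide rebalancing step.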
Some comparisons with a traditional RDBMS
RDBMS:
- scales vertically
- strong ACID guarantee
- relational support
- performance cost with full-text integration
- high query latency on large datasets, e.g. Group By
- indexes need to be built offline for all sort possibilities
Sensei:
- scales horizontally
- relaxed Consistency with high durability guarantees
- data is streamed in, so Atomicity and Isolation are handled by the data producer
- deep full-text integration
- low query latency on arbitrarily large datasets
- dynamic sorting; indexes are already built for all sortable fields and their combinations