Sensei is an open-source, real-time, full-text searchable distributed database that is designed to handle the following type of queries:
SELECT f1,f2...fn FROM members
WHERE c1 AND c2 AND c3...
MATCH (fulltext query, e.g. "java engineer")
GROUP BY fx,fy,fz...
ORDER BY fa,fb...
LIMIT offset,count
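To make the query shape above concrete, here is a minimal sketch that assembles one such query as a plain string. It does not call any Sensei API. The class name `QueryShape`, the column names (`member_id`, `last_name`, `industry`, `geo_region`), and the exact rendering of the MATCH clause are assumptions made for illustration only; just the `members` table and the clause order come from the template.

```java
import java.util.List;

// Hypothetical helper, not part of Sensei: it only illustrates the clause order
// SELECT ... FROM ... WHERE ... MATCH ... GROUP BY ... ORDER BY ... LIMIT.
final class QueryShape {

    static String build(List<String> fields, String table, List<String> conditions,
                        String fullText, List<String> groupBy, List<String> orderBy,
                        int offset, int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT ").append(String.join(",", fields));
        sb.append(" FROM ").append(table);
        if (!conditions.isEmpty()) {
            // Structured conditions (c1 AND c2 AND c3...)
            sb.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (fullText != null && !fullText.isEmpty()) {
            // Free-text part of the query; the parenthesized form is an assumption
            sb.append(" MATCH (\"").append(fullText).append("\")");
        }
        if (!groupBy.isEmpty()) {
            sb.append(" GROUP BY ").append(String.join(",", groupBy));
        }
        if (!orderBy.isEmpty()) {
            sb.append(" ORDER BY ").append(String.join(",", orderBy));
        }
        sb.append(" LIMIT ").append(offset).append(",").append(count);
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = build(
                List.of("member_id", "last_name"),
                "members",
                List.of("industry = 'software'", "geo_region = 'us'"),
                "java engineer",
                List.of("industry"),
                List.of("last_name"),
                0, 10);
        System.out.println(query);
    }
}
```

The point of the sketch is that a single request combines structured filters, a full-text match, grouping (faceting), sorting, and paging, which is the combination Sensei is designed to serve.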
Sensei is written in Java and is built on top of several other open-source software systems (see Figure 1.1, “Sensei and Its Foundation”):
Bobo (http://sna-projects.com/bobo/): a faceted search implementation written in Java, using Lucene as the underlying search and indexing engine.
Zoie (http://sna-projects.com/zoie/): a real-time search and indexing system built on Lucene.
Lucene (http://lucene.apache.org/): a high-performance, full-featured text search engine library written entirely in Java.
Norbert (http://sna-projects.com/norbert/): a library that provides easy cluster management and workload distribution. Norbert is built on ZooKeeper (http://zookeeper.apache.org/) and Netty (http://www.jboss.org/netty).
Like other NoSQL systems (http://nosql-database.org/), Sensei is designed and built with the following considerations in mind:
Data
Fault tolerance - when one replica is down, data is still accessible
Durability - N copies of the data are stored
Throughput - parallelizable request handling on different nodes/data replicas, designed to handle internet traffic
Consistency - eventually consistent
Data recovery - each shard/replica is marked with a watermark for data recovery
Large datasets - designed to handle hundreds of millions to billions of rows
Horizontally Scalable
Data is partitioned - so the workload is also distributed
Elasticity - nodes can be added to accommodate data growth
Online expansion - Cluster can grow while handling online requests
Online cluster management - Cluster topology can change while handling online requests
Low operational/maintenance costs - Push it, leave it and forget it
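As a rough illustration of how partitioning distributes work and how adding nodes spreads the same partitions more thinly, here is a minimal sketch using simple modulo sharding and round-robin placement. This is a generic technique, not Sensei's documented sharding strategy; the class `PartitionSketch` and its methods are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Sensei code: generic modulo sharding.
final class PartitionSketch {

    // Maps a document id to one of numPartitions partitions.
    // floorMod keeps the result non-negative even for negative ids.
    static int partitionOf(long docId, int numPartitions) {
        return (int) Math.floorMod(docId, (long) numPartitions);
    }

    // Places partitions on nodes round-robin and returns node -> partitions served.
    static Map<Integer, List<Integer>> assign(int numPartitions, int numNodes) {
        Map<Integer, List<Integer>> byNode = new TreeMap<>();
        for (int node = 0; node < numNodes; node++) {
            byNode.put(node, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            byNode.get(p % numNodes).add(p);
        }
        return byNode;
    }

    public static void main(String[] args) {
        System.out.println(partitionOf(1234L, 8)); // prints 2
        System.out.println(assign(8, 2));          // each of 2 nodes serves 4 partitions
        System.out.println(assign(8, 4));          // each of 4 nodes serves 2 partitions
    }
}
```

With 8 partitions, growing the cluster from 2 to 4 nodes halves the number of partitions each node serves, which is the sense in which partitioned data lets the workload scale horizontally. How partitions are actually moved while the cluster keeps serving requests is outside the scope of this sketch.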
Performance
Low indexing latency - real-time update
Low search latency - millisecond query response time
Low volatility - low variance in both indexing and search latency
Customizability