Chapter 1. Introduction

Table of Contents

Design Considerations
Comparing to Traditional RDBMS
Architecture
Architectural Diagram

Sensei is an open-source, real-time, full-text searchable distributed database designed to handle queries of the following form:

    SELECT f1,f2...fn FROM members
    WHERE c1 AND c2 AND c3..
    MATCH (fulltext query, e.g. "java engineer")
    GROUP BY fx,fy,fz...
    ORDER BY fa,fb...
    LIMIT offset,count

Sensei is written in Java and is built on top of several other open-source software systems (see Figure 1.1, “Sensei and Its Foundation”):

Figure 1.1. Sensei and Its Foundation

Design Considerations

Like other NoSQL systems (http://nosql-database.org/), Sensei is designed and built with the following considerations in mind:

  • Data

    • Fault tolerance - when one replica is down, data is still accessible

    • Durability - N copies of the data are stored

    • Throughput - request handling is parallelizable across different nodes/data replicas, designed to handle Internet traffic

    • Consistency - eventually consistent

    • Data recovery - each shard/replica is marked with a watermark for data recovery

    • Large dataset - designed to handle hundreds of millions to billions of rows

  • Horizontally Scalable

    • Data is partitioned - so the workload is also distributed

    • Elasticity - nodes can be added to accommodate data growth

    • Online expansion - Cluster can grow while handling online requests

    • Online cluster management - Cluster topology can change while handling online requests

    • Low operational/maintenance costs - Push it, leave it and forget it

  • Performance

    • Low indexing latency - real-time updates

    • Low search latency - millisecond query response time

    • Low volatility - low variance in both indexing and search latency

  • Customizability

    • Plug-in framework - custom query handling logic

    • Routing factory - custom routing logic, default: round-robin

    • Index sharding strategy - different sharding strategies for different applications, e.g. by time, by modulo, etc.
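
To make the last two customization points more concrete, the following is a minimal sketch of what modulo-based sharding, time-based sharding and round-robin routing might look like. It is not Sensei's actual API: the class ShardingSketch, the methods modShard and timeShard, and the RoundRobinRouter type are hypothetical names introduced here for illustration only, and the replica names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: none of these names come from Sensei itself.
class ShardingSketch {

    /** "mod" sharding: a numeric document id selects one of numPartitions partitions. */
    static int modShard(long docId, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative ids.
        return (int) Math.floorMod(docId, (long) numPartitions);
    }

    /** "time" sharding: documents are grouped into fixed-width time buckets. */
    static long timeShard(long epochMillis, long bucketMillis) {
        return Math.floorDiv(epochMillis, bucketMillis);
    }

    /** Round-robin routing: each request goes to the next replica in turn. */
    static final class RoundRobinRouter {
        private final List<String> replicas;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinRouter(List<String> replicas) {
            if (replicas.isEmpty()) {
                throw new IllegalArgumentException("at least one replica is required");
            }
            this.replicas = new ArrayList<>(replicas); // defensive copy
        }

        String pick() {
            // floorMod keeps the index valid even after the counter overflows.
            int i = Math.floorMod(next.getAndIncrement(), replicas.size());
            return replicas.get(i);
        }
    }

    public static void main(String[] args) {
        long oneDay = 24L * 60 * 60 * 1000;
        System.out.println("doc 42 -> partition " + modShard(42L, 4));
        System.out.println("timestamp -> day bucket " + timeShard(3 * oneDay + 5, oneDay));

        RoundRobinRouter router = new RoundRobinRouter(Arrays.asList("replica-a", "replica-b"));
        for (int n = 0; n < 3; n++) {
            System.out.println("request " + n + " -> " + router.pick());
        }
    }
}
```

In this sketch the sharding functions decide where a document is indexed, while the router decides which replica of a partition serves a given request. The two choices are independent, which is why they appear above as separate customization points.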