Indexing & Gateway
Getting data into Sensei.
Data Events
Data events are the units of indexing activity. Each data event is a tuple of (type, data, version); versions are interpreted according to the Gateway's semantics.
Types of data events:
- add (default) - inserts data into Sensei
- delete - deletes data from Sensei
Examples of data events:
Add event:
{"type":"add","data":{"id":1,"contents":"sensei is cool","attrib":"opensource"}}
Since add is the default event type, this is equivalent to:
{"id":1,"contents":"sensei is cool","attrib":"opensource"}
For add events with the same id, a newer event overwrites the existing one.
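For example, if these two add events for id 1 arrive in order (the contents are made up for illustration), the second replaces the first:
{"id":1,"contents":"sensei is cool","attrib":"opensource"}
{"id":1,"contents":"sensei is still cool","attrib":"opensource"}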
Delete event:
{"type":"delete","id":1}
If no document with that id exists, the delete event is a no-op.
We will be supporting partial updates in the next release.
Data Stream
A data stream is a stream of data events that Sensei consumes via Gateways.
Some properties of Data Streams:
- Versioned - each event on the stream carries a monotonically increasing version value that identifies a unique point in the stream
- Ordered - ordering of all events should reflect the semantics of the application
Some examples of data streams:
- Twitter stream
- Activity tracking data, e.g. LinkedIn profile creation and update events
- Server logs
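For illustration (the version values here are arbitrary; the actual scheme is defined by the gateway), a stream might deliver:
version=1: {"id":1,"contents":"sensei is cool","attrib":"opensource"}
version=2: {"id":2,"contents":"so are gateways","attrib":"opensource"}
version=3: {"type":"delete","id":1}
Because versions increase monotonically and ordering is preserved, document 1 ends up deleted.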
Gateway
Gateways are integration components between Sensei and Data Streams. Gateways serve the following purposes:
- Read data events from the data stream, acting as a proxy between Sensei and the stream
- Interpret version semantics of data events read from the data stream
(Diagram: data stream → Gateway → Sensei indexing pipeline.)
Sensei comes with the following pre-written gateways:
- File - each line is a data event, and the line number is the version.
Configuration:
sensei.gateway.class=com.senseidb.gateway.file.LinedFileDataProviderBuilder
sensei.gateway.file.path=<file-name>
- JMS - each message is a data event; System.nanoTime() is used as the version.
Configuration:
sensei.gateway.class=com.senseidb.gateway.jms.JmsDataProviderBuilder
sensei.gateway.jms.topic=<topic name>
sensei.gateway.jms.clientId=<client id>
sensei.gateway.jms.topicFactory=<topic factory class name>
sensei.gateway.jms.connectionFactory=<connection factory class name>
- JDBC - each row in the ResultSet is a data event; the developer implements the version semantics.
Configuration:
sensei.gateway.class=com.senseidb.gateway.jdbc.JdbcDataProviderBuilder
sensei.gateway.jdbc.url=<jdbc url>
sensei.gateway.jdbc.username=<username>
sensei.gateway.jdbc.password=<password>
sensei.gateway.jdbc.driver=<driver class name>
jdbc.adaptor.class=<class name of the custom jdbc adapter>
- Kafka - each message is a data event, and the offset is the version.
Configuration:
sensei.gateway.class=com.senseidb.gateway.kafka.KafkaDataProviderBuilder
sensei.gateway.kafka.zookeeperUrl=<zookeeper url>
sensei.gateway.kafka.consumerGroupId=<consumer group id>
sensei.gateway.kafka.topic=<kafka topic name>
sensei.gateway.kafka.timeout=<connection timeout>
sensei.gateway.kafka.batchsize=<kafka batch size>
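As a minimal end-to-end illustration, here is a sensei.properties fragment for the file gateway (the path /tmp/events.json is made up for this example):
sensei.gateway.class=com.senseidb.gateway.file.LinedFileDataProviderBuilder
sensei.gateway.file.path=/tmp/events.json
where /tmp/events.json holds one data event per line:
{"id":1,"contents":"sensei is cool","attrib":"opensource"}
{"type":"delete","id":1}
Since the line number is the version, replaying the file applies the events in the same order.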
Custom Gateways
Data comes in from many different sources. While we are busy writing Gateways to support more of them, you can write your own custom gateway.
To write your own custom gateway, follow these steps:
- Extend the SenseiGateway class (a sketch of the version comparator follows these steps):
public class MyAwesomeGateway extends SenseiGateway {

  public MyAwesomeGateway(Configuration conf) {
    super(conf);
  }

  @Override
  public StreamDataProvider buildDataProvider(DataSourceFilter dataFilter,
                                              String oldSinceKey,
                                              ShardingStrategy shardingStrategy,
                                              Set partitions) throws Exception {
    // build a StreamDataProvider instance
    ...
  }

  @Override
  public Comparator getVersionComparator() {
    // tell us versioning semantics
    ...
  }
}
- Like any other extension to Sensei, build it into a jar and copy it into the conf/ext directory, where conf is the directory containing your configuration files.
- Edit sensei.properties to configure your Gateway:
sensei.gateway.class=<custom gateway class>
# parameters passed to the gateway
sensei.gateway.param1=<value1>
sensei.gateway.param2=<value2>
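As a concrete sketch of the version semantics from the first step, suppose your stream's versions are numeric offsets encoded as strings (an assumption; your source may use any scheme). getVersionComparator() could then return a comparator that compares them as longs:
import java.util.Comparator;

// Minimal sketch: orders versions that are numeric offsets encoded as strings,
// so that "10" sorts after "9" (plain string comparison would put it first).
public class NumericVersionComparator implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    return Long.compare(Long.parseLong(a), Long.parseLong(b));
  }
}
Sensei uses this comparator to decide which of two events is newer.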
Now when you start Sensei, e.g.:
./bin/start-sensei-node.sh <conf-dir>
data should be flowing into Sensei.
To query Sensei, go to: Clients & APIs »
Batch indexing
Batch indexing over Hadoop is supported by Sensei. Go to: Hadoop Bootstrap »