Indexing & Gateway

Getting data into Sensei.

Data Events

A data event is the unit of indexing activity. Each data event is a tuple of (type, data, version); how versions are interpreted is defined by the Gateway.

Types of data events:

Examples of data events

Add event:

{"type":"add","data":{"id":1,"contents":"sensei is cool","attrib":"opensource"}}

Since add is the default event type, this is equivalent to:

{"id":1,"contents":"sensei is cool","attrib":"opensource"}

For add events with the same id, a newer event overwrites the existing one.

Delete event:

{"type":"delete","id":1}

If no add event with this id exists, the delete is a no-op.
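The add and delete semantics above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not Sensei's indexing code: the class DataEventDemo and its helper methods are hypothetical, events are represented as plain Java maps rather than JSON, and the "index" is simply a map keyed by document id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the event semantics described above;
// none of these names come from Sensei itself.
class DataEventDemo {

    // Returns the event type, treating a missing "type" field as "add".
    static String typeOf(Map<String, Object> event) {
        Object type = event.get("type");
        return type == null ? "add" : type.toString();
    }

    // Returns the document payload: the "data" field when the event carries
    // an explicit type, otherwise the event itself (the shorthand add form).
    @SuppressWarnings("unchecked")
    static Map<String, Object> dataOf(Map<String, Object> event) {
        if (event.containsKey("type")) {
            return (Map<String, Object>) event.get("data");
        }
        return event;
    }

    // Applies one event to a toy index keyed by document id.
    static void apply(Map<Object, Map<String, Object>> index, Map<String, Object> event) {
        if ("delete".equals(typeOf(event))) {
            index.remove(event.get("id"));   // no-op when the id is absent
        } else {
            Map<String, Object> doc = dataOf(event);
            index.put(doc.get("id"), doc);   // a newer add overwrites the older one
        }
    }

    public static void main(String[] args) {
        Map<Object, Map<String, Object>> index = new LinkedHashMap<>();
        apply(index, Map.of("type", "add", "data", Map.of("id", 1, "contents", "sensei is cool")));
        apply(index, Map.of("id", 1, "contents", "sensei is still cool")); // shorthand add
        System.out.println(index.get(1).get("contents")); // prints "sensei is still cool"
        apply(index, Map.of("type", "delete", "id", 1));
        apply(index, Map.of("type", "delete", "id", 1));  // second delete is a no-op
        System.out.println(index.isEmpty());              // prints "true"
    }
}
```

The sketch ignores the version element of the tuple, since its interpretation is left to the Gateway.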

We will be supporting partial updates in the next release.

Data Stream

A data stream is the stream of data events that Sensei consumes through Gateways.

Some properties of Data Streams:

Some examples of data streams:

Gateway

Gateways are integration components between Sensei and Data Streams. Gateways serve the following purposes:

The following diagram illustrates how gateways fit into Sensei:

Sensei comes with the following pre-written gateways:

Custom Gateways

Data comes in from many different sources. While we keep adding Gateways for more sources, you can also write your own custom Gateway.

To write your own custom Gateway, follow these steps:

  1. Extend the SenseiGateway class:
    
    public class MyAwesomeGateway extends SenseiGateway {
    
      public MyAwesomeGateway(Configuration conf) {
        super(conf);
      }
    
      @Override
      public StreamDataProvider buildDataProvider(
          DataSourceFilter dataFilter, String oldSinceKey,
          ShardingStrategy shardingStrategy, Set partitions)
          throws Exception {
          
          // build a StreamDataProvider instance
          
          ...
      }
    
      @Override
      public Comparator getVersionComparator() {
    
        // return a comparator that defines how versions are ordered
        
        ...
      }
    
    }
    
         
  2. As with any other extension to Sensei, build it into a jar and copy the jar into the conf/ext directory, where conf is the directory that contains the configuration files.
  3. Edit sensei.properties to configure your Gateway:
        
        sensei.gateway.class = <custom gateway class>
        
        # parameters passed to the gateway
        sensei.gateway.param1 = <value1>
        sensei.gateway.param2 = <value2>
         

Now when you start Sensei, e.g.

./bin/start-sensei-node.sh <conf-dir>

data should be flowing into Sensei.
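As an illustration of what getVersionComparator() in step 1 might return, here is a minimal sketch of a comparator for numeric versions. It rests on assumptions that the text above does not confirm: that versions are strings holding non-negative decimal offsets (such as "42"), that the comparator is typed as Comparator<String> (the snippet in step 1 declares a raw Comparator), and that a null or empty version means "no version yet". The class name NumericVersionComparator is hypothetical.

```java
import java.util.Comparator;

// Hypothetical comparator: assumes versions are strings holding
// non-negative decimal offsets. Null or empty versions sort first.
class NumericVersionComparator implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        return Long.compare(parse(a), parse(b));
    }

    // Maps a version string to a number; -1 stands for "no version yet".
    private static long parse(String version) {
        if (version == null || version.isEmpty()) {
            return -1L;
        }
        return Long.parseLong(version);
    }

    public static void main(String[] args) {
        Comparator<String> cmp = new NumericVersionComparator();
        System.out.println(cmp.compare("9", "10") < 0); // prints "true": numeric order
        System.out.println("9".compareTo("10") < 0);    // prints "false": plain string order
    }
}
```

Comparing the parsed numbers rather than the raw strings matters because plain string order would place "10" before "9". A custom Gateway with a different versioning scheme (timestamps, composite offsets) would need a comparator that matches that scheme.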

To query Sensei, go to: Clients & APIs »

Batch indexing

Batch indexing over Hadoop is supported by Sensei. Go to: Hadoop Bootstrap »