SenseiDB - FAQ

Frequently Asked Questions

Why did Sensei model its REST API after ElasticSearch's Query DSL?

Aside from a few differences that are specific to Sensei, the Sensei REST API is largely inspired by ElasticSearch's Query DSL, which is very well designed and exposes Lucene's query capabilities and features in an elegant way. We didn't think it was necessary to create yet another, completely different API.

We have been in discussions with the ElasticSearch team about standardizing on a common API.

Why does Sensei use a pull model for indexing instead of a push model like many other data systems, e.g. Solr, HBase, Cassandra, MongoDB, etc.?

One requirement for Sensei is an extremely fast update rate (thousands of updates per second) without compromising search performance. A push model that maintains data consistency between replicas implies a cost per update event. Furthermore, this is an anti-scaling pattern: as the number of replicas grows to accommodate high availability, the cost of each update increases. So as you add more machines to handle more traffic, the update rate slows down.

Therefore, it is a conscious design decision to avoid this cost. Although each replica consumes from the data stream at a presumably different rate, consistency is resolved at query time by consistent hashing on the routing parameter specified on the request.
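To illustrate the idea, here is a minimal sketch of consistent hashing on a routing parameter. It is not Sensei's actual broker code: the class name ReplicaRing, the replica names, the use of MD5, and the number of virtual nodes are all assumptions made for the example. The point is only that the same routing parameter keeps landing on the same replica, so a given caller sees a consistent view even if replicas are at different positions in the data stream.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring; not Sensei's implementation.
class ReplicaRing {
    // Ring positions (hash values) mapped to replica names, kept sorted.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ReplicaRing(List<String> replicas, int virtualNodes) {
        // Each replica gets several positions so load spreads more evenly.
        for (String replica : replicas) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(replica + "#" + i), replica);
            }
        }
    }

    // Picks the first replica at or after the hash of the routing parameter,
    // wrapping around to the start of the ring if necessary.
    String replicaFor(String routingParam) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no replicas registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(routingParam));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Folds the first 8 bytes of an MD5 digest into a long.
    static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a ring built from, say, three replica names, calling replicaFor("member-42") repeatedly returns the same replica for as long as the set of replicas does not change.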

Having a data stream for consumption also provides a replay mechanism, which is very helpful for data re-balancing and re-indexing.

Are there plans to support dynamic schema like MongoDB or ElasticSearch?

Yes. We plan to have a design in place in early 2012.

Is BQL only for illustration purposes or is it a supported method to query Sensei?

No, BQL is not only for illustration purposes; it is real.

We are in the process of finalizing the specification for BQL.

Our intention is to support BQL as a first-class citizen in querying Sensei.

Is Sensei being developed outside of LinkedIn?

Sensei is mostly under active development within LinkedIn, although we have been working with the community for quite a while, and there are some deployments outside of LinkedIn.

While driving the project within LinkedIn benefits from LinkedIn's traffic and data resources, we do believe that letting the community drive and plan the roadmap is beneficial in the long run. So if you have passion and energy, visit the Contribute page and contribute.

How mature is Sensei?

Sensei has been powering LinkedIn's homepage and the Signal application in production for the past year. Sensei components have also been powering many of LinkedIn's search properties, e.g. People, Jobs, Company, etc., for years.

We are interested in your help with other usages and deployments to make Sensei even more stable and performant!

Indexing seems to have stopped for no reason. What is wrong?

It is possible that one of the data events is erroneous; by default, Sensei stops indexing in that case to maintain a consistent view. Check the logs for an indexing exception.

In applications where it is ok to skip bad records, you can set the sensei.index.skipBadRecords property in your sensei.properties, e.g.:

	# default: false
	sensei.index.skipBadRecords = true

Be very careful with this setting, because it may cause inconsistency between node replicas.
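The difference between the two behaviors can be sketched as follows. This is only an illustration of the stop-versus-skip policy, not Sensei's indexing code: the class IndexingLoopSketch, the consume method, and the use of plain strings as data events are all hypothetical.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a stop-or-skip policy for bad records; not Sensei's code.
class IndexingLoopSketch {
    // Feeds events to the indexer in order and returns how many were indexed.
    static int consume(List<String> events, Consumer<String> indexer, boolean skipBadRecords) {
        int indexed = 0;
        for (String event : events) {
            try {
                indexer.accept(event);
                indexed++;
            } catch (RuntimeException e) {
                if (!skipBadRecords) {
                    // Default behavior: stop consuming so the index stays at a known position.
                    System.err.println("Indexing stopped at bad event: " + e.getMessage());
                    break;
                }
                // Skip mode: drop the bad event and continue with the next one.
                System.err.println("Skipping bad event: " + e.getMessage());
            }
        }
        return indexed;
    }
}
```

In skip mode the loop keeps going, which is also why replicas can drift apart: if an event fails on one replica but is indexed on another, the two no longer hold the same data.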

After running for a while, Sensei throws an OOM (OutOfMemoryError). What is going on?

By default, Sensei only allocates 1GB of heap space. As you accumulate documents and grow your corpus size, you need to configure the HEAP_OPTS in the sensei.properties file accordingly.

	HEAP_OPTS="-Xmx1g -Xms1g -XX:NewSize=256m"
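If you want to confirm that a changed heap setting actually took effect, one generic way is to ask the JVM for its maximum heap size. The small class below is a hypothetical helper, not part of Sensei; it relies only on the standard Runtime.maxMemory() call and must be run with the same JVM options you want to check.

```java
// Hypothetical helper, not part of Sensei: prints the maximum heap the JVM will use.
class HeapCheck {
    // Maximum heap size in megabytes, as reported by the running JVM.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

The reported value is usually close to, but not always exactly, the -Xmx value, because some collectors exclude part of the heap from this figure.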

I got the following compilation error. What is wrong?

	sensei-core/src/main/java/com/senseidb/search/node/AbstractConsistentHashBroker.java:[217,47] type parameters of <T> T cannot be determined; no unique maximal instance exists for type variable T with upper bounds REQUEST,java.lang.Object

This is a known Java 1.6 bug found in earlier builds. Upgrade your JDK to a more recent version.

When building sensei-gateway.jar, I got compilation errors because Maven could not find Kafka. What is wrong?

Apache Kafka is not yet mavenized, so you have to install it into your local Maven repository before compiling the Sensei gateways. To do this, run:

	./bin/mvn-install.sh
