System Configuration

A Sensei node is configured via the sensei.properties file, which uses the format supported by Apache Commons Configuration (http://commons.apache.org/). This file consists of the following five parts:

  1. server: port to listen on, rpc parameters, etc.

  2. cluster: cluster manager, sharding, request routing, etc.

  3. indexing: data interpretation, tokenization, indexer type, etc.

  4. broker and client: e.g. entry into the Sensei system

  5. plugins: e.g. customized facet handlers

Below is the configuration file for the demo (available from https://github.com/javasoze/sensei/blob/master/conf/sensei.properties):

# sensei node parameters
sensei.node.id=1
sensei.node.partitions=0,1

# sensei network server parameters
sensei.server.port=1234
sensei.server.requestThreadCorePoolSize=20
sensei.server.requestThreadMaxPoolSize=70
sensei.server.requestThreadKeepAliveTimeSecs=300

# sensei cluster parameters
sensei.cluster.name=sensei
sensei.cluster.url=localhost:2181
sensei.cluster.timeout=30000

# sensei indexing parameters
sensei.index.directory = index/cardata

sensei.index.batchSize = 10000
sensei.index.batchDelay = 300000
sensei.index.maxBatchSize = 10000
sensei.index.realtime = true
sensei.index.freshness = 10000

# index manager parameters

sensei.index.manager.default.maxpartition.id = 1
sensei.index.manager.default.type = file
sensei.index.manager.default.file.path = data/cars.json

# plugins: from plugins.xml

# analyzer, default: StandardAnalyzer
# sensei.index.analyzer = myanalyzer

# similarity, default: DefaultSimilarity
# sensei.index.similarity = mysimilarity

# indexer type, zoie/hourglass/<custom name>

sensei.indexer.type=zoie

#extra parameters for hourglass

#sensei.indexer.hourglass.schedule

# retention
#sensei.indexer.hourglass.trimthreshold

# frequency for a roll, minute/hour/day
#sensei.indexer.hourglass.frequency

# sensei
# version comparator, default: ZoieConfig.DefaultVersionComparator
# sensei.version.comparator = myVersionComparator

# extra services
sensei.plugin.services =

# broker properties
sensei.broker.port = 8080
sensei.broker.minThread = 50
sensei.broker.maxThread = 100
sensei.broker.maxWaittime = 2000

sensei.broker.webapp.path=src/main/webapp
sensei.search.cluster.name = sensei
sensei.search.cluster.zookeeper.url = localhost:2181
sensei.search.cluster.zookeeper.conn.timeout = 30000
sensei.search.cluster.network.conn.timeout = 1000
sensei.search.cluster.network.write.timeout = 150
sensei.search.cluster.network.max.conn.per.node = 5
sensei.search.cluster.network.stale.timeout.mins = 10
sensei.search.cluster.network.stale.cleanup.freq.mins = 10

# custom router factory
# sensei.search.router.factory = myRouterFactory

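The listing above is organized into five groups, in this order:

  1. The server configuration starts at the "# sensei node parameters" comment.

  2. The cluster configuration starts at the "# sensei cluster parameters" comment.

  3. The indexing configuration starts at the "# sensei indexing parameters" comment.

  4. The plugin configuration starts at the "# plugins: from plugins.xml" comment.

  5. The broker and client configuration starts at the "# broker properties" comment.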

The following sections explain every configuration property in each part: its type, whether it is required, its default value, and how it is used.
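
As a rough illustration of what "Type", "Required", and "Default" mean in the property descriptions, the sketch below reads values using java.util.Properties. This is an assumption made to keep the example dependency-free: Sensei itself reads the file through Apache Commons Configuration, and the class PropertyReader and its methods are hypothetical names, not part of Sensei.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper, not part of Sensei: typed lookups with required/default handling. */
final class PropertyReader {
    private final Properties props;

    PropertyReader(Properties props) {
        this.props = props;
    }

    /** Parses properties-file text such as the demo configuration. */
    static PropertyReader fromText(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("unreadable configuration text", e);
        }
        return new PropertyReader(p);
    }

    /** Required property: fails fast when the key is missing. */
    int requiredInt(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing required property: " + key);
        }
        return Integer.parseInt(value.trim());
    }

    /** Optional property: falls back to the documented default when the key is missing. */
    int optionalInt(String key, int defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}
```

With this helper, a required property such as sensei.node.id would be read with requiredInt, while an optional one such as sensei.server.requestThreadCorePoolSize would be read with optionalInt and its documented default of 20.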

Server Properties

sensei.node.id
  • Type: int

  • Required: Yes

  • Default: None

This is the node ID of the Sensei node in a cluster.

sensei.node.partitions
  • Type: String (comma separated integers or ranges)

  • Required: Yes

  • Default: None

This specifies the partition IDs that this Sensei server is going to handle. Partition IDs can be given as either integers or ranges, separated by commas. For example, the following line denotes that the Sensei server has six partitions: 1, 4, 5, 6, 7, and 10.

  sensei.node.partitions=1,4-7,10

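
The expansion of such a value can be sketched as follows. This is a minimal illustration, not Sensei's own parser; the class name PartitionSpec is hypothetical, and the handling of stray commas and descending ranges is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Sensei's own parser: expands "1,4-7,10" into partition IDs. */
final class PartitionSpec {
    private PartitionSpec() {}

    static List<Integer> parse(String spec) {
        List<Integer> ids = new ArrayList<>();
        for (String token : spec.split(",")) {
            String part = token.trim();
            if (part.isEmpty()) {
                continue; // assumption: tolerate stray commas
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                ids.add(Integer.parseInt(part));
            } else {
                // A range such as "4-7" includes both endpoints.
                int start = Integer.parseInt(part.substring(0, dash).trim());
                int end = Integer.parseInt(part.substring(dash + 1).trim());
                if (end < start) {
                    throw new IllegalArgumentException("descending range: " + part);
                }
                for (int id = start; id <= end; id++) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }
}
```

Calling PartitionSpec.parse("1,4-7,10") yields the six IDs 1, 4, 5, 6, 7, and 10.
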
sensei.server.port
  • Type: int

  • Required: Yes

  • Default: None

This is the Sensei server port number.

sensei.server.requestThreadCorePoolSize
  • Type: int

  • Required: No

  • Default: 20

This is the core size of the thread pool used to execute requests.

sensei.server.requestThreadKeepAliveTimeSecs
  • Type: int

  • Required: No

  • Default: 300

This is the length of time in seconds to keep an idle request thread alive.

sensei.server.requestThreadMaxPoolSize
  • Type: int

  • Required: No

  • Default: 70

This is the maximum size of the thread pool used to execute requests.
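
The three requestThread* properties line up with the constructor parameters of java.util.concurrent.ThreadPoolExecutor. The sketch below shows that correspondence only; whether Sensei builds its pool exactly this way is an assumption, and both the class name RequestPoolSketch and the choice of work queue are hypothetical.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: maps the three server thread properties onto a ThreadPoolExecutor. */
final class RequestPoolSketch {
    private RequestPoolSketch() {}

    static ThreadPoolExecutor build(int corePoolSize, int maxPoolSize, long keepAliveSecs) {
        return new ThreadPoolExecutor(
            corePoolSize,            // sensei.server.requestThreadCorePoolSize
            maxPoolSize,             // sensei.server.requestThreadMaxPoolSize
            keepAliveSecs,           // sensei.server.requestThreadKeepAliveTimeSecs
            TimeUnit.SECONDS,
            // Assumption: direct hand-off, so the pool can grow up to maxPoolSize.
            new SynchronousQueue<Runnable>());
    }
}
```

The queue matters for how the maximum size behaves: with an unbounded queue, a ThreadPoolExecutor never creates more than the core number of threads, so the maximum would have no effect.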

Cluster Properties

sensei.cluster.name
  • Type: String

  • Required: Yes

  • Default: None

This is the name of the Sensei server cluster.

sensei.cluster.timeout
  • Type: int

  • Required: No

  • Default: 300000

This is the session timeout value, in milliseconds, that is passed to ZooKeeper.

sensei.cluster.url
  • Type: String

  • Required: Yes

  • Default: None

This is the ZooKeeper URL for the Sensei cluster.

Indexing Properties

sensei.index.analyzer

See sensei.index.analyzer in the section called “Plug-in Properties”.

sensei.index.batchDelay
  • Type: int

  • Required: No

  • Default: 300000

This is the maximum time to wait in milliseconds before flushing index events to disk. The default value is 300000 (i.e. 5 minutes).

sensei.index.batchSize
  • Type: int

  • Required: No

  • Default: 10000

This is the batch size that controls the pace of data event consumption on the back end. It is the soft size limit of each event batch. If events come in too fast and the limit is already reached, the indexer blocks incoming events until the number of buffered events drops below this limit, which happens once some of the events have been sent to the background data consumer.

sensei.index.custom

See sensei.index.custom in the section called “Plug-in Properties”.

sensei.index.directory
  • Type: String

  • Required: Yes

  • Default: None

This is the directory used to save the index.

sensei.index.freshness
  • Type: long

  • Required: No

  • Default: 500

This controls the freshness of entries in the index reader cache.

sensei.index.interpreter

See sensei.index.interpreter in the section called “Plug-in Properties”.

sensei.index.manager

See sensei.index.manager in the section called “Plug-in Properties”.

sensei.index.manager.default.batchSize
  • Type: int

  • Required: No

  • Default: 1

This is the batch size that controls when data events accumulated in the default index manager should be consumed by the data consumer. The default value is 1.

sensei.index.manager.default.eventsPerMin
  • Type: int

  • Required: No

  • Default: 40000

This is the maximum number of data events that the indexer can consume per minute. If this threshold is exceeded, the indexer will pause for a short period of time before continuing to consume incoming data events.

This property is helpful in preventing the indexer from being overloaded. The default value is 40,000.
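
A per-minute cap of this kind can be sketched as below. This is not Sensei's implementation: the class EventThrottle is hypothetical, and pausing until the end of the current one-minute window is an assumption (the text above only says the indexer pauses for a short period). The clock is passed in explicitly so the behavior is easy to follow.

```java
/** Hypothetical throttle, not Sensei's implementation: caps events consumed per one-minute window. */
final class EventThrottle {
    private static final long WINDOW_MILLIS = 60_000L;

    private final int eventsPerMin;
    private long windowStart;
    private int consumedInWindow;

    EventThrottle(int eventsPerMin, long nowMillis) {
        this.eventsPerMin = eventsPerMin;
        this.windowStart = nowMillis;
    }

    /**
     * Records one consumed event and returns how long the caller should pause,
     * in milliseconds (0 while the budget of the current window is not exceeded).
     */
    long pauseMillisAfterEvent(long nowMillis) {
        if (nowMillis - windowStart >= WINDOW_MILLIS) {
            windowStart = nowMillis; // start a fresh one-minute window
            consumedInWindow = 0;
        }
        consumedInWindow++;
        if (consumedInWindow <= eventsPerMin) {
            return 0L;
        }
        // Over budget: wait until the current window ends.
        return WINDOW_MILLIS - (nowMillis - windowStart);
    }
}
```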

sensei.index.manager.default.maxpartition.id
  • Type: int

  • Required: Yes, if the default indexing manager is chosen; No, otherwise.

  • Default: None

This is the maximum partition ID number served by this Sensei cluster if the default Sensei indexing manager is used.

Warning

This property is different from the total number of partitions in a Sensei cluster. For example, if a cluster contains 4 partitions, 0, 1, 2, and 3, then sensei.index.manager.default.maxpartition.id should be set to 3.
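
The difference between the two numbers can be made explicit with a small sketch. The class name MaxPartitionId is hypothetical and only illustrates the arithmetic in the warning above.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: derives the maxpartition.id value from the cluster's partition IDs. */
final class MaxPartitionId {
    private MaxPartitionId() {}

    static int of(List<Integer> clusterPartitionIds) {
        // The property holds the largest ID, not the number of partitions.
        return Collections.max(clusterPartitionIds);
    }
}
```

For partitions 0, 1, 2, and 3, the list has four elements, but MaxPartitionId.of returns 3.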

sensei.index.manager.default.shardingStrategy

See sensei.index.manager.default.shardingStrategy in the section called “Plug-in Properties”.

sensei.index.manager.default.type

See sensei.index.manager.default.type in the section called “Plug-in Properties”.

sensei.index.maxBatchSize
  • Type: int

  • Required: No

  • Default: 10000

This is the maximum number of incoming data events that can be held by the indexer in a batch before they are flushed to disk. If this number is exceeded, the indexer will stop processing the data events for one minute.

sensei.index.realtime
  • Type: boolean

  • Required: No

  • Default: true

This specifies whether the indexing mode is real-time or not.

sensei.index.similarity

See sensei.index.similarity in the section called “Plug-in Properties”.

sensei.indexer.type
  • Type: String

  • Required: Yes

  • Default: None

This is the internal indexer type used by the Sensei cluster. Currently only two options are supported: zoie and hourglass. If hourglass is used, three more properties apply, of which only sensei.indexer.hourglass.schedule is required:

  1. sensei.indexer.hourglass.schedule

  2. sensei.indexer.hourglass.trimthreshold

  3. sensei.indexer.hourglass.frequency

sensei.indexer.hourglass.frequency
  • Type: String

  • Required: No

  • Default: "day"

This is the rolling forward frequency. It has to be one of the following three values:

  • day

  • hour

  • minute

sensei.indexer.hourglass.schedule
  • Type: String

  • Required: Yes, if property sensei.indexer.type is set to "hourglass"; No, otherwise.

  • Default: None

This is a string that specifies the Hourglass roll-forward schedule. The format of this string is "ss mm hh". For daily rolling, we roll forward at hh:mm:ss each day. For hourly rolling, we roll forward at mm:ss of each hour. For minutely rolling, we roll forward at second ss of each minute.
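
The interpretation of the schedule string under each frequency can be sketched as follows. This is not Hourglass code: the class HourglassSchedule and its method are hypothetical, and the sketch assumes that all three fields are always present, even when the frequency ignores some of them.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper, not Hourglass code: next roll time for an "ss mm hh" schedule. */
final class HourglassSchedule {
    private HourglassSchedule() {}

    static LocalDateTime nextRoll(String schedule, String frequency, LocalDateTime now) {
        String[] fields = schedule.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected \"ss mm hh\": " + schedule);
        }
        int ss = Integer.parseInt(fields[0]);
        int mm = Integer.parseInt(fields[1]);
        int hh = Integer.parseInt(fields[2]);

        LocalDateTime candidate;
        ChronoUnit step;
        switch (frequency) {
            case "day":    // roll at hh:mm:ss each day
                candidate = now.toLocalDate().atTime(hh, mm, ss);
                step = ChronoUnit.DAYS;
                break;
            case "hour":   // roll at mm:ss of each hour
                candidate = now.truncatedTo(ChronoUnit.HOURS).withMinute(mm).withSecond(ss);
                step = ChronoUnit.HOURS;
                break;
            case "minute": // roll at second ss of each minute
                candidate = now.truncatedTo(ChronoUnit.MINUTES).withSecond(ss);
                step = ChronoUnit.MINUTES;
                break;
            default:
                throw new IllegalArgumentException("unknown frequency: " + frequency);
        }
        // If this period's roll time has already passed, use the next period.
        return candidate.isAfter(now) ? candidate : candidate.plus(1, step);
    }
}
```

For example, with the schedule "00 30 02" and daily rolling, a call made at 03:00:00 returns 02:30:00 on the following day.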

sensei.indexer.hourglass.trimthreshold
  • Type: int

  • Required: No

  • Default: 14

This is the retention period for how long we are going to keep the events in the index. The unit is the rolling period.
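
Because the unit is the rolling period, the effective retention depends on sensei.indexer.hourglass.frequency. The arithmetic can be sketched as below; the class name HourglassRetention is hypothetical.

```java
import java.time.Duration;

/** Hypothetical helper: retention = trimthreshold multiplied by the rolling period. */
final class HourglassRetention {
    private HourglassRetention() {}

    static Duration of(int trimThreshold, String frequency) {
        switch (frequency) {
            case "day":
                return Duration.ofDays(trimThreshold);
            case "hour":
                return Duration.ofHours(trimThreshold);
            case "minute":
                return Duration.ofMinutes(trimThreshold);
            default:
                throw new IllegalArgumentException("unknown frequency: " + frequency);
        }
    }
}
```

With the defaults (a threshold of 14 and a frequency of "day"), events are kept for 14 days.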

Broker and Client Properties

sensei.broker.maxThread
  • Type: int

  • Required: No

  • Default: 50

This is the maximum size of the thread pool used by a broker to execute requests.

sensei.broker.maxWaittime
  • Type: int

  • Required: No

  • Default: 2000

This is the maximum idle time in milliseconds for a thread on a broker. Threads that are idle for longer than this period may be stopped.

sensei.broker.minThread
  • Type: int

  • Required: No

  • Default: 20

This is the core size of the thread pool used by the broker to execute requests.

sensei.broker.port
  • Type: int

  • Required: Yes

  • Default: None

This is the port number of the Sensei broker.

sensei.broker.webapp.path
  • Type: String

  • Required: Yes

  • Default: None

This is the resource base of the broker web application.

sensei.search.cluster.zookeeper.url
  • Type: String

  • Required: Yes

  • Default: None

This is the ZooKeeper URL for the Sensei search cluster that a broker talks to.

sensei.search.cluster.name
  • Type: String

  • Required: Yes

  • Default: None

This is the Sensei cluster name, i.e. the service name for the network clients and brokers.

sensei.search.cluster.zookeeper.conn.timeout
  • Type: int

  • Required: No

  • Default: 10000

This is the ZooKeeper network client session timeout value in milliseconds.

sensei.search.cluster.network.conn.timeout
  • Type: int

  • Required: No

  • Default: 1000

This is the maximum number of milliseconds to allow a connection attempt to take.

sensei.search.cluster.network.write.timeout
  • Type: int

  • Required: No

  • Default: 150

This is the number of milliseconds a request can be queued for write before it is considered stale.

sensei.search.cluster.network.max.conn.per.node
  • Type: int

  • Required: No

  • Default: 5

This is the maximum number of open connections to a node.

sensei.search.cluster.network.stale.timeout.mins
  • Type: int

  • Required: No

  • Default: 10

This is the number of minutes to keep a request that is waiting for a response.

sensei.search.cluster.network.stale.cleanup.freq.mins
  • Type: int

  • Required: No

  • Default: 10

This is the frequency to clean up stale requests.

Plug-in Properties

sensei.index.analyzer
  • Type: String

  • Required: No

  • Default: ""

This specifies the bean ID of the analyzer plug-in for analyzing text. If not specified, org.apache.lucene.analysis.standard.StandardAnalyzer will be used.

sensei.index.similarity
  • Type: String

  • Required: No

  • Default: ""

This specifies the bean ID of the similarity plug-in for Lucene scoring. If not specified, org.apache.lucene.search.DefaultSimilarity is used.

sensei.index.custom
  • Type: String

  • Required: No

  • Default: ""

This specifies the bean ID of the custom indexing pipeline implementation. A custom indexing pipeline can be plugged into the indexing process to allow users to modify generated Lucene documents at the last step before they are indexed.

A custom indexing pipeline has to implement the interface com.sensei.indexing.api.CustomIndexingPipeline.

sensei.index.interpreter
  • Type: String

  • Required: No

  • Default: ""

This specifies the bean ID of the interpreter of Zoie indexables. If not specified, com.sensei.indexing.api.DefaultJsonSchemaInterpreter is used.

sensei.index.manager
  • Type: String

  • Required: No

  • Default: ""

This specifies the bean ID of the indexing manager object implementing com.sensei.search.nodes.SenseiIndexingManager. If not specified, com.sensei.indexing.api.DefaultStreamingIndexingManager is used.

sensei.index.manager.default.type
  • Type: String

  • Required: Yes if sensei.index.manager is not specified, i.e. the default indexing manager is used.

  • Default: None

This specifies the type of gateway that will be used by the default indexing manager. The value identifies the bean ID of an object of com.sensei.indexing.api.gateway.SenseiGateway.

Several built-in gateways are provided by Sensei, but you can always define your own based on your needs. Whether a built-in or a custom gateway is used, additional parameters can be specified under names with the prefix sensei.index.manager.default.<gateway-type>.
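
The prefix convention can be illustrated by collecting all properties that belong to one gateway type. The sketch below uses java.util.Properties for brevity; the class GatewayProperties is hypothetical, and stripping the prefix from the returned keys is an assumption about how a gateway might consume its parameters.

```java
import java.util.Properties;

/** Hypothetical helper: selects the parameters that belong to one gateway type. */
final class GatewayProperties {
    static final String PREFIX = "sensei.index.manager.default.";

    private GatewayProperties() {}

    /** Returns the properties under PREFIX + gatewayType + ".", with that prefix removed. */
    static Properties forGateway(Properties all, String gatewayType) {
        String prefix = PREFIX + gatewayType + ".";
        Properties subset = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                subset.setProperty(name.substring(prefix.length()), all.getProperty(name));
            }
        }
        return subset;
    }
}
```

For the demo configuration, asking for the "file" gateway would return a single entry, path, with the value data/cars.json.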

The built-in gateway types described in this document are file, kafka, jms, and jdbc. The properties of each are listed in the corresponding gateway section.

sensei.index.manager.default.<gateway-type>.filter
  • Type: String

  • Required: No

  • Default: None

This is the bean ID of a com.sensei.indexing.api.DataSourceFilter object. No matter what gateway type the indexing manager uses, a filter can be plugged in to convert the original source data to the JSON format defined by the table schema. If the input data is already in the right format, this filter is not needed.

sensei.index.manager.default.shardingStrategy
  • Type: String

  • Required: No

  • Default: ""

This is the bean ID of the sharding strategy.

sensei.search.router.factory
  • Type: String

  • Required: No

  • Default: ""

This is the bean ID of the Sensei request router factory. This factory builds the load balancer that is used by Sensei brokers to route incoming requests to different Sensei nodes.

sensei.version.comparator
  • Type: String

  • Required: No

  • Default: ""

This specifies the bean ID of the version comparator plug-in to be used by the indexer. If not specified, Zoie's default version comparator is used.

File Gateway Properties

For the file gateway, the following property has to be specified:

sensei.index.manager.default.file.path
  • Type: String

  • Required: Yes

  • Default: None

This is the path to the input data file.

Kafka Gateway Properties

For the kafka gateway, the following properties should or can be specified: [2]

sensei.index.manager.default.kafka.batchsize
  • Type: String

  • Required: Yes

  • Default: None

This is the batch size for each pull request.

sensei.index.manager.default.kafka.host
  • Type: String

  • Required: Yes

  • Default: None

This is the host name of the Kafka server.

sensei.index.manager.default.kafka.port
  • Type: int

  • Required: Yes

  • Default: None

This is the port number on which the Kafka server is listening for connections.

sensei.index.manager.default.kafka.timeout
  • Type: int

  • Required: Yes

  • Default: 10000

This is the socket timeout in milliseconds.

sensei.index.manager.default.kafka.topic
  • Type: String

  • Required: Yes

  • Default: None

This is the topic of the messages to be fetched.
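
Since all five Kafka gateway properties are marked as required, a configuration can be checked for missing keys before the node starts. The sketch below is one way to do this; the class KafkaGatewayCheck is hypothetical and is not part of Sensei.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical check, not part of Sensei: reports missing Kafka gateway properties. */
final class KafkaGatewayCheck {
    static final String PREFIX = "sensei.index.manager.default.kafka.";
    private static final String[] REQUIRED = {"batchsize", "host", "port", "timeout", "topic"};

    private KafkaGatewayCheck() {}

    /** Returns the full names of required properties that are absent or blank. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String suffix : REQUIRED) {
            String key = PREFIX + suffix;
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

An empty result means every documented Kafka gateway property is present; otherwise the returned names can be reported to the operator.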

JMS Gateway Properties

For the jms gateway, the following properties should or can be specified:

sensei.index.manager.default.jms.clientId
  • Type: String

  • Required: Yes

  • Default: None

This is the client identifier used to connect to the JMS provider.

sensei.index.manager.default.jms.topic
  • Type: String

  • Required: Yes

  • Default: None

This is the topic name that the JMS client subscribes to.

sensei.index.manager.default.jms.topicFactory
  • Type: String

  • Required: Yes

  • Default: None

This is the bean ID of the proj.zoie.dataprovider.jms.TopicFactory object. This object is used to generate a topic object based on the given topic name.

sensei.index.manager.default.jms.connectionFactory
  • Type: String

  • Required: Yes

  • Default: None

This is the bean ID of the javax.jms.TopicConnectionFactory object, which is used by the JMS client to create a javax.jms.TopicConnection object with the JMS provider.

JDBC Gateway Properties

For the jdbc gateway, the following properties should or can be specified:

sensei.index.manager.default.jdbc.adaptor
  • Type: String

  • Required: Yes

  • Default: None

This is the bean ID of the com.sensei.indexing.api.jdbc.SenseiJDBCAdaptor object. This object is used to build a proj.zoie.dataprovider.jdbc.PreparedStatementBuilder object, which is required by proj.zoie.dataprovider.jdbc.JDBCStreamDataProvider.

sensei.index.manager.default.jdbc.driver
  • Type: String

  • Required: Yes

  • Default: None

This is the class name of the JDBC driver that you want to use.

sensei.index.manager.default.jdbc.password
  • Type: String

  • Required: Yes

  • Default: None

This is the password for the user name that you use to connect to the database.

sensei.index.manager.default.jdbc.username
  • Type: String

  • Required: Yes

  • Default: None

This is the user name that you use to connect to the database.



[2] These properties are basically the parameters needed by the Kafka consumer API. The Simple Consumer API from Kafka is used by Sensei.