System Configuration


A Sensei node is configured via the sensei.properties file, which uses the format supported by Apache Commons Configuration (http://commons.apache.org/). This file consists of the following five parts:

server: port to listen on, rpc parameters, etc.
cluster: cluster manager, sharding, request routing, etc.
indexing: data interpretation, tokenization, indexer type, etc.
broker and client: e.g. entry into Sensei system
plugins: e.g. customized facet handlers

Below is the configuration file for the demo (available from https://github.com/javasoze/sensei/blob/master/conf/sensei.properties):

# sensei node parameters
sensei.node.id=1
sensei.node.partitions=0,1

# sensei network server parameters
sensei.server.port=1234
sensei.server.requestThreadCorePoolSize=20
sensei.server.requestThreadMaxPoolSize=70
sensei.server.requestThreadKeepAliveTimeSecs=300

# sensei cluster parameters
sensei.cluster.name=sensei
sensei.cluster.url=localhost:2181
sensei.cluster.timeout=30000

# sensei indexing parameters
sensei.index.directory = index/cardata

sensei.index.batchSize = 10000
sensei.index.batchDelay = 300000
sensei.index.maxBatchSize = 10000
sensei.index.realtime = true
sensei.index.freshness = 10000

# index manager parameters

sensei.index.manager.default.maxpartition.id = 1
sensei.index.manager.default.type = file
sensei.index.manager.default.file.path = data/cars.json

# plugins: from plugins.xml

# analyzer, default: StandardAnalyzer
# sensei.index.analyzer = myanalyzer

# similarity, default: DefaultSimilarity
# sensei.index.similarity = mysimilarity

# indexer type, zoie/hourglass/<custom name>

sensei.indexer.type=zoie

#extra parameters for hourglass

#sensei.indexer.hourglass.schedule

# retention
#sensei.indexer.hourglass.trimthreshold

# frequency for a roll, minute/hour/day
#sensei.indexer.hourglass.frequency

# sensei
# version comparator, default: ZoieConfig.DefaultVersionComparator
# sensei.version.comparator = myVersionComparator

# extra services
sensei.plugin.services =

# broker properties
sensei.broker.port = 8080
sensei.broker.minThread = 50
sensei.broker.maxThread = 100
sensei.broker.maxWaittime = 2000

sensei.broker.webapp.path=src/main/webapp
sensei.search.cluster.name = sensei
sensei.search.cluster.zookeeper.url = localhost:2181
sensei.search.cluster.zookeeper.conn.timeout = 30000
sensei.search.cluster.network.conn.timeout = 1000
sensei.search.cluster.network.write.timeout = 150
sensei.search.cluster.network.max.conn.per.node = 5
sensei.search.cluster.network.stale.timeout.mins = 10
sensei.search.cluster.network.stale.cleanup.freq.mins = 10

# custom router factory
# sensei.search.router.factory = myRouterFactory

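	The line "# sensei node parameters" starts the server configuration.
	The line "# sensei cluster parameters" starts the cluster configuration.
	The line "# sensei indexing parameters" starts the indexing configuration.
	The line "# plugins: from plugins.xml" starts the plug-in configuration.
	The line "# broker properties" starts the broker and client configuration.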

The following sections explain every configuration property in each part: its type, whether it is required, its default value, and how it is used.

Server Properties

sensei.node.id

Type: int
Required: Yes
Default: None

This is the node ID of the Sensei node in a cluster.

sensei.node.partitions

Type: String (comma separated integers or ranges)
Required: Yes
Default: None

This specifies the partition IDs that the Sensei server is going to handle. Partition IDs can be given as either integers or ranges, separated by commas. For example, the following line denotes that the Sensei server has six partitions: 1, 4, 5, 6, 7, and 10.

  sensei.node.partitions=1,4-7,10

sensei.server.port

Type: int
Required: Yes
Default: None

This is the Sensei server port number.

sensei.server.requestThreadCorePoolSize

Type: int
Required: No
Default: 20

This is the core size of thread pool used to execute requests.

sensei.server.requestThreadKeepAliveTimeSecs

Type: int
Required: No
Default: 300

This is the length of time in seconds to keep an idle request thread alive.

sensei.server.requestThreadMaxPoolSize

Type: int
Required: No
Default: 70

This is the maximum size of thread pool used to execute requests.

Cluster Properties

sensei.cluster.name

Type: String
Required: Yes
Default: None

This is the name of the Sensei server cluster.

sensei.cluster.timeout

Type: int
Required: No
Default: 300000

This is the session timeout value, in milliseconds, that is passed to ZooKeeper.

sensei.cluster.url

Type: String
Required: Yes
Default: None

This is the ZooKeeper URL for the Sensei cluster.
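
As a rough sketch of the three cluster properties together, the fragment below points a node at a ZooKeeper ensemble instead of a single local server. It assumes that the value of sensei.cluster.url is handed to the ZooKeeper client as a standard comma-separated host:port connect string, which this document does not state explicitly. The host names are hypothetical.

```properties
# Cluster name shared by all nodes of this Sensei cluster
sensei.cluster.name=sensei

# Hypothetical three-server ZooKeeper ensemble (assumes a standard
# ZooKeeper connect string is accepted here)
sensei.cluster.url=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

# ZooKeeper session timeout in milliseconds (value taken from the demo file)
sensei.cluster.timeout=30000
```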

Indexing Properties

sensei.index.analyzer

See sensei.index.analyzer in the section called “Plug-in Properties”.

sensei.index.batchDelay

Type: int
Required: No
Default: 300000

This is the maximum time to wait in milliseconds before flushing index events to disk. The default value is 300000 (i.e. 5 minutes).

sensei.index.batchSize

Type: int
Required: No
Default: 10000

This is the batch size that controls the pace of data event consumption on the back-end. It is the soft size limit of each event batch. If events come in too fast and the limit is already reached, the indexer blocks incoming events until the number of buffered events drops below this limit, which happens after some of the events are sent to the background data consumer.

sensei.index.custom

See sensei.index.custom in the section called “Plug-in Properties”.

sensei.index.directory

Type: String
Required: Yes
Default: None

This is the directory used to save the index.

sensei.index.freshness

Type: long
Required: No
Default: 500

This controls the freshness of entries in the index reader cache.

sensei.index.interpreter

See sensei.index.interpreter in the section called “Plug-in Properties”.

sensei.index.manager

See sensei.index.manager in the section called “Plug-in Properties”.

sensei.index.manager.default.batchSize

Type: int
Required: No
Default: 1

This is the batch size that controls when data events accumulated in the default index manager should be consumed by the data consumer. The default value is 1.

sensei.index.manager.default.eventsPerMin

Type: int
Required: No
Default: 40000

This is the maximum number of data events that the indexer can consume per minute. If this threshold is exceeded, the indexer will pause for a short period of time before continuing to consume incoming data events.

This property is helpful in preventing the indexer from being overloaded. The default value is 40,000.
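
For illustration, the fragment below sketches how the two throttling properties of the default index manager might be set together. The values are arbitrary examples, not tuning recommendations.

```properties
# Hand events to the data consumer in groups of 100 instead of one at a time
# (illustrative value; the default is 1)
sensei.index.manager.default.batchSize = 100

# Pause the indexer briefly once more than 20000 events have been consumed
# within a minute (illustrative value; the default is 40000)
sensei.index.manager.default.eventsPerMin = 20000
```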

sensei.index.manager.default.maxpartition.id

Type: int
Required: Yes, if the default indexing manager is chosen; No, otherwise.
Default: None

This is the maximum partition ID number served by this Sensei cluster if the default Sensei indexing manager is used.

Warning

This property is different from the total number of partitions in a Sensei cluster. For example, if a cluster contains 4 partitions, 0, 1, 2, and 3, then sensei.index.manager.default.maxpartition.id should be set to 3.
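
The warning above can be sketched with a hypothetical two-node cluster that serves partitions 0 through 3. Each node has its own sensei.properties file. The node IDs and the partition layout are assumptions made for this example. Both nodes use the same maximum partition ID, 3, even though the cluster has four partitions.

```properties
# --- sensei.properties on the first node (hypothetical layout) ---
sensei.node.id=1
sensei.node.partitions=0,1
sensei.index.manager.default.maxpartition.id = 3

# --- sensei.properties on the second node (hypothetical layout) ---
sensei.node.id=2
sensei.node.partitions=2,3
sensei.index.manager.default.maxpartition.id = 3
```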

sensei.index.manager.default.shardingStrategy

See sensei.index.manager.default.shardingStrategy in the section called “Plug-in Properties”.

sensei.index.manager.default.type

See sensei.index.manager.default.type in the section called “Plug-in Properties”.

sensei.index.maxBatchSize

Type: int
Required: No
Default: 10000

This is the maximum number of incoming data events that can be held by the indexer in a batch before they are flushed to disk. If this number is exceeded, the indexer will stop processing the data events for one minute.
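
As a sketch of how the three batching properties relate, the fragment below sets a soft limit, a time-based flush, and a hard ceiling. The values are illustrative only. The one deliberate choice is that maxBatchSize is not smaller than batchSize, so that the soft limit is reached first.

```properties
# Soft limit: the indexer blocks incoming events once 5000 are buffered
sensei.index.batchSize = 5000

# Flush buffered index events to disk after at most 60000 ms (one minute)
sensei.index.batchDelay = 60000

# Hard ceiling: beyond 20000 held events the indexer stops processing
# data events for one minute
sensei.index.maxBatchSize = 20000
```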

sensei.index.realtime

Type: boolean
Required: No
Default: true

This specifies whether the indexing mode is real-time or not.

sensei.index.similarity

See sensei.index.similarity in the section called “Plug-in Properties”.

sensei.indexer.type

Type: String
Required: Yes
Default: None

This is the internal indexer type used by the Sensei cluster. Currently only two options are supported: zoie and hourglass. If hourglass is used, the following three properties also apply (only sensei.indexer.hourglass.schedule is required):

sensei.indexer.hourglass.schedule
sensei.indexer.hourglass.trimthreshold
sensei.indexer.hourglass.frequency

sensei.indexer.hourglass.frequency

Type: String
Required: No
Default: "day"

This is the rolling forward frequency. It has to be one of the following three values:

day
hour
minute

sensei.indexer.hourglass.schedule

Type: String
Required: Yes, if property sensei.indexer.type is set to "hourglass"; No, otherwise.
Default: None

This string specifies the Hourglass rolling forward schedule. Its format is "ss mm hh". For daily rolling, the index rolls forward at hh:mm:ss each day. For hourly rolling, it rolls forward at mm:ss of each hour. For minutely rolling, it rolls forward at second ss of each minute.

sensei.indexer.hourglass.trimthreshold

Type: int
Required: No
Default: 14

This is the retention period for how long we are going to keep the events in the index. The unit is the rolling period.
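
The Hourglass properties above can be combined as in the following sketch of a daily roll. The schedule value follows the "ss mm hh" format described above. The chosen time (02:30:00) is an arbitrary example, and the frequency and trim threshold shown are simply the documented defaults written out explicitly.

```properties
# Use the Hourglass indexer instead of Zoie
sensei.indexer.type=hourglass

# "ss mm hh": roll forward at 02:30:00 every day (illustrative time)
sensei.indexer.hourglass.schedule=00 30 02

# Roll once per day (the documented default)
sensei.indexer.hourglass.frequency=day

# Keep 14 rolling periods of events, which is 14 days with daily rolling
sensei.indexer.hourglass.trimthreshold=14
```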

Broker and Client Properties

sensei.broker.maxThread

Type: int
Required: No
Default: 50

This is the maximum size of thread pool used by a broker to execute requests.

sensei.broker.maxWaittime

Type: int
Required: No
Default: 2000

This is the maximum idle time in milliseconds for a thread on a broker. Threads that are idle for longer than this period may be stopped.

sensei.broker.minThread

Type: int
Required: No
Default: 20

This is the core size of thread pool used by the broker to execute requests.

sensei.broker.port

Type: int
Required: Yes
Default: None

This is the port number of the Sensei broker.

sensei.broker.webapp.path

Type: String
Required: Yes
Default: None

This is the resource base of the broker web application.

sensei.search.cluster.zookeeper.url

Type: String
Required: Yes
Default: None

This is the ZooKeeper URL for the Sensei search cluster that a broker talks to.

sensei.search.cluster.name

Type: String
Required: Yes
Default: None

This is the Sensei cluster name, i.e. the service name for the network clients and brokers.

sensei.search.cluster.zookeeper.conn.timeout

Type: int
Required: No
Default: 10000

This is the ZooKeeper network client session timeout value in milliseconds.

sensei.search.cluster.network.conn.timeout

Type: int
Required: No
Default: 1000

This is the maximum number of milliseconds to allow a connection attempt to take.

sensei.search.cluster.network.write.timeout

Type: int
Required: No
Default: 150

This is the number of milliseconds a request can be queued for write before it is considered stale.

sensei.search.cluster.network.max.conn.per.node

Type: int
Required: No
Default: 5

This is the maximum number of open connections to a node.

sensei.search.cluster.network.stale.timeout.mins

Type: int
Required: No
Default: 10

This is the number of minutes to keep a request that is waiting for a response.

sensei.search.cluster.network.stale.cleanup.freq.mins

Type: int
Required: No
Default: 10

This is the frequency, in minutes, at which stale requests are cleaned up.

Plug-in Properties

sensei.index.analyzer

Type: String
Required: No
Default: ""

This specifies the bean ID of the analyzer plug-in for analyzing text. If not specified, org.apache.lucene.analysis.standard.StandardAnalyzer will be used.

sensei.index.similarity

Type: String
Required: No
Default: ""

This specifies the bean ID of the similarity plug-in for Lucene scoring. If not specified, org.apache.lucene.search.DefaultSimilarity is used.

sensei.index.custom

Type: String
Required: No
Default: ""

This specifies the bean ID of the custom indexing pipeline implementation. A custom indexing pipeline can be plugged into the indexing process to allow users to modify generated Lucene documents at the last step before they are indexed.

A custom indexing pipeline has to implement interface com.sensei.indexing.api.CustomIndexingPipeline.

sensei.index.interpreter

Type: String
Required: No
Default: ""

This specifies the bean ID of the interpreter of Zoie indexables. If not specified, com.sensei.indexing.api.DefaultJsonSchemaInterpreter is used.

sensei.index.manager

Type: String
Required: No
Default: ""

This specifies the bean ID of the indexing manager object implementing com.sensei.search.nodes.SenseiIndexingManager. If not specified, com.sensei.indexing.api.DefaultStreamingIndexingManager is used.

sensei.index.manager.default.type

Type: String
Required: Yes if sensei.index.manager is not specified, i.e. the default indexing manager is used.
Default: None

This specifies the type of gateway that will be used by the default indexing manager. The value identifies the bean ID of an object of com.sensei.indexing.api.gateway.SenseiGateway.

Sensei provides several built-in gateways, but you can always define your own. Whether a built-in or a custom gateway is used, additional parameters can be specified under names with the prefix sensei.index.manager.default.<gateway-type>.

Currently the following built-in gateway types are supported:

file: This type of gateway takes a regular text file as input. Each line in the file contains a data entry in JSON format. Only one property needs to be set for this gateway type. See the section called “File Gateway Properties”.
kafka: This type of gateway takes Kafka messages as input. See the section called “Kafka Gateway Properties” for additional property information.
jms: This type of gateway takes JMS (Java Message Service) messages as input. Sensei uses the publish-and-subscribe messaging model, so parameters like the topic need to be provided. See the section called “JMS Gateway Properties” for additional property information.
jdbc: This type of gateway takes JDBC data as input. See the section called “JDBC Gateway Properties” for additional property information.

sensei.index.manager.default.<gateway-type>.filter

Type: String
Required: No
Default: None

This is the bean ID of a com.sensei.indexing.api.DataSourceFilter object. No matter what gateway type the indexing manager uses, a filter can be plugged in to convert the original source data to the JSON format defined by the table schema. If the input data is already in the right format, this filter is not needed.
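
As a minimal sketch of the naming pattern, the fragment below attaches a filter to the kafka gateway by substituting the gateway type into the property name. The bean ID myDataFilter is hypothetical and would have to be defined as a plug-in bean implementing com.sensei.indexing.api.DataSourceFilter.

```properties
# Gateway type chosen for the default indexing manager
sensei.index.manager.default.type = kafka

# <gateway-type> is replaced by "kafka"; "myDataFilter" is a hypothetical bean ID
sensei.index.manager.default.kafka.filter = myDataFilter
```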

sensei.index.manager.default.shardingStrategy

Type: String
Required: No
Default: ""

This is the bean ID of the sharding strategy.

sensei.search.router.factory

Type: String
Required: No
Default: ""

This is the bean ID of the Sensei request router factory. This factory builds the load balancer that is used by Sensei brokers to route incoming requests to different Sensei nodes.

sensei.version.comparator

Type: String
Required: No
Default: ""

This specifies the bean ID of the version comparator plug-in to be used by the indexer. If not specified, Zoie's default version comparator is used.

File Gateway Properties

For the file gateway, the following property has to be specified:

sensei.index.manager.default.file.path

Type: String
Required: Yes
Default: None

This is the path to the input data file.

Kafka Gateway Properties

For the kafka gateway, the following properties are used: ^[2]

sensei.index.manager.default.kafka.batchsize

Type: String
Required: Yes
Default: None

This is the batch size for each pull request.

sensei.index.manager.default.kafka.host

Type: String
Required: Yes
Default: None

This is the host name of the Kafka server.

sensei.index.manager.default.kafka.port

Type: int
Required: Yes
Default: None

This is the port number on which the Kafka server is listening for connections.

sensei.index.manager.default.kafka.timeout

Type: int
Required: Yes
Default: 10000

This is the socket timeout in milliseconds.

sensei.index.manager.default.kafka.topic

Type: String
Required: Yes
Default: None

The topic of the messages to be fetched.
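
Put together, a kafka gateway configuration might look like the following sketch. The host name, port, topic, and batch size are hypothetical values chosen for illustration. This document does not state the unit of the batch size, so the number shown should not be read as a recommendation.

```properties
# Select the built-in kafka gateway for the default indexing manager
sensei.index.manager.default.type = kafka

# Hypothetical Kafka server location
sensei.index.manager.default.kafka.host = kafka.example.com
sensei.index.manager.default.kafka.port = 9092

# Hypothetical topic to fetch messages from
sensei.index.manager.default.kafka.topic = cars

# Batch size for each pull request (illustrative value)
sensei.index.manager.default.kafka.batchsize = 1000

# Socket timeout in milliseconds (the documented default)
sensei.index.manager.default.kafka.timeout = 10000
```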

JMS Gateway Properties

For the jms gateway, the following properties are used:

sensei.index.manager.default.jms.clientId

Type: String
Required: Yes
Default: None

This is the client identifier used to connect to the JMS provider.

sensei.index.manager.default.jms.topic

Type: String
Required: Yes
Default: None

This is the topic name that the JMS client subscribes to.

sensei.index.manager.default.jms.topicFactory

Type: String
Required: Yes
Default: None

This is the bean ID of the proj.zoie.dataprovider.jms.TopicFactory object. This object is used to generate a topic object based on the given topic name.

sensei.index.manager.default.jms.connectionFactory

Type: String
Required: Yes
Default: None

This is the bean ID of the javax.jms.TopicConnectionFactory object, which is used by the JMS client to create a javax.jms.TopicConnection object with the JMS provider.
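
The four JMS properties can be sketched together as follows. The client ID, topic name, and both bean IDs are hypothetical. The two bean IDs would have to refer to plug-in beans of the types named above, proj.zoie.dataprovider.jms.TopicFactory and javax.jms.TopicConnectionFactory.

```properties
# Select the built-in jms gateway for the default indexing manager
sensei.index.manager.default.type = jms

# Hypothetical client identifier and topic name
sensei.index.manager.default.jms.clientId = sensei-node-1
sensei.index.manager.default.jms.topic = cars

# Hypothetical bean IDs of the topic factory and connection factory plug-ins
sensei.index.manager.default.jms.topicFactory = myTopicFactory
sensei.index.manager.default.jms.connectionFactory = myConnectionFactory
```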

JDBC Gateway Properties

For the jdbc gateway, the following properties are used:

sensei.index.manager.default.jdbc.adaptor

Type: String
Required: Yes
Default: None

This is the bean ID of the com.sensei.indexing.api.jdbc.SenseiJDBCAdaptor object. This object is used to build a proj.zoie.dataprovider.jdbc.PreparedStatementBuilder object, which is required by proj.zoie.dataprovider.jdbc.JDBCStreamDataProvider.

sensei.index.manager.default.jdbc.driver

Type: String
Required: Yes
Default: None

This is the class name of the JDBC driver that you want to use.

sensei.index.manager.default.jdbc.password

Type: String
Required: Yes
Default: None

This is the password for the user name that you use to connect to the database.

sensei.index.manager.default.jdbc.username

Type: String
Required: Yes
Default: None

This is the user name that you use to connect to the database.
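
A jdbc gateway configuration might be sketched as follows. The fragment assumes that all four properties follow the sensei.index.manager.default.<gateway-type> naming convention, that is, the jdbc prefix. The bean ID, driver class, and credentials are placeholder values, and only the properties documented in this section are shown.

```properties
# Select the built-in jdbc gateway for the default indexing manager
sensei.index.manager.default.type = jdbc

# Hypothetical bean ID of a com.sensei.indexing.api.jdbc.SenseiJDBCAdaptor plug-in
sensei.index.manager.default.jdbc.adaptor = myJdbcAdaptor

# Example JDBC driver class name; use the class of the driver you actually deploy
sensei.index.manager.default.jdbc.driver = com.mysql.jdbc.Driver

# Placeholder database credentials
sensei.index.manager.default.jdbc.username = sensei
sensei.index.manager.default.jdbc.password = changeme
```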

^[2]These properties are basically the parameters needed by the Kafka consumer API. The Simple Consumer API from Kafka is used by Sensei.
