A Sensei node is configured via the sensei.properties file, which uses the format supported by Apache Commons Configuration (http://commons.apache.org/). This file consists of the following five parts:
server: port to listen on, rpc parameters, etc.
cluster: cluster manager, sharding, request routing, etc.
indexing: data interpretation, tokenization, indexer type, etc.
broker and client: e.g. entry into Sensei system
plugins: e.g. customized facet handlers
Below is the configuration file for the demo (available from https://github.com/javasoze/sensei/blob/master/conf/sensei.properties):
# sensei node parameters
sensei.node.id=1
sensei.node.partitions=0,1

# sensei network server parameters
sensei.server.port=1234
sensei.server.requestThreadCorePoolSize=20
sensei.server.requestThreadMaxPoolSize=70
sensei.server.requestThreadKeepAliveTimeSecs=300

# sensei cluster parameters
sensei.cluster.name=sensei
sensei.cluster.url=localhost:2181
sensei.cluster.timeout=30000

# sensei indexing parameters
sensei.index.directory = index/cardata
sensei.index.batchSize = 10000
sensei.index.batchDelay = 300000
sensei.index.maxBatchSize = 10000
sensei.index.realtime = true
sensei.index.freshness = 10000

# index manager parameters
sensei.index.manager.default.maxpartition.id = 1
sensei.index.manager.default.type = file
sensei.index.manager.default.file.path = data/cars.json

# plugins: from plugins.xml
# analyzer, default: StandardAnalyzer
# sensei.index.analyzer = myanalyzer
# similarity, default: DefaultSimilarity
# sensei.index.similarity = mysimilarity
# indexer type, zoie/hourglass/<custom name>
sensei.indexer.type=zoie
#extra parameters for hourglass
#sensei.indexer.hourglass.schedule
# retention
#sensei.indexer.hourglass.trimthreshold
# frequency for a roll, minute/hour/day
#sensei.indexer.hourglass.frequency
# sensei
# version comparator, default: ZoieConfig.DefaultVersionComparator
# sensei.version.comparator = myVersionComparator
# extra services
sensei.plugin.services =

# broker properties
sensei.broker.port = 8080
sensei.broker.minThread = 50
sensei.broker.maxThread = 100
sensei.broker.maxWaittime = 2000
sensei.broker.webapp.path=src/main/webapp
sensei.search.cluster.name = sensei
sensei.search.cluster.zookeeper.url = localhost:2181
sensei.search.cluster.zookeeper.conn.timeout = 30000
sensei.search.cluster.network.conn.timeout = 1000
sensei.search.cluster.network.write.timeout = 150
sensei.search.cluster.network.max.conn.per.node = 5
sensei.search.cluster.network.stale.timeout.mins = 10
sensei.search.cluster.network.stale.cleanup.freq.mins = 10
# custom router factory
# sensei.search.router.factory = myRouterFactory
This line starts the server configuration.
This line starts the cluster configuration.
This line starts the indexing configuration.
This line starts the plug-in configuration.
This line starts the broker and client configuration.
The following sections explain every configuration property in each part: its type, whether it is required, its default value, and how it is used.
Type: int
Required: Yes
Default: None
This is the node ID of the Sensei node in a cluster.
Type: String (comma separated integers or ranges)
Required: Yes
Default: None
This specifies the partition IDs that this Sensei server is going to handle. Partition IDs can be given as either integer numbers or ranges, separated by commas. For example, the following line denotes that the Sensei server has six partitions: 1, 4, 5, 6, 7, and 10.
sensei.node.partitions=1,4-7,10
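To make the range syntax concrete, here is a minimal Python sketch of how such a value could be expanded into individual partition IDs. It is only an illustration and not Sensei's own parser; the function name expand_partitions is hypothetical, and the handling of whitespace, duplicates, and reversed ranges is an assumption.

```python
from typing import List


def expand_partitions(spec: str) -> List[int]:
    """Expand a value such as "1,4-7,10" into a sorted list of partition IDs.

    Assumption: tokens are non-negative integers or "low-high" ranges.
    """
    ids = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # assumption: empty tokens are ignored
        if "-" in token:
            low_text, high_text = token.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError("invalid range: " + token)
            ids.update(range(low, high + 1))  # ranges are inclusive
        else:
            ids.add(int(token))
    return sorted(ids)


print(expand_partitions("1,4-7,10"))  # [1, 4, 5, 6, 7, 10]
```

The value from the demo configuration, 0,1, expands to the two partitions 0 and 1 in the same way.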
Type: int
Required: Yes
Default: None
This is the Sensei server port number.
Type: int
Required: No
Default: 20
This is the core size of the thread pool used to execute requests.
Type: int
Required: No
Default: 300
This is the length of time in seconds to keep an idle request thread alive.
Type: int
Required: No
Default: 70
This is the maximum size of the thread pool used to execute requests.
Type: String
Required: Yes
Default: None
This is the name of the Sensei server cluster.
Type: int
Required: No
Default: 300000
This is the session timeout value, in milliseconds, that is passed to ZooKeeper.
Type: String
Required: Yes
Default: None
This is the ZooKeeper URL for the Sensei cluster.
See sensei.index.analyzer in the section called “Plug-in Properties”.
Type: int
Required: No
Default: 300000
This is the maximum time to wait in milliseconds before flushing index events to disk. The default value is 300000 (i.e. 5 minutes).
Type: int
Required: No
Default: 10000
This is the batch size to control the pace of data event consumption on the back-end. It is the soft size limit of each event batch. If events come in too fast and the limit is already reached, the indexer blocks incoming events until the number of buffered events drops below this limit, which happens after some of the events are sent to the background data consumer.
See sensei.index.custom in the section called “Plug-in Properties”.
Type: String
Required: Yes
Default: None
This is the directory used to save the index.
Type: long
Required: No
Default: 500
This controls the freshness of entries in the index reader cache.
See sensei.index.interpreter in the section called “Plug-in Properties”.
See sensei.index.manager in the section called “Plug-in Properties”.
Type: int
Required: No
Default: 1
This is the batch size to control when data events accumulated in the default index manager should be consumed by the data consumer. The default value is 1.
Type: int
Required: No
Default: 40000
This is the maximum number of data events that the indexer can consume per minute. If this threshold is exceeded, the indexer will pause for a short period of time before continuing to consume incoming data events.
This property is helpful in preventing the indexer from being overloaded. The default value is 40,000.
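The pausing behavior described above can be pictured with a simple fixed-window throttle. The following Python sketch is an assumption-laden illustration, not the mechanism Sensei actually uses: the class name PerMinuteThrottle, the fixed 60-second window, and the choice to sleep until the window ends are all hypothetical.

```python
import time


class PerMinuteThrottle:
    """Allow at most `limit` events per 60-second window (illustrative only)."""

    def __init__(self, limit, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.clock = clock    # injectable so the sketch can be tested
        self.sleep = sleep
        self.window_start = clock()
        self.count = 0

    def acquire(self):
        """Call once per consumed event; pauses when the limit is exceeded."""
        now = self.clock()
        if now - self.window_start >= 60:
            # A new one-minute window has started.
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            # Limit reached: pause until the current window is over.
            self.sleep(60 - (now - self.window_start))
            self.window_start = self.clock()
            self.count = 0
        self.count += 1


# With the documented default, a consumer would create
# PerMinuteThrottle(40000) and call acquire() before handling each event.
```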
Type: int
Required: Yes, if the default indexing manager is chosen; No, otherwise.
Default: None
This is the maximum partition ID number served by this Sensei cluster if the default Sensei indexing manager is used.
This property is different from the total number of partitions in a Sensei cluster. For example, if a cluster contains 4 partitions, 0, 1, 2, and 3, then sensei.index.manager.default.maxpartition.id should be set to 3.
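The difference between the partition count and the maximum partition ID can be checked with a few lines of Python. This is a sketch for illustration only; the cluster layout below (two nodes with two partitions each) and the helper name max_partition_id are hypothetical.

```python
from typing import Dict, List


def max_partition_id(node_partitions: Dict[int, List[int]]) -> int:
    """Return the largest partition ID served anywhere in the cluster."""
    return max(pid for pids in node_partitions.values() for pid in pids)


# Hypothetical layout: node ID -> partition IDs handled by that node.
cluster = {1: [0, 1], 2: [2, 3]}
total_partitions = len({pid for pids in cluster.values() for pid in pids})

print(total_partitions)           # 4
print(max_partition_id(cluster))  # 3
```

For this layout, sensei.index.manager.default.maxpartition.id would be set to 3, not 4.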
See sensei.index.manager.default.shardingStrategy in the section called “Plug-in Properties”.
See sensei.index.manager.default.type in the section called “Plug-in Properties”.
Type: int
Required: No
Default: 10000
This is the maximum number of incoming data events that can be held by the indexer in a batch before they are flushed to disk. If this number is exceeded, the indexer will stop processing the data events for one minute.
Type: boolean
Required: No
Default: true
This specifies whether the indexing mode is real-time or not.
See sensei.index.similarity in the section called “Plug-in Properties”.
Type: String
Required: Yes
Default: None
This is the internal indexer type used by the Sensei cluster. Currently only two options are supported: zoie and hourglass. If hourglass is used, three more properties need to be set too:
sensei.indexer.hourglass.schedule
sensei.indexer.hourglass.trimthreshold
sensei.indexer.hourglass.frequency
Type: String
Required: No
Default: "day"
This is the rolling forward frequency. It has to be one of the following three values:
day
hour
minute
Type: String
Required: Yes, if property sensei.indexer.type is set to "hourglass"; No, otherwise.
Default: None
This is a string that specifies the Hourglass rolling forward schedule. The format of this string is "ss mm hh". For daily rolling, we roll forward at hh:mm:ss of the day. For hourly rolling, we roll forward at mm:ss of the hour. For minutely rolling, we roll forward at second ss of the minute.
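To illustrate how the "ss mm hh" string combines with the rolling frequency (day, hour, or minute), the following Python sketch computes the next roll time after a given moment. It is not taken from Hourglass; the function name next_roll is hypothetical, and it assumes that all three fields are always present and that fields not relevant to the chosen frequency are simply ignored.

```python
from datetime import datetime, timedelta


def next_roll(schedule: str, frequency: str, now: datetime) -> datetime:
    """Return the first roll time strictly after `now`.

    `schedule` uses the "ss mm hh" format; `frequency` is "day",
    "hour", or "minute".
    """
    ss, mm, hh = (int(field) for field in schedule.split())
    if frequency == "day":
        candidate = now.replace(hour=hh, minute=mm, second=ss, microsecond=0)
        step = timedelta(days=1)
    elif frequency == "hour":
        candidate = now.replace(minute=mm, second=ss, microsecond=0)
        step = timedelta(hours=1)
    elif frequency == "minute":
        candidate = now.replace(second=ss, microsecond=0)
        step = timedelta(minutes=1)
    else:
        raise ValueError("unknown frequency: " + frequency)
    if candidate <= now:
        candidate += step  # today's (or this hour's) slot has already passed
    return candidate


now = datetime(2024, 1, 1, 12, 30, 45)
print(next_roll("00 15 03", "day", now))     # 2024-01-02 03:15:00
print(next_roll("00 15 03", "hour", now))    # 2024-01-01 13:15:00
print(next_roll("00 15 03", "minute", now))  # 2024-01-01 12:31:00
```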
Type: int
Required: No
Default: 14
This is the retention period for how long we are going to keep the events in the index. The unit is the rolling period.
Type: int
Required: No
Default: 50
This is the maximum size of the thread pool used by a broker to execute requests.
Type: int
Required: No
Default: 2000
This is the maximum idle time in milliseconds for a thread on a broker. Threads that are idle for longer than this period may be stopped.
Type: int
Required: No
Default: 20
This is the core size of the thread pool used by the broker to execute requests.
Type: int
Required: Yes
Default: None
This is the port number of the Sensei broker.
Type: String
Required: Yes
Default: None
This is the resource base of the broker web application.
Type: String
Required: Yes
Default: None
This is the ZooKeeper URL for the Sensei search cluster that a broker talks to.
Type: String
Required: Yes
Default: None
This is the Sensei cluster name, i.e. the service name for the network clients and brokers.
Type: int
Required: No
Default: 10000
This is the ZooKeeper network client session timeout value in milliseconds.
Type: int
Required: No
Default: 1000
This is the maximum number of milliseconds to allow a connection attempt to take.
Type: int
Required: No
Default: 150
This is the number of milliseconds a request can be queued for write before it is considered stale.
Type: int
Required: No
Default: 5
This is the maximum number of open connections to a node.
Type: int
Required: No
Default: 10
This is the number of minutes to keep a request that is waiting for a response.
Type: int
Required: No
Default: 10
This is the frequency to clean up stale requests.
Type: String
Required: No
Default: ""
This specifies the bean ID of the analyzer plug-in for analyzing text. If not specified, org.apache.lucene.analysis.standard.StandardAnalyzer will be used.
Type: String
Required: No
Default: ""
This specifies the bean ID of the similarity plug-in for Lucene scoring. If not specified, org.apache.lucene.search.DefaultSimilarity is used.
Type: String
Required: No
Default: ""
This specifies the bean ID of the custom indexing pipeline implementation. A custom indexing pipeline can be plugged into the indexing process to allow users to modify generated Lucene documents at the last step before they are indexed.
A custom indexing pipeline has to implement the interface com.sensei.indexing.api.CustomIndexingPipeline.
Type: String
Required: No
Default: ""
This specifies the bean ID of the interpreter of Zoie indexables. If not specified, com.sensei.indexing.api.DefaultJsonSchemaInterpreter is used.
Type: String
Required: No
Default: ""
This specifies the bean ID of the indexing manager object implementing com.sensei.search.nodes.SenseiIndexingManager. If not specified, com.sensei.indexing.api.DefaultStreamingIndexingManager is used.
Type: String
Required: Yes if sensei.index.manager is not specified, i.e. the default indexing manager is used.
Default: None
This specifies the type of gateway that will be used by the default indexing manager. The value identifies the bean ID of an object of com.sensei.indexing.api.gateway.SenseiGateway.
Several built-in gateways are provided by Sensei, but you can always define your own based on your needs. Whether a built-in or a custom gateway is used, additional parameters can be specified under names with the prefix sensei.index.manager.default.<gateway-type>.
Currently the following built-in gateway types are supported:
file: This type of gateway takes a regular text file as the input. Each line in the file contains a data entry in JSON format. Only one property needs to be set for this gateway type. See the section called “File Gateway Properties”.
kafka: This type of gateway takes Kafka messages as input. See the section called “Kafka Gateway Properties” for additional property information.
jms: This type of gateway takes JMS (Java Message Service) messages as input. The publish-and-subscribe messaging model is used by Sensei, so parameters like topic need to be provided. See the section called “JMS Gateway Properties” for additional property information.
jdbc: This type of gateway takes JDBC data as input. See the section called “JDBC Gateway Properties” for additional property information.
Type: String
Required: No
Default: None
This is the bean ID of a com.sensei.indexing.api.DataSourceFilter object. No matter what gateway type the indexing manager uses, a filter can be plugged in to convert the original source data to the JSON format defined by the table schema. If the input data is already in the right format, then this filter is not needed.
Type: String
Required: No
Default: ""
This is the bean ID of the sharding strategy.
Type: String
Required: No
Default: ""
This is the bean ID of the Sensei request router factory. This factory builds the load balancer that is used by Sensei brokers to route incoming requests to different Sensei nodes.
Type: String
Required: No
Default: ""
This specifies the bean ID of the version comparator plug-in to be used by the indexer. If not specified, Zoie's default version comparator is used.
For the file gateway, the following property has to be specified:
Type: String
Required: Yes
Default: None
This is the path to the input data file.
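Because the file gateway expects one JSON data entry per line, an input file can be produced and checked with a few lines of Python. The sketch below is illustrative only: the helper names to_line and read_events and the field names in the sample record are hypothetical, and skipping blank lines is an assumption rather than documented gateway behavior.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_line(event: Dict[str, Any]) -> str:
    """Serialize one data entry as a single line of JSON."""
    return json.dumps(event) + "\n"


def read_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse one JSON object per line, ignoring blank lines (assumption)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records; real field names come from the table schema.
sample = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
text_lines = [to_line(event) for event in sample]
print(list(read_events(text_lines)) == sample)  # True
```

To check an existing input file such as the demo's data/cars.json, pass an open file object to read_events, since iterating over a text file yields its lines.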
For the kafka gateway, the following properties should/can be specified: [2]
Type: String
Required: Yes
Default: None
This is the batch size for each pull request.
Type: String
Required: Yes
Default: None
This is the host name of the Kafka server.
Type: int
Required: Yes
Default: None
This is the port number on which the Kafka server is listening for connections.
Type: int
Required: Yes
Default: 10000
This is the socket timeout in milliseconds.
Type: String
Required: Yes
Default: None
The topic of the messages to be fetched.
For the jms gateway, the following properties should/can be specified:
Type: String
Required: Yes
Default: None
This is the client identifier used to connect to the JMS provider.
Type: String
Required: Yes
Default: None
This is the topic name that the JMS client subscribes to.
Type: String
Required: Yes
Default: None
This is the bean ID of the proj.zoie.dataprovider.jms.TopicFactory object. This object is used to generate a topic object based on the given topic name.
Type: String
Required: Yes
Default: None
This is the bean ID of the javax.jms.TopicConnectionFactory object, which is used by the JMS client to create a javax.jms.TopicConnection object with the JMS provider.
For the jdbc gateway, the following properties should/can be specified:
Type: String
Required: Yes
Default: None
This is the bean ID of the com.sensei.indexing.api.jdbc.SenseiJDBCAdaptor object. This object is used to build a proj.zoie.dataprovider.jdbc.PreparedStatementBuilder object, which is required by proj.zoie.dataprovider.jdbc.JDBCStreamDataProvider.
Type: String
Required: Yes
Default: None
This is the class name of the JDBC driver that you want to use.
Type: String
Required: Yes
Default: None
This is the password for the user name that you use to connect to the database.
Type: String
Required: Yes
Default: None
This is the user name that you use to connect to the database.
[2] These properties are basically the parameters needed by the Kafka consumer API. The Simple Consumer API from Kafka is used by Sensei.