A Walkthrough of the elasticsearch.yml File

##################### Elasticsearch Configuration Example #####################

# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at .

This file gives an overview of many configuration settings; see the Elasticsearch Guide for details.

#
# The installation procedure is covered at
# .

Installation is covered in the Elasticsearch installation guide.

#
# Elasticsearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.

Elasticsearch applies sensible defaults for most settings, so you can try it out without changing any configuration.

#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].

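In most cases the defaults are good enough to run a production cluster. If you want to fine-tune settings, or are worried that a change will affect the whole service, ask on the mailing list, the IRC channel, or in the Elasticsearch community.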

# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}

Any element in the configuration file can be replaced with an environment variable by writing it in ${...} notation.
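As a rough illustration of the ${...} notation, the Python sketch below expands placeholders in a settings line from a dictionary of environment values. It is not how Elasticsearch itself resolves them; the helper name `expand_env` and the choice to leave unknown variables untouched are assumptions made for this example.

```python
import os
import re

# Matches ${NAME} placeholders such as ${RACK_ENV_VAR}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_env(line, env=None):
    """Replace ${NAME} placeholders in one settings line.

    Hypothetical helper for illustration only. Names that are not
    defined in env are left as they are (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), line)


if __name__ == "__main__":
    print(expand_env("node.rack: ${RACK_ENV_VAR}", {"RACK_ENV_VAR": "rack314"}))
```

Passing an explicit dictionary keeps the example independent of the real environment; calling `expand_env(line)` with no second argument reads from `os.environ` instead.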

# For information on supported formats and syntax for the config file, see
# 

For the formats and syntax supported in the configuration file, see the Elasticsearch setup and configuration documentation.

################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
 cluster.name: elasticsearch

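Sets the name of the cluster. A starting node uses multicast to ask where the cluster is; a node that receives the multicast request and belongs to a cluster of the same name responds, and the two connect.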

#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
 node.name: "Cowb"

Gives this node a fixed name.

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true

Before going further, here is an explanation of master nodes and data nodes.
[Excerpted from Elasticsearch Server, 2nd Edition, page 324]
The master node is the one that checks all the other nodes to see if they are responsive (other nodes ping the master too). The master node will also accept the new nodes that want to join the cluster. If the master is somehow disconnected from the cluster, the remaining nodes will select a new master from among themselves. All these processes are done automatically on the basis of the configuration values we provide.
By default, Elasticsearch allows every node to be a master node and a data node. However, in certain situations, you may want to have worker nodes that will only hold the data and master nodes that will only be used to process requests and manage the cluster.
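From the excerpt above, a master node is somewhat like the coordinating node in a P2P environment: it manages the nodes in the cluster.
A data node mainly stores data. Both roles are enabled by default.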

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true

Use this when the node should only store data and never act as a master node.

#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false

Use this when the node stores no data but can act as a master node.

#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

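Use this when the node neither stores data nor acts as a master node, and only serves searches or balances search load.
In that case it fetches data from the nodes that hold it.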
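The combinations of node.master and node.data described above can be summarized in a small Python sketch. The function name `describe_node_role` is hypothetical, and the labels simply reuse the wording from the configuration comments.

```python
def describe_node_role(master=True, data=True):
    """Name the role implied by node.master / node.data.

    Illustrative helper only; both settings default to true,
    as the configuration comments state.
    """
    if master and data:
        return "default: master-eligible and holds data"
    if data:
        return "workhorse: holds data, never master"
    if master:
        return "coordinator: master only, no data"
    return "search load balancer: neither master nor data"


if __name__ == "__main__":
    for master in (True, False):
        for data in (True, False):
            print(master, data, "->", describe_node_role(master, data))
```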

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_nodes] or GUI tools
# such as ,
# ,
#  and
#  to inspect the cluster state.

Open http://localhost:9200/_cluster/health to get the health status of the cluster, returned as JSON.
Open http://localhost:9200/_nodes to get information about each node in the cluster, also returned as JSON.
Alternatively, install a GUI tool such as Marvel, elasticsearch-paramedic, bigdesk, or elasticsearch-head.
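Since the health endpoint returns JSON, it is easy to read from a script. The sketch below summarizes a health response in Python. The sample body uses field names from the Cluster Health API, but its values are invented for illustration, and `fetch_health` only works when a node is actually listening on localhost:9200.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://localhost:9200/_cluster/health"

# Sample body: field names follow the Cluster Health API, values are invented.
SAMPLE_BODY = json.dumps({
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 5,
})


def summarize_health(body):
    """Turn a cluster health JSON body into a one-line summary."""
    health = json.loads(body)
    return "{0}: status={1}, nodes={2}, unassigned_shards={3}".format(
        health["cluster_name"],
        health["status"],
        health["number_of_nodes"],
        health["unassigned_shards"],
    )


def fetch_health(url=HEALTH_URL):
    """Fetch the raw JSON body; requires a running node at the given URL."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Uses the sample so the sketch runs without a cluster.
    print(summarize_health(SAMPLE_BODY))
```

With a node running, `summarize_health(fetch_health())` would report on the live cluster instead of the sample.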

This post uses elasticsearch-head as the example. Install it with:
/usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head
After installation, open http://localhost:9200/_plugin/head/ to view it.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
# node.max_local_storage_nodes: 1

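Node attributes are set as simple key/value pairs.
By default, several nodes can be started from the same installation on one host. To disable this, set
node.max_local_storage_nodes: 1
The value is the maximum number of nodes allowed to run from this installation on the local host.
See also the Google Groups thread: Multiple elasticsearch nodes on same server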

#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.

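A number of index options (shard and replica counts, mapping or analyzer definitions, translog settings, and so on) can be set globally in this file.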

#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See  and
# 
# for more information.

For the parameters used when creating an index and the related APIs, see "Elasticsearch – Resource – Index modules" and "Elasticsearch – Resource – create index".

# Set the number of shards (splits) of an index (5 by default):
#
# index.number_of_shards: 5

Sets how many shards an index is split into; the default is 5.

# Set the number of replicas (additional copies) of an index (1 by default):
#
# index.number_of_replicas: 1

Sets the number of replicas (additional copies) of an index; the default is 1.

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
# index.number_of_shards: 1
# index.number_of_replicas: 0

For development on a local machine with small indices, it is common to set shards to 1 and replicas to 0, which turns off the distributed features.

# These settings directly affect the performance of index and search operations
# in your cluster. Assuming you have enough machines to hold shards and
# replicas, the rule of thumb is:
#
# 1. Having more *shards* enhances the _indexing_ performance and allows to
#    _distribute_ a big index across machines.
# 2. Having more *replicas* enhances the _search_ performance and improves the
#    cluster _availability_.
#

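The shard and replica settings affect indexing and search performance, respectively.
1. More shards improve indexing performance and allow a large index to be distributed across machines.
2. More replicas improve search performance and increase cluster availability.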
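To get a feel for what these two numbers mean for the cluster, the following Python sketch counts the shard copies one index produces. Each replica is a full extra copy of every primary shard, so the total is shards × (1 + replicas). The function name `shard_copies` is hypothetical; the defaults are the ones quoted above.

```python
def shard_copies(number_of_shards=5, number_of_replicas=1):
    """Return (primary shards, replica shards, total shards) for one index."""
    primaries = number_of_shards
    replicas = number_of_shards * number_of_replicas
    return primaries, replicas, primaries + replicas


if __name__ == "__main__":
    print(shard_copies())      # defaults: 5 shards, 1 replica
    print(shard_copies(1, 0))  # the local development setup
```

With the defaults, one index is spread over 10 shard copies in total, while the local development setup (1 shard, 0 replicas) keeps a single copy.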

# The "number_of_shards" is a one-time setting for an index.
#
# The "number_of_replicas" can be increased or decreased anytime,
# by using the Index Update Settings API.

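number_of_shards is a one-time setting: once the index has been created it cannot be changed. number_of_replicas can be increased or decreased at any time through the Index Update Settings API.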

# Elasticsearch takes care about load balancing, relocating, gathering the
# results from nodes, etc. Experiment with different settings to fine-tune
# your setup.

# Use the Index Status API () to inspect
# the index status.

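Elasticsearch takes care of load balancing, relocating shards, and gathering results from nodes. Experimenting with different settings will help you tune performance. Use http://localhost:9200/A/_status to view the status of an index (A appears to stand in for the index name).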

#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):
#
# path.conf: /path/to/conf

The directory holding the configuration files; the default is /etc/elasticsearch.

# Path to directory where to store index data allocated for this node.
#
# path.data: /path/to/data
#
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
# path.data: /path/to/data1,/path/to/data2

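The directory where data is stored; the default is /var/lib/elasticsearch. If data has to live in more than one directory, for example because the original directory is running out of space, you can list several directories. Elasticsearch then stripes the data across them at the file level, much like RAID 0, favouring the location with the most free space.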

# Path to temporary files:
#
# path.work: /path/to/work

The directory for temporary files.

# Path to log files:
#
# path.logs: /path/to/logs

The directory for log files.

# Path to where plugins are installed:
#
# path.plugins: /path/to/plugins

The directory where plugins are installed.

#################################### Plugin ###################################

# If a plugin listed here is not installed for current node, the node will not start.
#
# plugin.mandatory: mapper-attachments,lang-groovy

If a plugin listed here is not installed, the node will not start.

################################### Memory ####################################

# Elasticsearch performs poorly when JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
#
# bootstrap.mlockall: true

Elasticsearch performs poorly once the JVM starts swapping, so you must make sure it never swaps. One way to do that is to set bootstrap.mlockall to true, which locks the memory Elasticsearch uses.

# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for Elasticsearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the Elasticsearch process is allowed to lock
# the memory, eg. by using `ulimit -l unlimited`.

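Make sure the ES_MIN_MEM and ES_MAX_MEM environment variables are set to the same value, and that the host has enough memory to allocate to Elasticsearch while still leaving enough for the operating system itself.
Also make sure the Elasticsearch process is allowed to lock memory, for example by using ulimit -l unlimited.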

############################## Network And HTTP ###############################

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

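By default, Elasticsearch binds to the address 0.0.0.0 and listens on TCP ports 9200-9300 for HTTP traffic and on TCP ports 9300-9400 for node-to-node communication. If a port is already in use, it automatically tries the next one in the range.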

# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 192.168.0.1

Binds to a specific IP address.

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
# network.publish_host: 192.168.0.1

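Sets the IP address other nodes use to communicate with this node. It is normally derived automatically. Note that it must point to an actual IP address.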

# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1

Setting network.host sets both publish_host and bind_host at once.

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300

Sets the port used for node-to-node communication.

# Enable compression for all communication between nodes (disabled by default):
#
# transport.tcp.compress: true 

Sets whether communication between nodes is compressed; compression is disabled by default.

# Set a custom port to listen for HTTP traffic:
#
# http.port: 9200

Sets the port that listens for HTTP traffic.

# Set a custom allowed content length:
#
# http.max_content_length: 100mb

Sets the maximum allowed size of an HTTP request [see Elasticsearch – Resources – http].
This also caps the size of a bulk indexing request [excerpted from Elasticsearch Server, 2nd Edition, page 70]:
There is a default limitation on the size of the bulk indexing file, which is set to 100 megabytes and can be changed by specifying the http.max_content_length property in the Elasticsearch configuration file. This lets us avoid issues with possible request timeouts and memory problems when dealing with requests that are too large.

# Disable HTTP completely:
#
# http.enabled: false

Sets whether the HTTP module is enabled.

################################### Gateway ###################################

# The gateway allows for persisting the cluster state between full cluster
# restarts. Every change to the state (such as adding an index) will be stored
# in the gateway, and when the cluster starts up for the first time,
# it will read its state from the gateway.

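This section configures the gateway, which stores the cluster metadata: all indices together with their settings and explicit mappings. Every change to the cluster state (such as adding or deleting an index) is stored in the gateway, and when the cluster first starts up it reads its state from the gateway.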

# There are several types of gateway implementations. For more information, see
# .

For details on the gateway implementations, see http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html

[Excerpted from http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html ]
The gateway module allows one to store the state of the cluster meta data across full cluster restarts. The cluster meta data mainly holds all the indices created with their respective (index level) settings and explicit type mappings.
Each time the cluster meta data changes (for example, when an index is added or deleted), those changes will be persisted using the gateway. When the cluster first starts up, the state will be read from the gateway and applied.
The gateway set on the node level will automatically control the index gateway that will be used. For example, if the local gateway is used, then automatically, each index created on the node will also use its own respective index level local gateway. In this case, if an index should not persist its state, it should be explicitly set to none (which is the only other value it can be set to).

# The default gateway type is the "local" gateway (recommended):
#
# gateway.type: local

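The default gateway type is local. When local is used at the node level, every index created on the node also uses the local gateway, so its state is persisted. If an index should not persist its state, set its gateway to none.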

# Settings below control how and when to start the initial recovery process on
# a full cluster restart (to reuse as much local data as possible when using shared
# gateway).

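The following settings control how and when the initial recovery process starts after a full cluster restart, so that as much local data as possible is reused when a shared gateway is in use.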

# Allow recovery process after N nodes in a cluster are up:
#
# gateway.recover_after_nodes: 1

Sets how many nodes must be up before the cluster is allowed to start the recovery process.

# Set the timeout to initiate the recovery process, once the N nodes
# from previous setting are up (accepts time value):
#
# gateway.recover_after_time: 5m

Sets how long to wait before starting the recovery process once those nodes are up.

# Set how many nodes are expected in this cluster. Once these N nodes
# are up (and recover_after_nodes is met), begin recovery process immediately
# (without waiting for recover_after_time to expire):
#
# gateway.expected_nodes: 2

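Sets how many nodes are expected in the cluster. Once that many nodes are up (and recover_after_nodes is met), recovery starts immediately without waiting for recover_after_time to expire. Both master and data nodes count here.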
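The way the three gateway settings interact can be sketched in Python as a simple decision function. This is a simplified model based only on the configuration comments above, not Elasticsearch's actual implementation; the function name and the `waited_seconds` parameter (time elapsed since recover_after_nodes was first met) are assumptions of this example.

```python
def should_start_recovery(nodes_up, waited_seconds,
                          recover_after_nodes=1,
                          recover_after_seconds=300,  # 5m
                          expected_nodes=2):
    """Decide whether recovery may begin, per the comments in this file.

    Simplified illustrative model; waited_seconds is the time since
    recover_after_nodes was first satisfied.
    """
    if nodes_up < recover_after_nodes:
        return False  # too few nodes: recovery is not allowed yet
    if nodes_up >= expected_nodes:
        return True   # expected size reached: start immediately
    # Enough nodes to be allowed, but fewer than expected: wait out the timer.
    return waited_seconds >= recover_after_seconds


if __name__ == "__main__":
    print(should_start_recovery(nodes_up=1, waited_seconds=0))
    print(should_start_recovery(nodes_up=2, waited_seconds=0))
    print(should_start_recovery(nodes_up=1, waited_seconds=300))
```

In this model, one node that has just started keeps waiting, two nodes start recovery at once, and one node starts recovery after the five minutes have passed.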

############################# Recovery Throttling #############################

# These settings allow to control the process of shards allocation between
# nodes during initial recovery, replica allocation, rebalancing,
# or when adding and removing nodes.

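These settings control how shards are allocated between nodes during initial recovery, replica allocation, rebalancing, and when nodes are added or removed.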

# Set the number of concurrent recoveries happening on a node:
#
# 1. During the initial recovery
#
# cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# 2. During adding/removing nodes, rebalancing, etc
#
# cluster.routing.allocation.node_concurrent_recoveries: 2

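During initial recovery, sets how many primary shard recoveries may run concurrently on a node.
When nodes are added or removed, or during rebalancing, sets how many shard recoveries may run concurrently on a node.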

# Set to throttle throughput when recovering (eg. 100mb, by default 20mb):
#
# indices.recovery.max_bytes_per_sec: 20mb

Sets the throughput limit applied during recovery.

# Set to limit the number of open concurrent streams when
# recovering a shard from a peer:
#
# indices.recovery.concurrent_streams: 5

Sets how many streams may be open at the same time when recovering a shard from a peer.

################################## Discovery ##################################

# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.

The discovery module ensures that every node in a cluster can be found and that a master node is elected. Multicast discovery is the default.

# Set to ensure a node sees N other master eligible nodes to be considered
# operational within the cluster. Its recommended to set it to a higher value
# than 1 when running more than 2 nodes in the cluster.
#
# discovery.zen.minimum_master_nodes: 1

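Sets how many master-eligible nodes a node must see before it is considered operational within the cluster. When the cluster has more than 2 nodes, a value higher than the default of 1 is recommended.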
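The configuration comment only says to use a value higher than 1. A commonly cited rule of thumb, which this file does not state, is to require a majority (quorum) of the master-eligible nodes, that is, N / 2 + 1 with integer division. The Python sketch below computes that value; the function name is hypothetical.

```python
def quorum_of_master_nodes(master_eligible_nodes):
    """Majority of master-eligible nodes: N // 2 + 1.

    Rule of thumb for discovery.zen.minimum_master_nodes; it is an
    assumption of this sketch, not something this file prescribes.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes ->", quorum_of_master_nodes(n))
```

A majority means two separated halves of a cluster can never both elect a master, which is the reason for raising the value as the cluster grows.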

# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
# discovery.zen.ping.timeout: 3s

Sets how long to wait for ping responses during discovery. Increase the value on a slow or congested network.

# For more information, see
# 

For more information, see http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.

#

When nodes sit on different network segments, multicast discovery may not work. In that case use unicast discovery, which names the hosts to contact explicitly.

# 1. Disable multicast discovery (enabled by default):
#
# discovery.zen.ping.multicast.enabled: false

#

Whether to disable multicast discovery; it is enabled by default.

# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
# discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

Sets the list of master nodes used for unicast discovery. Several hosts can be listed, and each one can specify its own port.

# EC2 discovery allows to use AWS EC2 API in order to perform discovery.
#
# You have to install the cloud-aws plugin for enabling the EC2 discovery.
#
# For more information, see
# 
#
# See 
# for a step-by-step tutorial.

This part covers AWS EC2: install the cloud-aws plugin to enable discovery in an EC2 environment.
For details, see http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html and http://elasticsearch.org/tutorials/elasticsearch-on-ec2/

# GCE discovery allows to use Google Compute Engine API in order to perform discovery.
#
# You have to install the cloud-gce plugin for enabling the GCE discovery.
#
# For more information, see .

Discovery on Google Compute Engine is also available; for details, see https://github.com/elasticsearch/elasticsearch-cloud-gce

# Azure discovery allows to use Azure API in order to perform discovery.
#
# You have to install the cloud-azure plugin for enabling the Azure discovery.
#
# For more information, see .

Discovery on Azure is also available; for details, see https://github.com/elasticsearch/elasticsearch-cloud-azure

################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

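These settings are the shard-level thresholds for logging slow query, fetch, and index operations. There are four levels (warn, info, debug, trace) and three operations (query, fetch, index).
Whether these logs are actually written is configured in logging.yml.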

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
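To show how a set of thresholds maps a duration to a log level, here is a small Python sketch using the query values listed above. It is only a model of the idea: the function name is hypothetical, and treating a duration equal to a threshold as having reached it is an assumption of this example, not confirmed Elasticsearch behaviour.

```python
# Query thresholds from the settings above, in milliseconds, strictest first.
QUERY_THRESHOLDS_MS = [
    ("warn", 10000),
    ("info", 5000),
    ("debug", 2000),
    ("trace", 500),
]


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS_MS):
    """Return the highest level whose threshold the duration reaches.

    Returns None when the operation is faster than every threshold.
    Using >= at the boundary is an assumption of this sketch.
    """
    for level, limit_ms in thresholds:
        if took_ms >= limit_ms:
            return level
    return None


if __name__ == "__main__":
    for took in (300, 800, 6000, 12000):
        print(took, "ms ->", slowlog_level(took))
```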
################################## GC Logging ################################

#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms

#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s

These settings are the thresholds for logging garbage collection times. There are three levels (warn, info, debug) and two collectors (young, old).
Whether these logs are actually written is configured in logging.yml.
