{"id":1314,"date":"2014-12-30T15:50:03","date_gmt":"2014-12-30T07:50:03","guid":{"rendered":"https:\/\/cowmanchiang.me\/wp\/?p=1314"},"modified":"2023-10-31T15:44:07","modified_gmt":"2023-10-31T07:44:07","slug":"%e6%b7%ba%e8%ab%87-elasticsearch-%e7%9a%84%e5%ae%9a%e7%be%a9","status":"publish","type":"post","link":"https:\/\/cowmanchiang.me\/wp\/?p=1314","title":{"rendered":"A Brief Look at Elasticsearch Definitions"},"content":{"rendered":"<p>[Excerpted from Elasticsearch Server, 2nd Edition, page 12]<\/p>\n<p><strong>The basics of Elasticsearch<\/strong><br \/>\nElasticsearch is an open source search server project started by Shay Banon and published in February 2010. Since then, the project has grown into a major player in the field of search and data analysis solutions and is widely used in many well-known and lesser-known search applications. In addition, due to its distributed nature and real-time capabilities, many people use it as a document store.<\/p>\n<p>Elasticsearch is an open source project started by Shay Banon and made public in February 2010. It has since become an indispensable player among search and data analysis solutions and is widely used in a great many applications. Because it is distributed and processes data in real time, many users also use it as a document store.<\/p>\n<p><strong>Index<\/strong><br \/>\nAn index is the logical place where Elasticsearch stores logical data, so that it can be divided into smaller pieces. If you come from the relational database world, you can think of an index like a table. 
However, the index structure is designed for fast and efficient full-text searching and, in particular, does not store original values. If you know MongoDB, you can think of an Elasticsearch index as a collection in MongoDB. If you are familiar with CouchDB, you can think of an index as you would of a CouchDB database. Elasticsearch can hold many indices, located on one machine or spread over many servers. Every index is built of one or more shards, and each shard can have many replicas.<\/p>\n<p>An index can be viewed as a logical space in which Elasticsearch stores logical data. This data is not stored in its original form, and it can be divided into many small pieces for storage. If you have worked with relational databases before, you can think of an index as a table. However, the structure of an index is designed mainly to make full-text searches fast and efficient. If you know MongoDB, you can think of an Elasticsearch index as a MongoDB collection; if you know CouchDB, you can think of an index as a CouchDB database. Elasticsearch can manage many indices on a single machine or spread them across many machines. Each index is made up of one or more shards, and each shard can have many replicas.<\/p>\n<p><strong>Document<\/strong><br \/>\nThe main entity stored in Elasticsearch is a document. 
To use the relational database analogy, a document is a row of data in a database table. When you compare an Elasticsearch document to a MongoDB document, you will see that both can have different structures, but in Elasticsearch, fields shared across documents must have the same type. This means that all the documents with a field called title need to have the same data type for it, for example, string.<br \/>\nDocuments consist of fields, and each field may occur several times in a single document (such a field is called multivalued). Each field has a type (text, number, date, and so on). The field types can also be complex: a field can contain other subdocuments or arrays. The field type is important for Elasticsearch because it determines how various operations, such as analysis or sorting, should be performed. Fortunately, Elasticsearch can determine field types automatically (however, we still suggest using mappings). Unlike rows in relational databases, documents don&#8217;t need to have a fixed structure: every document may have a different set of fields, and the fields don&#8217;t have to be known during application development. Of course, one can still enforce a document structure by using a schema. From the client&#8217;s point of view, a document is a JSON object (see more about the JSON format at http:\/\/en.wikipedia.org\/wiki\/JSON). Each document is stored in one index and has its own unique identifier (which can be generated automatically by Elasticsearch) and a document type. A document&#8217;s identifier needs to be unique only within its document type. This means that in a single index, two documents can have the same identifier if they are of different types.<\/p>\n<p><strong>Document type<\/strong><br \/>\nIn Elasticsearch, one index can store many objects with different purposes. For example, a blog application can store articles and comments. 
The document type lets us easily differentiate between the objects in a single index. Every document can have a different structure, but in real-world deployments, dividing documents into types makes data manipulation significantly easier. Of course, one needs to keep the limitations in mind; that is, different document types can&#8217;t set different types for the same property. For example, a field called title must have the same type across all document types in the same index.<\/p>\n<p><strong>Mapping<\/strong><br \/>\nIn the section about the basics of full-text searching (the Full-text searching section), we wrote about the process of analysis: the preparation of input text for indexing and searching. Every field of the document must be properly analyzed depending on its type. For example, a different analysis chain is required for numeric fields (numbers shouldn&#8217;t be sorted alphabetically) and for text fetched from web pages (for example, the first step would be to strip the HTML tags, because they are useless information, that is, noise). Elasticsearch stores information about the fields in the mapping. Every document type has its own mapping, even if we don&#8217;t explicitly define it.<\/p>\n<p><strong>Key concepts of Elasticsearch<\/strong><br \/>\nWe now know that Elasticsearch stores data in one or more indices. Every index can contain documents of various types. We also know that each document has many fields, and that mappings define how Elasticsearch treats these fields. But there is more. From the beginning, Elasticsearch was created as a distributed solution that can handle billions of documents and hundreds of search requests per second. This is due to several important concepts that we are going to describe in more detail now.<\/p>\n<p><strong>Node and cluster<\/strong><br \/>\nElasticsearch can work as a standalone, single search server. 
Nevertheless, to process large sets of data and to achieve fault tolerance and high availability, Elasticsearch can be run on many cooperating servers. Collectively, these servers are called a cluster, and each server forming it is called a node.<\/p>\n<p><strong>Shard<\/strong><br \/>\nWhen we have a large number of documents, we may reach a point where a single node is no longer enough, for example because of RAM limitations, insufficient hard disk capacity or processing power, or an inability to respond to client requests fast enough. In such a case, data can be divided into smaller parts called shards (where each shard is a separate Apache Lucene index). Each shard can be placed on a different server, and thus your data can be spread among the cluster nodes. When you query an index that is built from multiple shards, Elasticsearch sends the query to each relevant shard and merges the results so that your application doesn&#8217;t know about the shards. 
In addition, having multiple shards can speed up indexing.<\/p>\n<p>When we have a large number of documents, a single node may be unable to respond to user requests quickly enough, possibly because of memory limits, disk space limits, or insufficient processing power. For this reason, the data can be divided into smaller pieces, called shards, each of which is a separate Apache Lucene index. Each shard can be placed on a different server, so your data can be spread across the nodes of the cluster. When you query an index made up of multiple shards, Elasticsearch sends the query to the nodes holding each shard and merges the results, and your application never sees what Elasticsearch does behind the scenes. In addition, using multiple shards can speed up indexing.<\/p>\n<p><strong>Replica<\/strong><br \/>\nTo increase query throughput or achieve high availability, shard replicas can be used. A replica is just an exact copy of the shard, and each shard can have zero or more replicas. In other words, Elasticsearch can hold many identical copies of a shard, and one of them is automatically chosen as the place to which operations that change the index are directed. This special shard is called a primary shard, and the others are called replica shards. 
When the primary shard is lost (for example, when a server holding the shard data is unavailable), the cluster promotes one of the replicas to be the new primary shard.<\/p>\n<p><strong>Gateway<\/strong><br \/>\nElasticsearch handles many nodes. The cluster state is held by the gateway. By default, every node stores this information locally, and it is synchronized among the nodes. We will discuss the gateway module in The gateway and recovery modules section of Chapter 7, Elasticsearch Cluster in Detail.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[Excerpted from Elasticsearch Server, 2nd Edition, page 12] The basics of Elasticsearch Elasticsearch is an open source search server project started by Shay Banon and published in February 2010. During this time, the project has grown into a major player in the field of &hellip; <a href=\"https:\/\/cowmanchiang.me\/wp\/?p=1314\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[40],"tags":[],"class_list":["post-1314","post","type-post","status-publish","format-standard","hentry","category-elasticsearch"],"_links":{"self":[{"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=\/wp\/v2\/posts\/1314","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1314"}],"version-history":[{"count":1,"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=\/wp\/v2\/posts\/1314\/revisions"}],"predecessor-version":[{"id":1956,"hre
f":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=\/wp\/v2\/posts\/1314\/revisions\/1956"}],"wp:attachment":[{"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1314"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1314"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cowmanchiang.me\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1314"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}