Elasticsearch basics (ES 8.10)
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html
ES version: 8.10
By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.
Elasticsearch will detect and map booleans, floating point and integer values, dates, and strings to the appropriate Elasticsearch data types.
http://localhost:9200/
{
  "name": "WINDOWS10-JACK",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "JYZzG3wITwqXA2cPUv1otA",
  "version": {
    "number": "8.10.4",
    "build_flavor": "default",
    "build_type": "zip",
    "build_hash": "b4a62ac808e886ff032700c391f45f1408b2538c",
    "build_date": "2023-10-11T22:04:35.506990650Z",
    "build_snapshot": false,
    "lucene_version": "9.7.0",
    "minimum_wire_compatibility_version": "7.17.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "You Know, for Search"
}
Viewing cluster information
http://localhost:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
/_cat/component_templates
/_cat/ml/anomaly_detectors
/_cat/ml/anomaly_detectors/{job_id}
/_cat/ml/datafeeds
/_cat/ml/datafeeds/{datafeed_id}
/_cat/ml/trained_models
/_cat/ml/trained_models/{model_id}
/_cat/ml/data_frame/analytics
/_cat/ml/data_frame/analytics/{id}
/_cat/transforms
/_cat/transforms/{transform_id}
Health check
http://localhost:9200/_cat/health
1698826410 08:13:30 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%
With column headers: http://localhost:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1698826430 08:13:50 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%
A cluster's health is reported as one of three statuses:
- green: the healthiest state. Every primary and replica shard is allocated, and the cluster is 100% operational.
- yellow: all primary shards are allocated, but at least one replica is missing. No data has been lost, so search results are still complete; however, high availability is weakened, and if more shards disappear you can start losing data. Think of yellow as a warning that deserves prompt investigation.
- red: at least one primary shard (and all of its replicas) is missing. You are missing data: searches return only partial results, and writes routed to the missing shard will return an exception. Fix this state as soon as possible.
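As a rough sketch of the rule above (my own illustration, not how ES computes it internally), the status follows from which shards remain unassigned:

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Toy derivation of cluster health from unassigned shard counts."""
    if unassigned_primaries > 0:
        return "red"      # at least one primary is missing
    if unassigned_replicas > 0:
        return "yellow"   # primaries fine, some replicas missing
    return "green"        # everything allocated

# A single-node cluster with replicas configured can never place them
# (a replica must live on a different node), so it typically reports yellow:
print(cluster_status(0, 3))
```

This is also why the single-node examples in this document show yellow indices.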
Troubleshooting an unhealthy Elasticsearch cluster
- Make sure the master-eligible nodes start first, then the data nodes.
- Disable SELinux (if it is not required) and shut down iptables.
- Verify the elasticsearch configuration file on each data node.
- Check that the system's maximum open file descriptor limit is high enough.
- Check that Elasticsearch has enough memory (the ES_HEAP_SIZE setting and the indices.fielddata.cache.size limit).
- If the number of indices has exploded, delete some of them (especially the ones you no longer need).
List cluster nodes
http://localhost:9200/_cat/nodes?v
List indices
http://localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open book Jc7jskvOSzCobjs4SPS7Jw 3 1 0 0 744b 744b
yellow open magazine 8-zYU22QTda7CjSEdphJzA 3 1 0 0 678b 678b
Creating an index
Index names must be lowercase and must not collide with an existing index.
Create an articles index with 3 primary shards and 1 replica; pretty asks for formatted JSON output. PUT http://localhost:9200/articles?pretty
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}
(Note: no type level between mappings and properties; mapping types were removed in ES 7.)
Response:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "articles"
}
Viewing an index definition
You can fetch several indices at once (comma-separated), or all of them with _all or a wildcard *.
GET http://localhost:9200/articles
{
  "articles": {
    "aliases": {},
    "mappings": {
      "properties": {
        "age": { "type": "long" },
        "gender": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "name": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    },
    "settings": {
      "index": {
        "routing": { "allocation": { "include": { "_tier_preference": "data_content" } } },
        "number_of_shards": "1",
        "provided_name": "articles",
        "creation_date": "1698827033458",
        "number_of_replicas": "1",
        "uuid": "OOVlnKxYT9SiWdlt-ZOPMw",
        "version": { "created": "8100499" }
      }
    }
  }
}
GET http://localhost:9200/articles/_settings
GET http://localhost:9200/articles/_mappings
Updating index settings
Index settings fall into two groups: static and dynamic. Static settings, such as the number of shards, cannot be changed on a live index; dynamic settings can. See https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-modules-settings
PUT http://localhost:9200/articles/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
Index read/write blocks
index.blocks.read_only: when true, the index and its metadata are read-only.
index.blocks.read_only_allow_delete: when true, the index is read-only but may still be deleted.
index.blocks.read: when true, read operations are blocked.
index.blocks.write: when true, write operations are blocked.
index.blocks.metadata: when true, metadata reads and writes are blocked.
Deleting an index
You can delete several indices at once (comma-separated), or all of them with _all or a wildcard *.
DELETE http://localhost:9200/articles
Checking whether an index exists
HEAD http://localhost:9200/articles
The HTTP status code carries the answer: 200 means the index exists, 404 means it does not.
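A minimal client-side sketch of this check using only the standard library (the base_url and index name are placeholders; the status-code interpretation is the part the HEAD endpoint defines):

```python
from urllib import request, error

def interpret_head_status(code: int) -> bool:
    """200 -> index exists, 404 -> it does not; anything else is unexpected."""
    if code == 200:
        return True
    if code == 404:
        return False
    raise ValueError(f"unexpected status {code}")

def index_exists(base_url: str, index: str) -> bool:
    """Issue HEAD /{index} against a running cluster (URL is an assumption)."""
    req = request.Request(f"{base_url}/{index}", method="HEAD")
    try:
        with request.urlopen(req) as resp:
            return interpret_head_status(resp.status)
    except error.HTTPError as e:
        return interpret_head_status(e.code)

# index_exists("http://localhost:9200", "articles") -> True/False
```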
Writing data
Index a document into articles with an explicit id of 1:
curl -X PUT "http://localhost:9200/articles/_doc/1?pretty" -H 'Content-Type: application/json' -d'{"name": "John Doe"}'
Response:
{
  "_index": "articles",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}
Querying documents
Fetch a document by id:
GET http://localhost:9200/articles/_doc/1?pretty
Response:
{
  "_index": "articles",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": { "name": "John Doe" }
}
_source holds the original document.
Fetch all documents:
GET http://localhost:9200/articles/_search
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 1.0,
    "hits": [
      { "_index": "articles", "_id": "1", "_score": 1.0, "_source": { "name": "John Doe" } },
      { "_index": "articles", "_id": "2", "_score": 1.0, "_source": { "name": "Rao Xiao Ya", "age": 30, "gender": "male" } }
    ]
  }
}
GET http://localhost:9200/articles/_search
{
  "query": { "match_all": {} },
  "sort": [ { "name.keyword": "asc" } ]
}
(Sort on the keyword sub-field; sorting directly on a text field is rejected unless fielddata is enabled.)
Data types in ES
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
1. keyword
The keyword family:
- keyword, used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
- constant_keyword, for keyword fields that always contain the same value.
- wildcard, for unstructured machine-generated content. The wildcard type is optimized for fields with large values or high cardinality.
Keyword fields are often used in sorting, aggregations, and term-level queries, such as term.
keyword fields are not analyzed (no tokenization), so they cannot serve full-text search.
2. text
The text family:
- text, the traditional field type for full-text content such as the body of an email or the description of a product.
- match_only_text, a space-optimized variant of text that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages.
text fields are analyzed, and the index structure is built from the resulting tokens.
text is meant for full-text search; it is a poor fit for sorting and aggregations.
A text field can be given an analyzer; a keyword field needs none.
NOTE: a field can carry both text and keyword types at once (multi-fields). Such a field supports full-text search as well as sorting and aggregations. For short texts such as book or article titles, the keyword sub-field makes retrieval more precise because it bypasses tokenization, while the text side still returns token-based matches. For example, when searching for the book 《刘心武妙品红楼梦》 by its full title, the exact match should come back first, followed by the token-based matches.
"name": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword", "ignore_above": 256 }
  }
}
3. numeric
| Type | Description |
|---|---|
| long | A signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63-1. |
| integer | A signed 32-bit integer with a minimum value of -2^31 and a maximum value of 2^31-1. |
| short | A signed 16-bit integer with a minimum value of -32,768 and a maximum value of 32,767. |
| byte | A signed 8-bit integer with a minimum value of -128 and a maximum value of 127. |
| double | A double-precision 64-bit IEEE 754 floating point number, restricted to finite values. |
| float | A single-precision 32-bit IEEE 754 floating point number, restricted to finite values. |
| half_float | A half-precision 16-bit IEEE 754 floating point number, restricted to finite values. |
| scaled_float | A floating point number that is backed by a long, scaled by a fixed double scaling factor. |
| unsigned_long | An unsigned 64-bit integer with a minimum value of 0 and a maximum value of 2^64-1. |
About scaled_float: the value is stored internally as a long, because integers compress better than floats and so save disk space. You configure a scaling_factor; an input value f1 is stored as round(scaling_factor * f1) in the underlying long.
{
  "mappings": {
    "properties": {
      "number_of_bytes": { "type": "integer" },
      "time_in_seconds": { "type": "float" },
      "price": { "type": "scaled_float", "scaling_factor": 100 }
    }
  }
}
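A minimal sketch of the scaled_float encode/decode round trip (my own illustration of the idea, not ES internals):

```python
def encode(value: float, scaling_factor: int) -> int:
    """Store a float as a scaled long, the way scaled_float works conceptually."""
    return round(value * scaling_factor)

def decode(stored: int, scaling_factor: int) -> float:
    """Recover the approximate original value."""
    return stored / scaling_factor

# With scaling_factor=100, a price of 19.99 is stored as the long 1999.
stored = encode(19.99, 100)
print(stored, decode(stored, 100))
```

Precision below 1/scaling_factor is lost in the round trip, which is the trade-off scaled_float makes for better compression.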
4. date
JSON doesn't have a date data type, so dates in Elasticsearch can be:
- strings containing formatted dates, e.g. "2015-01-01" or "2015/01/01 12:10:30";
- a number representing milliseconds-since-the-epoch;
- a number representing seconds-since-the-epoch (via configuration).
Internally, dates are converted to UTC (if a time zone is specified) and stored as a long representing milliseconds-since-the-epoch.
At query time, date conditions are likewise converted to millisecond timestamps, and dates in the response are rendered back into strings.
{
  "mappings": {
    "properties": {
      "update_date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
The first format in the chain is used to render the stored milliseconds-since-the-epoch value back into a string. The default format is strict_date_optional_time||epoch_millis; the epoch_millis component accepts a millisecond (not second) timestamp.
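The || chain behaves like "try each format in order". A rough Python sketch of that fallback, normalizing everything to epoch millis (an approximation for illustration, not ES's parser):

```python
from datetime import datetime, timezone

# Mirrors the chain yyyy-MM-dd HH:mm:ss||yyyy-MM-dd (epoch_millis handled below)
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]

def to_epoch_millis(value) -> int:
    """Parse a date the way a format chain would, ending in epoch_millis."""
    if isinstance(value, int):          # epoch_millis branch: already millis
        return value
    for fmt in FORMATS:                 # try each string format in order
        try:
            dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
            return int(dt.timestamp() * 1000)
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {value!r}")

print(to_epoch_millis("2015-01-01"))
```

Whichever branch matches, the stored representation is the same long, which is why range queries over mixed input formats still compare correctly.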
About mappings
A mapping in ES is analogous to a table schema in a relational database: it declares which fields exist, their types, their defaults, and so on.
A mapping can be supplied when the index is created, and parts of it can be changed afterwards.
If you write documents into an index that has no mapping, ES infers one from the data; this is called dynamic mapping. In practice, key fields are usually defined up front so that the auto-generated types don't surprise you.
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
NOTE: Before 7.0.0, the mapping definition included a type name. Elasticsearch 7.0.0 and later no longer accept a default mapping. See Removal of mapping types. In other words, there is no longer a type level between mappings and properties.
1. Dynamic mapping rules
| JSON data type | "dynamic":"true" | "dynamic":"runtime" |
|---|---|---|
| null | No field added | No field added |
| true or false | boolean | boolean |
| double | float | double |
| long | long | long |
| object | object | No field added |
| array | Depends on the first non-null value in the array | Depends on the first non-null value in the array |
| string that passes date detection | date | date |
| string that passes numeric detection | float or long | double or long |
| string that doesn't pass date or numeric detection | text with a .keyword sub-field | keyword |
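The "dynamic":"true" column can be sketched as a small inference function (my own approximation of the table; real date/numeric detection in ES is more involved):

```python
import re

def infer_dynamic_type(value):
    """Approximate "dynamic": "true" type inference from the table above."""
    if value is None:
        return None                      # no field added
    if isinstance(value, bool):          # must check before int: bool is an int
        return "boolean"
    if isinstance(value, float):
        return "float"
    if isinstance(value, int):
        return "long"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):          # depends on first non-null element
        first = next((v for v in value if v is not None), None)
        return infer_dynamic_type(first)
    if isinstance(value, str):
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):  # crude date detection
            return "date"
        return "text (with .keyword sub-field)"

print(infer_dynamic_type([None, 42]))
```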
2. Explicit mapping
{
  "mappings": {
    "properties": {
      "email": { "type": "keyword" },
      "name": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "photo": { "type": "text", "index": false }
    }
  }
}
ignore_above: Do not index any string longer than this value. Defaults to 2147483647 so that all values would be accepted. Note, however, that the default dynamic mapping rules create a keyword sub-field that overrides this default by setting ignore_above: 256. In plain terms: a string longer than the limit is skipped entirely for that keyword field; it is neither truncated nor indexed, but the full value is still kept in _source.
By default ES builds an index structure for every field. Setting "index": false skips that; the field then cannot be used as a search condition.
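A toy illustration of ignore_above semantics (my own sketch): over-long values are skipped whole when building the keyword field's term set, not truncated.

```python
def keyword_terms(values, ignore_above=256):
    """Index a keyword field: values longer than ignore_above are skipped whole."""
    return [v for v in values if len(v) <= ignore_above]

docs = ["short-tag", "x" * 300]
print(keyword_terms(docs))   # the 300-char value is absent, not truncated
```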
3. Updating the mapping of a field
For an existing field, only certain mapping parameters can be changed; the field's type, for example, cannot.
If a field's definition changes, you may also need to reindex your data.
4. Viewing the mapping of an index
GET http://localhost:9200/articles/_mappings
Text analysis
The index analysis module acts as a configurable registry of analyzers that can be used to convert a string field into individual terms which are:
- added to the inverted index in order to make the document searchable
- used by high level queries such as the match query to generate search terms
Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html
Analyzers
Analyzers are applied both when indexing documents and when searching them, and the two operations should use the same analyzer whenever possible so that their token streams agree. Analyzers apply only to text fields. Configuring analysis deliberately makes full-text search more precise and efficient.
Analysis exists to power full-text retrieval: a document matches when it contains the terms being searched for.
Elasticsearch includes a default analyzer, called the standard analyzer, which works well for most use cases right out of the box.
If you want to tailor your search experience, you can choose a different built-in analyzer or even configure a custom one. A custom analyzer gives you control over each step of the analysis process, including:
- Changes to the text before tokenization
- How text is converted to tokens
- Normalization changes made to tokens before indexing or search
An analyzer is built from three kinds of low-level building blocks: character filters, tokenizers, and token filters.
- Character filters
  A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like <b> from the stream. An analyzer may have zero or more character filters, which are applied in order.
- Tokenizer
  A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!]. The tokenizer is also responsible for recording the order or position of each term and the start and end character offsets of the original word which the term represents. An analyzer must have exactly one tokenizer.
- Token filters
  A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like the from the token stream, and a synonym token filter introduces synonyms into the token stream. Token filters are not allowed to change the position or character offsets of each token. An analyzer may have zero or more token filters, which are applied in order.
Built-in analyzers
Fingerprint
Keyword
Language
Pattern
Simple
Standard
Stop
Whitespace
Testing an analyzer
curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{"analyzer": "whitespace","text": "The quick brown fox."}
'
{
  "tokens": [
    { "token": "The",   "start_offset": 0,  "end_offset": 3,  "type": "word", "position": 0 },
    { "token": "quick", "start_offset": 4,  "end_offset": 9,  "type": "word", "position": 1 },
    { "token": "brown", "start_offset": 10, "end_offset": 15, "type": "word", "position": 2 },
    { "token": "fox.",  "start_offset": 16, "end_offset": 20, "type": "word", "position": 3 }
  ]
}
Custom analyzers
Option one: combine the built-in tokenizers and filters into a new analyzer in the index settings. A custom analyzer takes:
- zero or more character filters
- a tokenizer
- zero or more token filters
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"],
          "char_filter": ["html_strip"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": { "type": "text", "analyzer": "std_folded" }
    }
  }
}
Option two: install a third-party plugin in ES, such as elasticsearch-analysis-ik; more on that later.
Specifying analyzers
1. Set an index-wide default analyzer in the settings. A separate search analyzer (search_analyzer) can be set at the same time.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "simple" }
      }
    }
  }
}
or
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "simple" },
        "default_search": { "type": "whitespace" }
      }
    }
  }
}
2. In the index mappings, each text field can be given its own analyzer; fields without one use the default, the standard analyzer. A search_analyzer can be set alongside it.
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "whitespace" }
    }
  }
}
or
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "whitespace",
        "search_analyzer": "simple"
      }
    }
  }
}
3. An analyzer can also be set directly in a query clause; without one, the default (standard) applies.
{
  "query": {
    "match": {
      "message": { "query": "Quick foxes", "analyzer": "stop" }
    }
  }
}
At search time, ES determines the analyzer in this order:
- The analyzer parameter in the search query. See Specify the search analyzer for a query.
- The search_analyzer mapping parameter for the field. See Specify the search analyzer for a field.
- The analysis.analyzer.default_search index setting. See Specify the default search analyzer for an index.
- The analyzer mapping parameter for the field. See Specify the analyzer for a field.
If none of these parameters are specified, the standard analyzer is used.
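That precedence is just a first-non-null lookup; a sketch of the resolution (my own illustration of the documented order):

```python
def pick_search_analyzer(query_analyzer=None, field_search_analyzer=None,
                         index_default_search=None, field_analyzer=None):
    """Resolve the search-time analyzer using ES's documented precedence."""
    for candidate in (query_analyzer, field_search_analyzer,
                      index_default_search, field_analyzer):
        if candidate is not None:
            return candidate
    return "standard"   # final fallback

print(pick_search_analyzer(field_search_analyzer="simple"))
```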
Search operations
Query DSL
Query DSL supports a variety of query types you can mix and match to get the results you want. Query types include:
- Boolean and other compound queries, which let you combine queries and match results based on multiple criteria
- Term-level queries for filtering and finding exact matches
- Full text queries, which are commonly used in search engines
- Geo and spatial queries
Aggregations
You can use search aggregations to get statistics and other analytics for your search results. Aggregations help you answer questions like:
- What’s the average response time for my servers?
- What are the top IP addresses hit by users on my network?
- What is the total transaction revenue by customer?
Search multiple data streams and indices
You can use comma-separated values and grep-like index patterns to search several data streams and indices in the same request. You can even boost search results from specific indices. See Search multiple data streams and indices.
Paginate search results
By default, searches return only the top 10 matching hits. To retrieve more or fewer documents, see Paginate search results.
Retrieve selected fields
The search response’s hits.hits property includes the full document _source for each hit. To retrieve only a subset of the _source or other fields, see Retrieve selected fields.
Sort search results
By default, search hits are sorted by _score, a relevance score that measures how well each document matches the query. To customize the calculation of these scores, use the script_score query. To sort search hits by other field values, see Sort search results.
Run an async search
Elasticsearch searches are designed to run on large volumes of data quickly, often returning results in milliseconds. For this reason, searches are synchronous by default. The search request waits for complete results before returning a response.
However, complete results can take longer for searches across large data sets or multiple clusters.
To avoid long waits, you can run an asynchronous, or async, search instead. An async search lets you retrieve partial results for a long-running search now and get complete results later.
Search timeout
GET /my-index-000001/_search
{
  "timeout": "2s",
  "query": { "match": { "user.id": "kimchy" } }
}
Search cancellation
You can cancel a search request using the task management API. Elasticsearch also automatically cancels a search request when your client’s HTTP connection closes. We recommend you set up your client to close HTTP connections when a search request is aborted or times out.
Track total hits
By default only the top 10 hits are returned, so obtaining a full count would otherwise take repeated queries.
Generally the total hit count can’t be computed accurately without visiting all matches, which is costly for queries that match lots of documents. The track_total_hits parameter allows you to control how the total number of hits should be tracked. Given that it is often enough to have a lower bound of the number of hits, such as “there are at least 10000 hits”, the default is set to 10,000. This means that requests will count the total hit accurately up to 10,000 hits. It is a good trade off to speed up searches if you don’t need the accurate number of hits after a certain threshold.
GET my-index-000001/_search
{
  "track_total_hits": true,
  "query": { "match": { "user.id": "elkbee" } }
}
When set to true the search response will always track the number of hits that match the query accurately (e.g. total.relation will always be equal to "eq" when track_total_hits is set to true). Otherwise the "total.relation" returned in the "total" object in the search response determines how the "total.value" should be interpreted. A value of "gte" means that the "total.value" is a lower bound of the total hits that match the query and a value of "eq" indicates that "total.value" is the accurate count.
In short: with track_total_hits set to true, the hit count is exact and total.relation is always "eq"; otherwise, "gte" marks total.value as a lower bound and "eq" marks it as the accurate count.
{
  "_shards": ...,
  "timed_out": false,
  "took": 100,
  "hits": {
    "max_score": 1.0,
    "total": { "value": 2048, "relation": "eq" },
    "hits": ...
  }
}
track_total_hits can also be set to an integer: hits are then counted accurately only up to that threshold (it does not limit how many documents are returned).
GET my-index-000001/_search
{
  "track_total_hits": 100,
  "query": { "match": { "user.id": "elkbee" } }
}
If 42 documents match, the count is exact:
{
  "_shards": ...,
  "timed_out": false,
  "took": 30,
  "hits": {
    "max_score": 1.0,
    "total": { "value": 42, "relation": "eq" },
    "hits": ...
  }
}
If at least 100 documents match, the relation becomes gte:
{
  "_shards": ...,
  "hits": {
    "max_score": 1.0,
    "total": { "value": 100, "relation": "gte" },
    "hits": ...
  }
}
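A sketch of how total.value and total.relation behave for a given number of matching documents (my approximation of the rule, for illustration):

```python
def total_hits(matching: int, track_total_hits) -> dict:
    """Approximate how ES reports hits.total for a track_total_hits setting."""
    if track_total_hits is True:
        return {"value": matching, "relation": "eq"}   # always exact
    threshold = int(track_total_hits)
    if matching <= threshold:
        return {"value": matching, "relation": "eq"}   # exact below threshold
    return {"value": threshold, "relation": "gte"}     # lower bound above it

print(total_hits(2048, True))
print(total_hits(42, 100))
print(total_hits(250, 100))
```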
Quickly check for matching docs (existence check, without returning hits)
If you only want to know if there are any documents matching a specific query, you can set the size to 0 to indicate that we are not interested in the search results. You can also set terminate_after to 1 to indicate that the query execution can be terminated whenever the first matching document was found (per shard).
GET /_search?q=user.id:elkbee&size=0&terminate_after=1
terminate_after is always applied after the post_filter and stops the query as well as the aggregation executions when enough hits have been collected on the shard. Though the doc count on aggregations may not reflect the hits.total in the response since aggregations are applied before the post filtering.
The response will not contain any hits as the size was set to 0. The hits.total will be either equal to 0, indicating that there were no matching documents, or greater than 0 meaning that there were at least as many documents matching the query when it was early terminated. Also if the query was terminated early, the terminated_early flag will be set to true in the response. Some queries are able to retrieve the hits count directly from the index statistics, which is much faster as it does not require executing the query. In those situations, no documents are collected, the returned total.hits will be higher than terminate_after, and terminated_early will be set to false.
{
  "took": 3,
  "timed_out": false,
  "terminated_early": true,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": null,
    "hits": []
  }
}
The took time in the response contains the milliseconds that this request took for processing, beginning quickly after the node received the query, up until all search related work is done and before the above JSON is returned to the client. This means it includes the time spent waiting in thread pools, executing a distributed search across the whole cluster and gathering all the results.
Highlighting
Highlighters enable you to get highlighted snippets from one or more fields in your search results so you can show users where the query matches are. When you request highlights, the response contains an additional highlight element for each search hit that includes the highlighted fields and the highlighted fragments.
Elasticsearch supports three highlighters: unified, plain, and fvh (fast vector highlighter). You can specify the highlighter type you want to use for each field.
GET /_search
{
  "query": { "match": { "content": "kimchy" } },
  "highlight": { "fields": { "content": {} } }
}
Query vs. filter context
A query clause answers "How well does this document match this query clause?" It computes a relevance _score and leaves the ranking decision to you; see the Search APIs for query usage.
A filter clause answers "Does this document match this query clause?" The answer is simply yes or no, no score is computed, and the condition is exact, e.g. filtering for records whose status is on.
For example:
GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Search" } },
        { "match": { "content": "Elasticsearch" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "publish_date": { "gte": "2015-01-01" } } }
      ]
    }
  }
}
This query will match documents where all of the following conditions are met:
- The title field contains the word search.
- The content field contains the word elasticsearch.
- The status field contains the exact word published.
- The publish_date field contains a date from 1 Jan 2015 onwards.
Basic query structure
GET /{index}/_search
{
  "from": 0,          // offset of the first result
  "size": 10,         // page size, i.e. how many hits to return
  "_source": [ ... ], // array of fields to return
  "query": { ... },   // query clause
  "aggs": { ... },    // aggregations clause
  "sort": { ... }     // sort clause
}
GET /my-index-000001/_search
{
  "from": 5,
  "size": 20,
  "_source": ["nickname", "photo"],
  "query": { "match": { "user.id": "kimchy" } },
  "sort": [ { "date": "asc" }, { "tie_breaker_id": "asc" }, "_score" ],
  "aggs": {}
}
Range queries
GET /{index}/_search
{
  "query": {
    "range": {
      "{FIELD}": { "gte": 100, "lte": 200 }
    }
  }
}
{FIELD} is the field name; gte means >= and lte means <=. A single bound is allowed, e.g. keeping only "gte": 100 means FIELD >= 100.
gt  - greater than ( > )
gte - greater than or equal to ( >= )
lt  - less than ( < )
lte - less than or equal to ( <= )
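The four operators combine as a simple conjunction of comparisons; a sketch of how a range clause evaluates against a single value (illustration only):

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None) -> bool:
    """Evaluate a range clause the way ES interprets gt/gte/lt/lte."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True                     # all supplied bounds satisfied

print(in_range(150, gte=100, lte=200))
```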
Bool compound queries
GET /{index}/_search
{
  "query": {
    "bool": {              // bool query
      "must": [],          // like AND in SQL: every clause must match
      "must_not": [],      // the opposite of must: no clause may match
      "should": [],        // like OR in SQL: matching any clause is enough
      "filter": []         // filter clauses: yes/no matching, no scoring
    }
  }
}
The sub-clauses of bool are must, must_not, filter, and should.
term is an exact match: the input is not analyzed, and the document must contain the whole searched term.
The difference between match and term: a match query first runs the field's analyzer over the input, so it behaves like a fuzzy, token-based match where containing some of the keywords is enough; a term query skips analysis entirely.
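A toy inverted-index sketch of that difference (my own illustration): match analyzes the query into tokens and any token may hit; term looks the raw input up verbatim.

```python
def analyze(text):
    """Stand-in for the standard analyzer: lowercase + whitespace split."""
    return text.lower().split()

def match_query(doc_tokens, query):
    """match: analyze the query; any resulting token may hit."""
    return any(tok in doc_tokens for tok in analyze(query))

def term_query(doc_tokens, query):
    """term: the raw input must appear as one whole indexed term."""
    return query in doc_tokens

doc = analyze("Quick brown fox")
print(match_query(doc, "Quick foxes jump"))  # token "quick" hits
print(term_query(doc, "Quick foxes jump"))   # no single term equals this
```

This is also why a term query against an analyzed text field often fails to match: the indexed terms are lowercased tokens, while the term input is compared verbatim.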
Phrase searches, similarity searches, and prefix searches are supported as well.
Elasticsearch supports both JSON-style and SQL-style queries:
JSON: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
SQL: https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-overview.html
Analyzing your data
Aggregation queries
elasticsearch-go
github.com/elastic/go-elasticsearch/v8
https://www.elastic.co/guide/en/elasticsearch/client/go-api/current/getting-started-go.html