Overview

Elasticsearch is a distributed, RESTful, scalable, real-time search and analytics engine. Built on the full-text search library Apache Lucene, it can handle petabyte-scale data.
Elasticsearch latest official documentation · Elasticsearch 7.x official documentation · Elasticsearch 2.x Chinese documentation

Preparation

Installation

If you want to use the IK (Chinese) / pinyin analyzer plugins, first check which ES versions the IK and pinyin analyzers support, and pick a version marked Verified.

Windows installation

- Pick a version
Elasticsearch: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.13.4-windows-x86_64.zip
Elasticsearch IK plugin: https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.13.4/elasticsearch-analysis-ik-7.13.4.zip
Elasticsearch pinyin plugin: https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.13.4/elasticsearch-analysis-pinyin-7.13.4.zip
  • Download the Elasticsearch zip and unpack it
  • Edit config/elasticsearch.yml
    # cluster name
    cluster.name: myes
    # node name within the cluster
    node.name: myes_node_1
    # IPs allowed to access
    network.host: 0.0.0.0
    # HTTP port
    http.port: 9200
    # inter-node transport port
    transport.tcp.port: 9300
    # whether this node is master-eligible
    node.master: true
    # whether this node stores data (data node)
    node.data: true
    # master-eligible nodes used to bootstrap the cluster
    cluster.initial_master_nodes: myes_node_1
    # seed hosts to contact for cluster discovery
    discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300"]
    # a master can only be elected when enough master-eligible nodes are present
    discovery.zen.minimum_master_nodes: 1
    # lock the process memory to avoid the performance cost of swapping
    bootstrap.memory_lock: false
    # maximum number of nodes allowed to run on one machine
    node.max_local_storage_nodes: 1
    # CORS settings
    http.cors.enabled: true
    http.cors.allow-origin: "*"

Docker installation

I'm using WSL2 here; a VM or a server works much the same.

docker pull elasticsearch:7.13.4
docker run --name elasticsearch -d -p 9200:9200 -e "discovery.type=single-node" -p 9300:9300 elasticsearch:7.13.4

Give it a moment. If you cannot reach localhost:9200 / <server-ip>:9200, try docker restart elasticsearch and inspect the output of docker logs elasticsearch.
If you see the error max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144], fix it as follows:

  • Temporary fix: sudo sysctl -w vm.max_map_count=262144
  • Permanent fix: append the line vm.max_map_count=262144 to /etc/sysctl.conf (then apply it with sudo sysctl -p)

Edit the config file: docker exec -it elasticsearch vi ./config/elasticsearch.yml
Install a plugin: docker exec -it elasticsearch ./bin/elasticsearch-plugin install <plugin zip URL>

Mapping to host paths

Mapping the config file and plugin directory to host paths makes it easier to sync plugins and configuration.

  • Create a /usr/elasticsearch directory (any path works)
  • Inside it, create config/elasticsearch.yml and a plugins/ directory
  • Edit the config and install plugins in this directory
  • Start the container with the command below (if you are not setting up a cluster you can keep -e "discovery.type=single-node")
    docker run --name elasticsearch -d -v /usr/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v /usr/elasticsearch/plugins/:/usr/share/elasticsearch/plugins -p 9200:9200 -p 9300:9300 elasticsearch:7.13.4
  • If the container fails to start, check the logs for max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]; the fix is described above.

Client tools

Recommended: the elasticsearch-head Chrome extension.

Download from this site

Once installed, click the extension icon and use it directly.

Architecture

Elasticsearch is document oriented, meaning it can store entire objects or documents. It does not merely store them; it also indexes the contents of each document so they can be searched. In Elasticsearch you index, search, sort, and filter documents, not rows and columns of data.

Elasticsearch maps to a traditional relational database roughly as follows:

Elasticsearch | Indices | Types | Documents | Fields
Relational DB | Databases | Tables | Rows | Columns

Note: versions before 6.0 had the concept of a type, roughly equivalent to a table in a relational database. Types were removed entirely in ES 7.0.
Why were mapping types removed in 7.0?

Does creating an index correspond to creating a database or a table?
1. If it is like a database, one index can hold documents of many different types, which ES technically allows.
2. If it is like a table, one index holds only documents of the same type; this is what the ES team officially recommends.

Forward index and inverted index

https://www.cnblogs.com/softidea/p/9852048.html
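
The linked article covers the details. As a quick illustration (a toy sketch in plain Java, not how Lucene actually implements it): a forward index maps each document to its terms, while an inverted index maps each term to the documents containing it.

// forward index: document ID -> terms it contains
Map<Integer, List<String>> forwardIndex = Map.of(
        1, List.of("东汉", "末年"),
        2, List.of("三国", "东汉"));
// invert it: term -> set of document IDs
Map<String, Set<Integer>> invertedIndex = new HashMap<>();
forwardIndex.forEach((docId, terms) ->
        terms.forEach(term ->
                invertedIndex.computeIfAbsent(term, t -> new HashSet<>()).add(docId)));
System.out.println(invertedIndex.get("东汉")); // [1, 2]: both documents contain the term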

Quick start (ES7+)

Create an index

【PUT】http://localhost:9200/<index-name>
{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
}
Parameter | Purpose
number_of_shards | Number of primary shards. In a cluster you usually set several, so one index is split across nodes, improving throughput and availability. This quick start runs on a single machine, so it is set to 1.
number_of_replicas | Number of replicas per shard. Replicas improve reliability; on a single machine set it to 0.

Create a mapping

Each document in an index contains one or more fields. Creating a mapping means defining those fields for the index, like defining the columns of a table in a traditional database.

【POST】http://localhost:9200/<index-name>/_mapping
{
    "properties": {
        "name": {
            "type": "text"
        },
        "age": {
            "type": "integer"
        },
        "job": {
            "type": "keyword"
        }
    }
}

Note: ES6 used http://localhost:9200/<index-name>/<type-name>/_mapping,
but ES7 no longer requires a type name; there is a single default type, _doc.
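
The same mapping can also be created from Java with the high-level client's PutMappingRequest — a minimal sketch, assuming the esClient bean from the Java REST High Level Client section below and an existing index named myindex:

PutMappingRequest putMappingRequest = new PutMappingRequest("myindex").source(
        "{\"properties\":{" +
        "\"name\":{\"type\":\"text\"}," +
        "\"age\":{\"type\":\"integer\"}," +
        "\"job\":{\"type\":\"keyword\"}" +
        "}}", XContentType.JSON);
// acknowledged == true means the mapping was applied
AcknowledgedResponse ack = esClient.indices().putMapping(putMappingRequest, RequestOptions.DEFAULT);
System.out.println(ack.isAcknowledged());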

Create a document

An Elasticsearch document corresponds to a row in a database table.

【POST】http://localhost:9200/<index-name>/_doc/<id> (if no ID is given, one is generated)
{
    "name": "汐涌及岸",
    "age": 18,
    "job": "码农"
}

Search documents

Get by ID

【GET】http://localhost:9200/<index-name>/_doc/<id>
Response:
{
    "_index": "myindex",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "name": "汐涌及岸",
        "age": 18,
        "job": "码农"
    }
}
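
A Java-client version of the get-by-ID request (a sketch, using the esClient bean defined later in this article):

GetRequest getRequest = new GetRequest("myindex", "1");
GetResponse getResponse = esClient.get(getRequest, RequestOptions.DEFAULT);
// isExists() corresponds to the "found" field of the REST response
if (getResponse.isExists()) {
    System.out.println(getResponse.getSourceAsMap());
}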

Simple search

  • All documents: 【GET】http://localhost:9200/<index-name>/_search
  • Documents whose name contains 汐: 【GET】http://localhost:9200/<index-name>/_search?q=name:汐
    Why "contains"? That is how the text type works: inserted values are analyzed into terms.
  • Documents whose job is exactly 码农: 【GET】http://localhost:9200/<index-name>/_search?q=job:码农
    Likewise, keyword values are not analyzed.
  • name contains 汐 and job is 码农: 【GET】http://localhost:9200/<index-name>/_search?q=name:汐&q=job:码农 (a Java-client version follows the response below)
    Response:
    {
        "took": 3,                   // time taken by this request, in milliseconds
        "timed_out": false,          // whether the request timed out
        "_shards": {                 // which shards were searched
            "total": 1,              // total
            "successful": 1,         // succeeded
            "skipped": 0,            // skipped
            "failed": 0              // failed
        },
        "hits": {                    // matched records
            "total": {               // total number of matching documents
                "value": 1,          // count
                "relation": "eq"     // relation
            },
            "max_score": 0.2876821,  // highest relevance score
            "hits": [                // results ordered by relevance
                {
                    "_index": "myindex",  // index
                    "_type": "_doc",      // type
                    "_id": "1",           // ID
                    "_score": 0.2876821,  // relevance score
                    "_source": {          // source document
                        "name": "汐涌及岸",
                        "age": 18,
                        "job": "码农"
                    }
                }
            ]
        }
    }
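
The same query-string style search can be issued from Java; a sketch using queryStringQuery, which accepts the same syntax as the ?q= parameter:

SearchRequest searchRequest = new SearchRequest("myindex").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.queryStringQuery("name:汐 AND job:码农")));
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}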

IK analyzer plugin

Basic usage

  • Test the default analyzer
    【POST】localhost:9200/_analyze
    {
        "text": "测试一下ElasticSearch默认的分词器"
    }
    The default analyzer does not support Chinese, so the result is
    【测】【试】【一】【下】【elasticsearch】【默】【认】【的】【分】【词】【器】
  • Install the IK analyzer plugin (download link above):
    unpack the archive into plugins/elasticsearch-analysis-ik/ and restart ES
  • Test analysis with the IK plugin (a Java-client version follows this list)
    【POST】localhost:9200/_analyze
    {
        "text": "测试一下IK分词器插件",
        "analyzer": "ik_max_word"  // two modes are supported: ik_max_word and ik_smart
    }
    ik_max_word splits the text at the finest granularity:
    【测试】【试一下】【一下】【ik】【分词器】【分词】【器】【插件】
    ik_smart splits at the coarsest granularity:
    【测试】【一下】【ik】【分词器】【插件】
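
The _analyze API is also exposed by the Java client; a sketch (AnalyzeRequest here is the one from org.elasticsearch.client.indices):

AnalyzeRequest analyzeRequest = AnalyzeRequest.withGlobalAnalyzer("ik_max_word", "测试一下IK分词器插件");
AnalyzeResponse analyzeResponse = esClient.indices().analyze(analyzeRequest, RequestOptions.DEFAULT);
for (AnalyzeResponse.AnalyzeToken token : analyzeResponse.getTokens()) {
    System.out.println(token.getTerm()); // one line per extracted term
}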

Custom dictionaries

To make the IK analyzer recognize domain-specific words, you can define a custom dictionary.
Create a my.dic under elasticsearch-analysis-ik/config (make sure it is UTF-8 encoded),
with one word per line:

汐涌及岸

Edit elasticsearch-analysis-ik/config/IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom extension dictionaries -->
    <entry key="ext_dict">my.dic</entry>
    <!-- custom extension stop-word dictionaries -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionaries -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- remote extension stop-word dictionaries -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Restart ES and test the analyzer again; it now recognizes the words in the custom dictionary.
Stop words work the same way: list the words you do not want indexed, such as 【了】【的】【呢】【吗】.

Mapping field types

Only the core types are covered here; for the full list see the latest official Elasticsearch documentation.

  • text: the value is analyzed and indexed when stored

    {
        "properties": {
            "name": {
                "type": "text",
                // use ik_max_word for both indexing and searching
                "analyzer": "ik_max_word"
                // optionally set a separate analyzer for search time
                //"search_analyzer": "ik_smart"
            },
            "pic": {
                "type": "text",
                // "index" controls whether the field is indexed; defaults to true
                // (only indexed fields can be searched). An image URL does not
                // need to be searched, so it is set to false here.
                "index": false
            }
        }
    }

    Recommended: use ik_max_word at index time for fine-grained terms, and ik_smart at search time for better precision.

  • binary: a binary value encoded as a Base64 string.

  • boolean: true or false.

  • Keywords

  • Numbers

    • long: 64-bit integer
    • integer: 32-bit integer
    • short: 16-bit integer
    • byte: 8-bit integer
    • double: 64-bit double-precision float
    • float: 32-bit single-precision float
  • Dates

    • date: stores dates with millisecond resolution; the default format is "strict_date_optional_time||epoch_millis", and custom formats are supported. See the official documentation for details.
      {
          "properties": {
              "timestamp": {
                  "type": "date",
                  "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
              }
          }
      }
    • date_nanos: stores dates with nanosecond resolution

Java REST High Level Client

  • Add the elasticsearch-rest-high-level-client dependency (keep its version in line with your ES server version)
  • Create a configuration class
    ElasticSearchConfig.java
    @Configuration
    public class ElasticSearchConfig {

        private String hosts = "127.0.0.1:9200,127.0.0.1:9201";

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            return new RestHighLevelClient(RestClient.builder(
                    Arrays.stream(hosts.split(",")).map(address -> {
                        String[] addressSplit = address.split(":", 2);
                        return new HttpHost(addressSplit[0], addressSplit.length > 1 ? Integer.parseInt(addressSplit[1]) : 9200);
                    }).toArray(HttpHost[]::new)
            ));
        }
    }
  • Create an index
    @Autowired
    private RestHighLevelClient esClient;

    @Test
    void testCreateIndex() throws IOException {
        // build the create-index request
        CreateIndexRequest createIndexRequest =
                new CreateIndexRequest("java_index").settings(Settings.builder()
                        .put("number_of_shards", 1)
                        .put("number_of_replicas", 0)
                );
        // define the mapping
        createIndexRequest.mapping(
                "{\"properties\":{" +
                        "\"name\":{\"type\":\"text\"}," +
                        "\"age\":{\"type\":\"integer\"}," +
                        "\"job\":{\"type\":\"keyword\"}" +
                        "}}",
                XContentType.JSON);
        // obtain the indices client
        IndicesClient indices = esClient.indices();
        // execute and get the response
        CreateIndexResponse createIndexResponse = indices.create(createIndexRequest, RequestOptions.DEFAULT);
        // read the result
        boolean acknowledged = createIndexResponse.isAcknowledged();
        System.out.println(acknowledged ? "success" : "failure");
    }
  • Create / update / delete documents
    // index (create) request
    IndexRequest indexRequest = new IndexRequest("java_index");
    // request body
    indexRequest.source(new HashMap<String, Object>() {{
        put("name", "汐涌及岸");
        put("age", 18);
        put("timestamp", new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(System.currentTimeMillis()));
    }});
    // execute and get the response
    IndexResponse indexResponse = esClient.index(indexRequest, RequestOptions.DEFAULT);
    // read the result
    DocWriteResponse.Result result = indexResponse.getResult();
    System.out.println(result);

    // update request
    UpdateRequest updateRequest = new UpdateRequest("java_index", "<id>");
    // fields to update
    updateRequest.doc(new HashMap<String, Object>() {{
        put("name", "汐涌及岸");
        put("age", 18);
        put("reg_time", new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(System.currentTimeMillis()));
    }});
    // execute and get the response
    UpdateResponse updateResponse = esClient.update(updateRequest, RequestOptions.DEFAULT);
    // read the result
    DocWriteResponse.Result result = updateResponse.getResult();
    System.out.println(result);

    // delete request
    DeleteRequest deleteRequest = new DeleteRequest("java_index", "<id>");
    // execute and get the response
    DeleteResponse deleteResponse = esClient.delete(deleteRequest, RequestOptions.DEFAULT);
    // read the result
    DocWriteResponse.Result result = deleteResponse.getResult();
    System.out.println(result);

Query DSL search

Query DSL (Domain Specific Language) is Elasticsearch's JSON-based search language: you express different search needs by posting specific JSON bodies.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "match_all": {}
    },
    "_source": [
        "title",
        "price"
    ]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.matchAllQuery())
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

TermQuery is an exact query: the input is treated as a single term and is not analyzed.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "term": {
            "title": "三国志"
        }
    },
    "_source": [
        "title",
        "price"
    ]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.termQuery("title", "三国志"))
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

matchQuery performs full-text search: the input is analyzed into terms, and each term is looked up in the index.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "match": {
            "title": "三国志"
        }
    },
    "_source": [
        "title",
        "price"
    ]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.matchQuery("title", "三国志"))
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

multiMatchQuery matches against several fields at once.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "multi_match": {
            "query": "东汉",
            "fields": ["title", "description"]
        }
    },
    "_source": ["title", "price"]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.multiMatchQuery("东汉", "title", "description"))
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

must: the document must match all the enclosed queries; works like AND.
should: the document should match one or more of the enclosed queries; works like OR.
must_not: the document must not match the enclosed queries; works like NOT.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "title": "三国"
                    }
                },
                {
                    "multi_match": {
                        "query": "东汉",
                        "fields": ["title", "description"]
                    }
                }
            ]
        }
    },
    "_source": ["title", "price"]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.boolQuery()
                .must(QueryBuilders.termQuery("title", "三国"))
                .must(QueryBuilders.multiMatchQuery("东汉", "title", "description"))
        )
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

Filters are used inside a bool query, at the same level as must/should/must_not.
A filter narrows the result set: it only decides whether a document matches, without computing a relevance score, so filters perform better than queries and cache well. Prefer filters, alone or combined with queries, whenever scoring is not needed.
range: range filter.
term: exact term filter.
Note: a single range or term clause can only constrain one field.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "description": "末年"
                    }
                }
            ],
            "filter": [
                {
                    "range": { "price": { "gte": 51, "lte": 60 } }
                }
            ]
        }
    },
    "_source": ["title", "price"]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("description", "末年"))
                .filter(QueryBuilders.rangeQuery("price").gte(51).lte(60))
        )
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

from: zero-based offset of the first document to return.
size: number of documents to return.

【POST】http://localhost:9200/books/_search
{
    "from": 0,
    "size": 1,
    "query": {
        "match_all": {}
    },
    "_source": ["title", "price"]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .from(0)
        .size(1)
        .query(QueryBuilders.matchAllQuery())
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

You can sort on one or more fields. Sorting is supported on keyword, date, numeric and similar types; it is not allowed on text fields.

【POST】http://localhost:9200/books/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "price": "desc"
        }
    ],
    "_source": ["title", "price"]
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.matchAllQuery())
        .sort("price", SortOrder.DESC)
        .fetchSource(new String[]{"title", "price"}, null)
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

A closer look at boolQuery

A bool query in Elasticsearch does not follow loose everyday logic; it has to be written out in strict boolean-algebra terms.

Say we want to express: a=1 AND b=2 AND (c=3 OR d=4).

  • Following the informal (a=1 && b=2) && (c=3 || d=4) reading, we might write:
    {
        "query": {
            "bool": {
                "must": [
                    {"term": { "a": 1 } },
                    {"term": { "b": 2 } }
                ],
                "should": [
                    {"term": { "c": 3 } },
                    {"term": { "d": 4 } }
                ]
            }
        }
    }
    However, this does not return what we want: once a must clause is present, the should clauses become optional scoring hints (minimum_should_match defaults to 0), so documents matching only a=1 && b=2 are returned as well.
  • Expanding it mathematically as ((a=1 && b=2) && c=3) || ((a=1 && b=2) && d=4) works:
    {
        "query": {
            "bool": {
                "should": [
                    {"bool": { "must": [
                        {"term": { "a": 1 } },
                        {"term": { "b": 2 } },
                        {"term": { "c": 3 } }
                    ]}},
                    {"bool": { "must": [
                        {"term": { "a": 1 } },
                        {"term": { "b": 2 } },
                        {"term": { "d": 4 } }
                    ]}}
                ]
            }
        }
    }
  • Or nest a bool query: search for c=3 || d=4 within the results of a=1 && b=2 (a Java-client version follows this list)
    {
        "query": {
            "bool": {
                "must": [
                    {"term": { "a": 1 } },
                    {"term": { "b": 2 } },
                    {"bool": {
                        "should": [
                            {"term": { "c": 3 } },
                            {"term": { "d": 4 } }
                        ]
                    }}
                ]
            }
        }
    }
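
The nested variant translates directly to the Java client; a sketch, where a/b/c/d are the placeholder fields from the discussion above:

QueryBuilder query = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("a", 1))
        .must(QueryBuilders.termQuery("b", 2))
        // nested bool: c=3 OR d=4
        .must(QueryBuilders.boolQuery()
                .should(QueryBuilders.termQuery("c", 3))
                .should(QueryBuilders.termQuery("d", 4)));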

Highlighting

【POST】http://localhost:9200/books/_search
{
    "query": {
        "multi_match": {
            "query": "三国",
            "fields": ["title", "description"]
        }
    },
    "_source": ["title", "description"],
    "highlight": {
        "pre_tags": ["<b>"],
        "post_tags": ["</b>"],
        "fields": {
            "title": {},
            "description": {}
        }
    }
}

Response:
{
    "took": 35,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.3265235,
        "hits": [
            {
                "_index": "books",
                "_type": "_doc",
                "_id": "qByyU3sBmhW6GP0MI5PA",
                "_score": 1.3265235,
                "_source": {
                    "description": "东汉末年,山河动荡,刘汉王朝气数将尽。内有十常侍颠倒黑白,祸乱朝纲;外有张氏兄弟高呼“苍天当死,黄天当立”的口号,掀起浩大的农民起义。一时间狼烟四起,战火熊熊,刘家的朝廷宛如大厦将倾,岌岌可危。",
                    "title": "三国演义"
                },
                "highlight": {
                    "title": ["<b>三国</b>演义"]
                }
            },
            {
                "_index": "books",
                "_type": "_doc",
                "_id": "qRy3U3sBmhW6GP0MYZMa",
                "_score": 1.3265235,
                "_source": {
                    "description": "二十四史之一,是由西晋史学家陈寿所著,记载中国三国时期的曹魏、蜀汉、东吴纪传体断代史,是二十四史中评价最高的“前四史”之一。",
                    "title": "三国志"
                },
                "highlight": {
                    "description": [
                        "二十四史之一,是由西晋史学家陈寿所著,记载中国<b>三国</b>时期的曹魏、蜀汉、东吴纪传体断代史,是二十四史中评价最高的“前四史”之一。"
                    ],
                    "title": ["<b>三国</b>志"]
                }
            }
        ]
    }
}
The Java client equivalent:

SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.multiMatchQuery("三国", "title", "description"))
        .fetchSource(new String[]{"title", "price"}, null)
        .highlighter(SearchSourceBuilder.highlight()
                .preTags("<b>")
                .postTags("</b>")
                .field("title")
                .field("description")
        )
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
    System.err.println(hit.getHighlightFields());
}
SearchRequest searchRequest = new SearchRequest("books").source(SearchSourceBuilder.searchSource()
        .query(QueryBuilders.multiMatchQuery("三国", "title", "description"))
        .fetchSource(new String[]{"title", "price"}, null)
        .highlighter(SearchSourceBuilder.highlight()
                .preTags("<b>")
                .postTags("</b>")
                .field("title")
                .field("description")
        )
);
SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
// replace the _source content with the highlighted fragments
for (SearchHit hit : searchResponse.getHits().getHits()) {
    Map<String, Object> sMap = hit.getSourceAsMap();
    Map<String, HighlightField> hMap = hit.getHighlightFields();
    for (String fieldName : sMap.keySet()) {
        HighlightField hField = hMap.get(fieldName);
        if (hField != null) sMap.put(fieldName, String.join("", Arrays.stream(hField.fragments()).map(Text::toString).toArray(String[]::new)));
    }
}
for (SearchHit hit : searchResponse.getHits().getHits()) {
    System.out.println(hit.getSourceAsMap());
}

Spring Data ElasticSearch

Configuration file

application.yml
spring:
  elasticsearch:
    rest:
      uris: http://localhost:9200

JPA-style usage

Extend ElasticsearchRepository<EntityClass, IdType>:

public interface ItemIndex extends ElasticsearchRepository<Item, String> {
}
@Autowired
private ItemIndex itemIndex;

public void add(Item item) {
    // leave the ID null so ES generates one
    item.setId(null);
    itemIndex.save(item);
}

public void update(Item item) {
    itemIndex.save(item);
}

public void delete(String id) {
    itemIndex.deleteById(id);
}
@Data
@Document(indexName = "items")
public class Item {

    @Id
    private String id;
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String name;
    private Double price;
    @Field(type = FieldType.Keyword)
    private String type;
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String description;

}
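
Beyond save/deleteById, Spring Data can also derive queries from repository method names; a hypothetical sketch (findByType is not part of the code above, just an illustration of the naming convention):

public interface ItemIndex extends ElasticsearchRepository<Item, String> {
    // derived query: returns documents whose type equals the argument
    List<Item> findByType(String type);
}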

ElasticsearchRestTemplate

Conditional + paginated search

@Autowired
private ElasticsearchRestTemplate esTemplate;

public PageResult<Item> list(ItemSearch itemSearch, int pageNum, int pageSize) {
    // build the query
    NativeSearchQueryBuilder searchQueryBuilder = new NativeSearchQueryBuilder();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    // condition 1: keyword, matched against name/description
    if (!StringUtil.isNullOrEmpty(itemSearch.getKeyword()))
        boolQueryBuilder.filter(QueryBuilders.multiMatchQuery(itemSearch.getKeyword(), "name", "description"));
    // condition 2: price range
    if (itemSearch.getPriceMin() != null || itemSearch.getPriceMax() != null) {
        RangeQueryBuilder priceQueryBuilder = QueryBuilders.rangeQuery("price");
        if (itemSearch.getPriceMin() != null) priceQueryBuilder.gte(itemSearch.getPriceMin());
        if (itemSearch.getPriceMax() != null) priceQueryBuilder.lte(itemSearch.getPriceMax());
        boolQueryBuilder.filter(priceQueryBuilder);
    }
    // condition 3: exact type
    if (!StringUtil.isNullOrEmpty(itemSearch.getType()))
        boolQueryBuilder.filter(QueryBuilders.termQuery("type", itemSearch.getType()));
    // condition 4: sort by price
    if (!StringUtil.isNullOrEmpty(itemSearch.getOrderType())) {
        if (itemSearch.getOrderType().equals("DESC"))
            searchQueryBuilder.withSort(SortBuilders.fieldSort("price").order(SortOrder.DESC));
        if (itemSearch.getOrderType().equals("ASC"))
            searchQueryBuilder.withSort(SortBuilders.fieldSort("price").order(SortOrder.ASC));
    }
    // pagination
    searchQueryBuilder.withPageable(PageRequest.of(pageNum - 1, pageSize));
    // highlighting
    searchQueryBuilder.withHighlightFields(new HighlightBuilder.Field("name").preTags("<b>").postTags("</b>"));
    // execute
    searchQueryBuilder.withQuery(boolQueryBuilder);
    SearchHits<Item> hits = esTemplate.search(searchQueryBuilder.build(), Item.class);
    // replace highlighted fields
    return new PageResult<>(hits.getTotalHits(), hits.stream().map(hit -> {
        Item item = hit.getContent();
        List<String> nameFields = hit.getHighlightFields().get("name");
        if (nameFields != null) item.setName(String.join("", nameFields));
        return item;
    }).collect(Collectors.toList()));
    // without highlight replacement:
    //return new PageResult<>(hits.getTotalHits(), hits.stream().map(SearchHit::getContent).collect(Collectors.toList()));
}
Item is the same entity class shown in the JPA-style usage section above.
@Data
public class ItemSearch {

    private String keyword;
    private Double priceMin;
    private Double priceMax;
    private String type;
    private String orderType;

}
@GetMapping("items")
Result list(ItemSearch itemSearch, @RequestParam(value = "pn", defaultValue = "1") int pageNum, @RequestParam(value = "ps", defaultValue = "3") int pageSize) {
    return Result.success(itemService.list(itemSearch, pageNum, pageSize));
}

Aggregations

Aggregations summarize Elasticsearch data as metrics, statistics, or other analytics.

Aggregations in Elasticsearch fall into three main categories:

  • Bucket: collections of documents that meet a given criterion, similar to SQL's GROUP BY.
  • Metric: computations over numeric data, requiring more work on the ES side, similar to SQL aggregate functions such as count(), sum(), max().
  • Pipeline: aggregations that take input from other aggregations instead of documents or fields.

Related links: Using the Aggregations API · A summary of Elasticsearch Aggregations
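
The worked examples below are Bucket aggregations. For comparison, a minimal Metric sketch (assuming the esTemplate and Item from the ElasticsearchRestTemplate section): computing the average price across all items.

NativeSearchQueryBuilder searchQueryBuilder = new NativeSearchQueryBuilder();
searchQueryBuilder.withQuery(QueryBuilders.matchAllQuery());
// avg metric aggregation over the price field
searchQueryBuilder.addAggregation(AggregationBuilders.avg("avgPrice").field("price"));
SearchHits<Item> hits = esTemplate.search(searchQueryBuilder.build(), Item.class);
ParsedAvg avgPrice = hits.getAggregations().get("avgPrice");
System.out.println(avgPrice.getValue());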

Example

// build the query
NativeSearchQueryBuilder searchQueryBuilder = new NativeSearchQueryBuilder();
// query condition; adapt as needed
searchQueryBuilder.withQuery(QueryBuilders.matchAllQuery());
// aggregation: name typeGroup (your choice), field type (several are possible),
// return up to 20 buckets (default is 10)
searchQueryBuilder.addAggregation(AggregationBuilders.terms("typeGroup").field("type").size(20));
// execute
SearchHits<Item> hits = esTemplate.search(searchQueryBuilder.build(), Item.class);
// read the aggregation result
HashMap<String, Long> typesWithCount = new HashMap<>();
ParsedStringTerms typeGroup = hits.getAggregations().get("typeGroup");
for (Terms.Bucket typeBucket : typeGroup.getBuckets()) {
    typesWithCount.put(typeBucket.getKeyAsString(), typeBucket.getDocCount());
}
// print the result
System.out.println(typesWithCount);
The equivalent query DSL:

{
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "typeGroup": {
            "terms": {
                "field": "type",
                "size": 20
            }
        }
    }
}

Result

{
    "hits": {
        //...
    },
    "aggregations": {
        "typeGroup": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "电脑主机",
                    "doc_count": 2
                },
                {
                    "key": "食品",
                    "doc_count": 1
                }
            ]
        }
    }
}

The typeCount sub-aggregation below exists only to demonstrate nesting.
If all you need is group names and counts, the doc_count from the previous example is the recommended way.

// build the query
NativeSearchQueryBuilder searchQueryBuilder = new NativeSearchQueryBuilder();
// query condition; adapt as needed
searchQueryBuilder.withQuery(QueryBuilders.matchAllQuery());
// aggregation
searchQueryBuilder.addAggregation(
        // name typeGroup (your choice), field type (several are possible),
        // return up to 20 buckets (default is 10)
        AggregationBuilders.terms("typeGroup").field("type").size(20)
                // nest a sub-aggregation
                .subAggregation(AggregationBuilders.count("typeCount").field("type"))
);
// execute
SearchHits<Item> hits = esTemplate.search(searchQueryBuilder.build(), Item.class);
// read the aggregation result
HashMap<String, Long> typesWithCount = new HashMap<>();
ParsedStringTerms typeGroup = hits.getAggregations().get("typeGroup");
for (Terms.Bucket typeBucket : typeGroup.getBuckets()) {
    ValueCount typeCount = typeBucket.getAggregations().get("typeCount");
    typesWithCount.put(typeBucket.getKeyAsString(), typeCount.getValue());
}
// print the result
System.out.println(typesWithCount);
The equivalent query DSL:

{
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "typeGroup": {
            "terms": {
                "field": "type",
                "size": 20
            },
            "aggregations": {
                "typeCount": {
                    "value_count": {
                        "field": "type"
                    }
                }
            }
        }
    }
}

Result

{
    "hits": {
        //...
    },
    "aggregations": {
        "typeGroup": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "电脑主机",
                    "doc_count": 2,
                    "typeCount": {
                        "value": 2
                    }
                },
                {
                    "key": "食品",
                    "doc_count": 1,
                    "typeCount": {
                        "value": 1
                    }
                }
            ]
        }
    }
}