Elasticsearch Queries and Aggregations

2024-07-30

Tech Sharing

ElasticSearch / JAVA

Elasticsearch API

Elasticsearch queries and aggregations

1. Importing data

This is the official test data set provided by Elasticsearch:

https://github.com/elastic/elasticsearch/blob/v6.8.18/docs/src/test/resources/accounts.json

// Each record is a JSON document
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "[email protected]",
  "city": "Hobucken",
  "state": "CO"
}

Uploading

// Upload the data with a POST request to the bulk endpoint
http://localhost:9200/account/_bulk?pretty=&refresh=
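
The _bulk endpoint expects newline-delimited JSON: an action line followed by the document source, each terminated by a newline. The sketch below shows one way such a body could be assembled from plain JSON strings. The class name `BulkBodySketch`, the method `toBulkBody`, and the sequential ids are assumptions made for illustration, not part of the original data set.

```java
import java.util.List;

class BulkBodySketch {
    // Builds a newline-delimited body for the _bulk endpoint: one "index" action
    // line followed by one source line per document. Ids are simply 1, 2, 3, ...
    static String toBulkBody(List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        int id = 1;
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_id\":\"").append(id++).append("\"}}\n");
            body.append(doc).append('\n'); // every line, including the last, ends with \n
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Two abbreviated sample documents; the second one is made up.
        System.out.print(toBulkBody(List.of(
                "{\"account_number\":0,\"balance\":16623}",
                "{\"account_number\":1,\"balance\":1000}")));
    }
}
```

The resulting string would be sent as the request body with the `Content-Type: application/x-ndjson` header.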

2. Queries

Query all documents

match_all matches every document; sort specifies which field the results are ordered by.

GET /account/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

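Paginated query (from + size)

from is the zero-based offset of the first hit to return; for a 1-based page number it equals (PageNum - 1) * PageSize.

size is the number of hits to return, i.e. PageSize.

GET /account/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 2
}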
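
Because from is an offset rather than a page number, a small conversion is usually needed on the client side. The following is a minimal sketch of that arithmetic; `PagingSketch` and `fromOffset` are hypothetical names, and a 1-based page number is assumed.

```java
class PagingSketch {
    // Converts a 1-based page number and a page size into the zero-based
    // "from" offset used by the search request.
    static int fromOffset(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        return (pageNum - 1) * pageSize;
    }

    public static void main(String[] args) {
        // Page 6 with 2 hits per page starts at offset 10, as in the request above.
        System.out.println(fromOffset(6, 2)); // prints 10
    }
}
```

Note that, by default, Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so this style of paging suits shallow pages only.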

Querying a specific field: match

To search for specific terms within a field, use match. The following query returns documents whose address field contains mill or lane.

GET /account/_search
{
  "query": { "match": { "address": "mill lane" } }
}

Phrase matching: match_phrase

If the condition should be that the address field contains the whole phrase "mill lane", use match_phrase.

GET /account/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
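
The difference between the two queries can be illustrated with a deliberately simplified in-memory model. This is not how Elasticsearch is implemented: the sketch assumes an analyzer that only lowercases and splits on whitespace, and the class `MatchSketch` and the sample addresses are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MatchSketch {
    // Very rough stand-in for an analyzer: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // match (default OR operator): at least one query term occurs in the field.
    static boolean match(String field, String query) {
        List<String> fieldTokens = tokens(field);
        return tokens(query).stream().anyMatch(fieldTokens::contains);
    }

    // match_phrase: all query terms occur adjacently and in the same order.
    static boolean matchPhrase(String field, String query) {
        return Collections.indexOfSubList(tokens(field), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(match("990 Mill Road", "mill lane"));       // prints true
        System.out.println(matchPhrase("990 Mill Road", "mill lane")); // prints false
        System.out.println(matchPhrase("198 Mill Lane", "mill lane")); // prints true
    }
}
```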

Multiple conditions: bool

To build more complex queries, use a bool query to combine several conditions.

For example, the following request searches the account index for accounts of 40-year-old customers, excluding anyone who lives in Idaho (ID).

GET /account/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

Query conditions: query or filter

GET /account/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "state": "ND"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "age": "40"
          }
        },
        {
          "range": {
            "balance": {
              "gte": 20000,
              "lte": 30000
            }
          }
        }
      ]
    }
  }
}

Conditions in query context are used to score documents: the better the match, the higher the _score.

Conditions in filter context have only two outcomes, match or no match. Non-matching documents are filtered out, and filter conditions do not contribute to _score.

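3. Aggregation queries: Aggregation

Simple aggregation

For example, to count the accounts in each state, use the aggs keyword to aggregate on the state field. The aggregated field should not be split into tokens, so state.keyword is used to aggregate on the whole field value.

GET /account/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}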

Nested aggregation

ES can also nest multiple aggregations.

On top of grouping by state, compute avg(balance) for each group:

GET /account/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Sorting aggregation results

Buckets can be sorted by the result of a nested aggregation, using order inside the terms aggregation.

Here the buckets are sorted by the nested avg(balance), which is named average_balance:

GET /account/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
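
Conceptually, the terms + avg + order combination behaves like a GROUP BY with an average and a descending sort. The sketch below reproduces that logic in memory with Java streams, purely to illustrate what the aggregation computes. The `Account` class and the sample values are made up, and Elasticsearch itself computes this per shard rather than this way.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AggregationSketch {
    // Hypothetical in-memory stand-in for an account document.
    static final class Account {
        final String state;
        final long balance;

        Account(String state, long balance) {
            this.state = state;
            this.balance = balance;
        }
    }

    // Groups by state (like terms on state.keyword), averages balance per group
    // (like the nested avg), then sorts the groups by that average, highest first.
    static List<Map.Entry<String, Double>> averageBalanceByState(List<Account> accounts) {
        Map<String, Double> averages = accounts.stream()
                .collect(Collectors.groupingBy((Account a) -> a.state,
                        Collectors.averagingLong((Account a) -> a.balance)));
        return averages.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Account> sample = List.of(   // made-up sample values
                new Account("CO", 10_000),
                new Account("CO", 20_000),
                new Account("ND", 30_000));
        averageBalanceByState(sample)
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

One difference worth keeping in mind: a terms aggregation returns only the top 10 buckets unless its size parameter is set, whereas this sketch returns every group.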

Index management

1. Introduction to index management

PUT /customer/_doc/1
{
  "name": "John Doe"
}

By default, the index is created automatically when the document is indexed, with a dynamically generated mapping:

{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

To disable automatic index creation:

action.auto_create_index: false

2. Managing indices

Creating an index

// Create the index
PUT /test-index-users
{
  "settings": {
    "number_of_shards": 1,    // number of primary shards
    "number_of_replicas": 0   // number of replicas
  },
  "mappings": {               // the index mapping: the document fields and their types
    "properties": {           // the document fields
      "name": {               // a field called name
        "type": "text",       // text: the field holds analyzed full-text data
        "fields": {           // multi-fields defined for name
          "keyword": {        // a sub-field called keyword
            "type": "keyword",    // keyword: suited to exact matching and aggregations
            "ignore_above": 256   // strings longer than 256 characters are not indexed in this sub-field
          }
        }
      },
      "age": {                // a numeric field
        "type": "long"
      },
      "remarks": {            // a full-text field
        "type": "text"
      }
    }
  }
}

Inserting test data

POST /test-index-users/_doc
{
  "name": "王呈现",
  "age": "1",
  "remarks": "冲冲冲"
}

Opening/closing an index

POST /test-index-users/_close
POST /test-index-users/_open

Deleting an index

DELETE /test-index-users

Viewing an index

GET /test-index-users/_mapping

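Index templates

Templates can be configured before an index is created, so that when the index is created (manually or by indexing a document) the template settings serve as the basis for the new index.

There are two types of templates: index templates and component templates.

Component templates are reusable building blocks that configure mappings, settings, and aliases; they are not applied directly to a set of indices.
Index templates can contain a collection of component templates, and can also specify settings, mappings, and aliases directly.

1. Priority among index templates

Composable templates take precedence over legacy templates. If no composable template matches a given index, a legacy template may still match and be applied.
If an index is created with explicit settings and also matches an index template, the settings in the create index request take precedence over those specified in the index template and its component templates.
If a new data stream or index matches more than one index template, the index template with the highest priority is used.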
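
The last rule can be pictured as "filter by pattern, then take the maximum priority". The sketch below models only that selection step. `Template`, `matches`, and `select` are hypothetical names, the wildcard handling is a simplification, and the template names and the priority of 200 are illustrative values.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TemplatePrioritySketch {
    // Hypothetical, simplified model of a composable index template.
    static final class Template {
        final String name;
        final String indexPattern;
        final int priority;

        Template(String name, String indexPattern, int priority) {
            this.name = name;
            this.indexPattern = indexPattern;
            this.priority = priority;
        }
    }

    // Treats '*' as "any characters" and everything else as a literal.
    static boolean matches(String indexPattern, String indexName) {
        String regex = Arrays.stream(indexPattern.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return indexName.matches(regex);
    }

    // Among the templates whose pattern matches, the highest priority wins.
    static Optional<Template> select(List<Template> templates, String indexName) {
        return templates.stream()
                .filter(t -> matches(t.indexPattern, indexName))
                .max(Comparator.comparingInt((Template t) -> t.priority));
    }

    public static void main(String[] args) {
        List<Template> templates = List.of(
                new Template("built-in-logs", "logs-*-*", 100),
                new Template("my-logs", "logs-*", 200)); // hypothetical custom template
        System.out.println(select(templates, "logs-nginx-default").get().name); // prints my-logs
    }
}
```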

2. Built-in index templates

Elasticsearch has built-in index templates, each with a priority of 100, for the following index patterns:

logs-*-*
metrics-*-*
synthetics-*-*

Spring Boot integration

Adding the ES dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>

Adding the configuration

spring:
  elasticsearch:
    uris: localhost:9200
    read-timeout: 30s
    connection-timeout: 5s

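Creating the entity class

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// indexName specifies the index name
@Document(indexName = "lx-sd")
@Data
public class Article {
    @Id
    @Field(index = false, type = FieldType.Integer)
    private Integer id;
    /**
     * index: whether the field is indexed (searchable); defaults to true
     * analyzer: the analyzer used when indexing
     * searchAnalyzer: the analyzer used when searching
     * store: whether the field value is stored separately; defaults to false
     * type: the data type; defaults to FieldType.Auto
     *
     * ik_smart comes from the IK analysis plugin, which must be installed on the cluster.
     */
    @Field(analyzer = "ik_smart", searchAnalyzer = "ik_smart", store = true, type = FieldType.Text)
    private String title;
    @Field(analyzer = "ik_smart", searchAnalyzer = "ik_smart", store = true, type = FieldType.Text)
    private String context;
    @Field(store = true, type = FieldType.Integer)
    private Integer hits;
}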

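Creating the DAO layer

import java.util.List;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Component;

@Component
public interface ArticleDao extends ElasticsearchRepository<Article, Integer> {

    /**
     * Finds articles by title.
     * @param title text to match against the title
     * @return the matching articles
     */
    List<Article> findByTitle(String title);

    /**
     * Finds articles by title or content.
     * @param title text to match against the title
     * @param context text to match against the content
     * @return the matching articles
     */
    List<Article> findByTitleOrContext(String title, String context);

    /**
     * Finds articles by title or content, with paging.
     * @param title text to match against the title
     * @param context text to match against the content
     * @param pageable the page to return
     * @return the matching articles on that page
     */
    List<Article> findByTitleOrContext(String title, String context, Pageable pageable);
}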

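Test class

1. Saving a document

// Store one document in Elasticsearch through Spring Data Elasticsearch.
// articleDao is assumed to be an ArticleDao injected into the test class (e.g. with @Autowired).
@Test
public void testSave() {
    // Create the document
    Article article = new Article();
    article.setId(1);
    article.setTitle("es搜索");
    article.setContext("成功了吗");
    // Save the document
    articleDao.save(article);
}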

2. Querying

// Query by title
@Test
public void testFindByTitle() {
    List<Article> es = articleDao.findByTitle("es");
    for (Article e : es) {
        System.out.println(e);
    }
}

3. Updating

// Update
@Test
public void testUpdate() {
    // save() checks whether a document with the given id already exists:
    // if not, the document is created; if so, it is updated.
    // Create the document
    Article article = new Article();
    article.setId(1);
    article.setTitle("es搜索1");
    article.setContext("成功了吗1");
    // Save the document
    articleDao.save(article);
}

4. Deleting

// Delete
@Test
public void testDelete() {
    // Delete by primary key
    articleDao.deleteById(1);
}

5. Paginated query

// Rebuild the test data
@Test
public void makeData() {
    for (int i = 1; i <= 10; i++) {
        // Create the document
        Article article = new Article();
        article.setId(i);
        article.setTitle("es搜索" + i);
        article.setContext("成功了吗" + i);
        article.setHits(100 + i);
        // Save the document
        articleDao.save(article);
    }
}

// Paginated query
@Test
public void testFindAllWithPage() {
    // Set the paging condition.
    // The first argument is the page index, starting from 0,
    // so this requests the second page with 3 documents per page.
    PageRequest pageRequest = PageRequest.of(1, 3);

    Page<Article> all = articleDao.findAll(pageRequest);
    for (Article article : all) {
        System.out.println(article);
    }
}

6. Sorted query

// Sorted query
@Test
public void testFindWithSort() {
    // Set the sort condition: by hits, descending
    Sort sort = Sort.by(Sort.Order.desc("hits"));
    Iterable<Article> all = articleDao.findAll(sort);
    for (Article article : all) {
        System.out.println(article);
    }
}

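Compound queries

Queries often combine several conditions; in Elasticsearch these are called compound queries. It provides five kinds:

bool query (boolean query)
boosting query (boosting query)
constant_score (constant-score query)
dis_max (best-match query)
function_score (function query)

1. bool query (boolean query)

Combines smaller queries into larger ones using boolean logic.

The bool query syntax has the following characteristics:

Sub-queries can appear in any order.
Multiple queries can be nested, including other bool queries.
If a bool query has no must or filter clause, at least one should clause must match for a document to be returned.

A bool query supports four clause types: must, should, must_not, and filter. Each one takes an array of conditions.

must: must match; contributes to the score.
must_not: filter clause; must not match; does not contribute to the score.
should: optional match; contributes to the score. When there is no must or filter clause, at least one should clause must match.
filter: filter clause; must match; does not contribute to the score.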
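
The inclusion rules above can be written down as a small predicate combinator. This is a sketch of the matching logic only, with scoring left out; the class `BoolQuerySketch`, its method names, and the map-based document model are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class BoolQuerySketch {
    // Returns true when a document passes the clauses of a bool query.
    // Only inclusion is modelled here; scores are ignored.
    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> must,
                           List<Predicate<Map<String, Object>>> filter,
                           List<Predicate<Map<String, Object>>> should,
                           List<Predicate<Map<String, Object>>> mustNot) {
        boolean required = must.stream().allMatch(p -> p.test(doc))
                && filter.stream().allMatch(p -> p.test(doc));
        boolean excluded = mustNot.stream().anyMatch(p -> p.test(doc));
        // should clauses only become mandatory when there is no must or filter clause.
        boolean shouldRequired = !should.isEmpty() && must.isEmpty() && filter.isEmpty();
        boolean shouldOk = !shouldRequired || should.stream().anyMatch(p -> p.test(doc));
        return required && !excluded && shouldOk;
    }

    // The earlier example from this article: age is 40 and state is not ID.
    static boolean fortyAndNotIdaho(Map<String, Object> doc) {
        List<Predicate<Map<String, Object>>> must =
                List.of(d -> Integer.valueOf(40).equals(d.get("age")));
        List<Predicate<Map<String, Object>>> mustNot =
                List.of(d -> "ID".equals(d.get("state")));
        return matches(doc, must, List.of(), List.of(), mustNot);
    }

    public static void main(String[] args) {
        System.out.println(fortyAndNotIdaho(Map.of("age", 40, "state", "CO"))); // prints true
        System.out.println(fortyAndNotIdaho(Map.of("age", 40, "state", "ID"))); // prints false
    }
}
```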

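2. boosting query (boosting query)

Unlike a bool query, where a document is excluded as soon as one required sub-query fails to match, a boosting query keeps such documents and lowers their display weight/priority (i.e. the score) instead.

For example, if the search logic is name = 'apple' and type = 'fruit', documents that satisfy only part of the conditions are still shown, but with a lower priority (i.e. score).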
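
In the query DSL a boosting query takes a positive query, a negative query, and a negative_boost factor between 0 and 1. The effect on the score can be sketched as follows; `BoostingSketch` and `score` are hypothetical names, and the positive score is taken as given rather than computed.

```java
class BoostingSketch {
    // Documents matching the negative query are kept, but their score from the
    // positive query is multiplied by negative_boost (a value between 0 and 1).
    static double score(double positiveScore, boolean matchesNegative, double negativeBoost) {
        if (negativeBoost < 0.0 || negativeBoost > 1.0) {
            throw new IllegalArgumentException("negative_boost must be between 0 and 1");
        }
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        System.out.println(score(2.0, false, 0.5)); // prints 2.0
        System.out.println(score(2.0, true, 0.5));  // prints 1.0
    }
}
```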

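3. constant_score (constant-score query)

Returns a fixed, specified score for every document that matches a condition. When the score does not need to be computed, a filter condition alone is enough, because filter context ignores the score.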

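4. dis_max (best-match query)

A disjunction max query (Disjunction Max Query) returns every document that matches any of its sub-queries, but uses only the score of the best-matching sub-query as the document's score.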
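
The scoring rule can be sketched in a few lines. The dis_max query also accepts an optional tie_breaker (default 0.0) that adds a fraction of the other matching sub-query scores; the class `DisMaxSketch` and the sample scores below are illustrative assumptions.

```java
class DisMaxSketch {
    // subScores holds the scores of the sub-queries that matched the document.
    // Result: the best score plus tie_breaker times the sum of the remaining scores.
    static double score(double[] subScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 0.5};           // made-up sub-query scores
        System.out.println(score(scores, 0.0)); // prints 1.0
        System.out.println(score(scores, 0.5)); // prints 1.25
    }
}
```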

5. function_score (function query)

In short, it computes _score using custom functions.

Elasticsearch Queries and Aggregations

https://fuwari.vercel.app/posts/elastic-search查询和聚合/

Author: Purezento

Published: 2024-07-30

License: CC BY-NC-SA 4.0
