Elasticsearch: Hybrid Search Optimization Tips

youngerjesus 2024. 8. 12. 11:57

ELSER Limitations:

ELSER only supports up to 512 tokens when running text inference. Text beyond that limit is truncated, so only the first 512 tokens of a field are used.
If your data contains longer text that needs to be fully searchable, you have two options: a) use another model that supports longer text, or b) split the text into smaller segments.
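Option b) can be sketched in Python as a word-based chunker with overlap. This is a minimal sketch under assumptions: it counts whitespace-separated words rather than real tokens, and because ELSER uses a BERT-style WordPiece vocabulary that can split one word into several tokens, the default budget of 300 words (a hypothetical choice, not an Elasticsearch setting) leaves headroom under 512. `split_into_chunks` is a hypothetical helper, not part of any Elasticsearch client.

```python
def split_into_chunks(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most max_words words.

    Word count is only a proxy for ELSER's token count, so max_words is
    deliberately kept well below the 512-token limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap  # how far each new chunk advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


long_text = " ".join(f"w{i}" for i in range(700))
for chunk in split_into_chunks(long_text):
    words = chunk.split()
    print(len(words), words[0], "...", words[-1])
```

Each chunk would then be indexed separately (for example, as its own document) so that ELSER sees every part of the text; the overlap keeps a sentence that straddles a chunk boundary searchable from both sides.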

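Optimization Tips:

Exclude the vector fields from the `_source` document when creating the index, as in the mapping below, so the large vectors are not also stored as JSON in `_source`.
Leverage RRF (Reciprocal Rank Fusion) by providing a `rank` section at the end of the query, as in the search request below.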

PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "text_embedding.predicted_value",
        "ml.tokens"
      ]
    },
    "properties": {
      "text": {
        "type": "text"
      },
      "ml": {
        "properties": {
          "tokens": {
            "type": "rank_features"
          }
        }
      },
      "text_embedding": {
        "properties": {
          "predicted_value": {
            "type": "dense_vector",
            "dims": 384,
            "index": true,
            "similarity": "cosine"
          }
        }
      }
    }
  }
}
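The effect of the `excludes` list can be simulated in Python. This is a simplified model, not Elasticsearch's implementation: the hypothetical `apply_source_excludes` removes exact dotted paths only (real `_source` excludes also accept wildcards), the `ml.tokens` weights are made-up values, and sizes are uncompressed JSON lengths, whereas Elasticsearch stores `_source` compressed. It shows that the stored `_source` keeps `text` while the 384-dim vector and token weights drop out; the indexed `dense_vector` and `rank_features` data are still used for search.

```python
import copy
import json
import random


def apply_source_excludes(doc: dict, excludes: list[str]) -> dict:
    """Return a copy of doc with each exact dotted path in excludes removed.

    Simplified model of mapping-level _source excludes: no wildcard support.
    """
    stored = copy.deepcopy(doc)
    for path in excludes:
        *parents, leaf = path.split(".")
        node = stored
        # Walk down to the object that holds the leaf field, if it exists.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)
    return stored


def json_size(doc: dict) -> int:
    """Length in bytes of the document serialized as UTF-8 JSON (uncompressed)."""
    return len(json.dumps(doc).encode("utf-8"))


random.seed(0)
doc = {
    "text": "a quick brown fox jumps over a lazy dog",
    # Hypothetical ELSER output: token -> weight.
    "ml": {"tokens": {"fox": 1.7, "dog": 1.2, "quick": 0.9}},
    # Stand-in for a 384-dim embedding produced by an inference pipeline.
    "text_embedding": {"predicted_value": [random.uniform(-1, 1) for _ in range(384)]},
}

stored = apply_source_excludes(doc, ["text_embedding.predicted_value", "ml.tokens"])
print(f"_source JSON: {json_size(doc)} bytes -> {json_size(stored)} bytes")
print(stored)
```

Keep in mind that fields removed from `_source` cannot be restored by reindex or update operations, because those rebuild documents from `_source`.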

POST my-index/_search
{
  "_source": false,
  "fields": [ "text" ],
  "sub_searches": [
    {
      "query": {
        "match": {
          "text": "brown fox"
        }
      }
    },
    {
      "query": {
        "text_expansion": {
          "ml.tokens": {
            "model_id": ".elser_model_1",
            "model_text": "a quick brown fox jumps over a lazy dog"
          }
        }
      }
    }
  ],
  "knn": {
    "field": "text_embedding.predicted_value",
    "query_vector": [0.1, 3.2, ..., 2.1],
    "k": 5,
    "num_candidates": 100
  },
  "rank": {
    "rrf": {}
  }
}
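The fusion that `rank.rrf` performs can be sketched in Python. Following the formula in the Elasticsearch RRF documentation, each document's score is the sum of 1 / (k + rank) over every result list it appears in, with rank starting at 1. The defaults `rank_constant=60` and `window_size=100` are assumed to mirror the documented `rrf` defaults in 8.x; `reciprocal_rank_fusion` is a hypothetical helper, not an Elasticsearch API, and the match, text_expansion and knn hit lists are made-up document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, rank_constant=60, window_size=100):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    score(d) = sum over lists of 1 / (rank_constant + rank of d), rank from 1.
    Only the first window_size hits of each list are considered.
    """
    scores = defaultdict(float)
    for hits in result_lists:
        for rank, doc_id in enumerate(hits[:window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by document ID so output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top hits from the three searches in the request above.
match_hits = ["doc1", "doc2", "doc3"]
elser_hits = ["doc2", "doc1", "doc4"]
knn_hits = ["doc2", "doc5"]

for doc_id, score in reciprocal_rank_fusion([match_hits, elser_hits, knn_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no normalization between BM25 scores, ELSER weights and vector similarities, which live on different scales; documents that appear in several lists (doc2 and doc1 here) rise to the top.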

Attribution-NonCommercial (CC BY-NC)

여정민의 블로그 (Yeo Jeong-min's blog): an ordinary engineer who wants to become a pragmatic programmer.