ELSER Limitations:
- It only supports up to 512 tokens when running text inference.
- If your data contains longer text that needs to be fully searchable, you have two options: a) use another model that supports longer input, or b) split your text into smaller segments.
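Option b) can be sketched roughly as follows. This is a minimal illustration, not ELSER's own tooling: the function name `split_into_passages` and its parameters are hypothetical, and it counts whitespace-separated words as a rough stand-in for model tokens. Because a word can map to more than one token, `max_words` is kept well below 512 here.

```python
def split_into_passages(text, max_words=200, overlap=20):
    """Split text into overlapping word windows (hypothetical helper).

    Assumption: whitespace-separated words only approximate model tokens,
    so max_words should stay comfortably under the 512-token limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each new window advances
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return passages
```

Each returned passage could then be indexed as its own document (or nested object) so that inference runs on every segment instead of only the first 512 tokens. The overlap is a design choice that reduces the chance of a relevant sentence being cut in half at a segment boundary.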
Optimization Tips:
- Exclude the vector fields from the `_source` document at index creation, as in the mapping below.
- Leverage RRF by providing the `rank` section at the end of the query.
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "text_embedding.predicted_value",
        "ml.tokens"
      ]
    },
    "properties": {
      "text": {
        "type": "text"
      },
      "ml": {
        "properties": {
          "tokens": {
            "type": "rank_features"
          }
        }
      },
      "text_embedding": {
        "properties": {
          "predicted_value": {
            "type": "dense_vector",
            "dims": 384,
            "index": true,
            "similarity": "cosine"
          }
        }
      }
    }
  }
}
POST my-index/_search
{
  "_source": false,
  "fields": [ "text" ],
  "sub_searches": [
    {
      "query": {
        "match": {
          "text": "brown fox"
        }
      }
    },
    {
      "query": {
        "text_expansion": {
          "ml.tokens": {
            "model_id": ".elser_model_1",
            "model_text": "a quick brown fox jumps over a lazy dog"
          }
        }
      }
    }
  ],
  "knn": {
    "field": "text_embedding.predicted_value",
    "query_vector": [0.1, 3.2, ..., 2.1],
    "k": 5,
    "num_candidates": 100
  },
  "rank": {
    "rrf": {}
  }
}
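
To make the effect of the `rank` section more concrete, here is a small sketch of how reciprocal rank fusion combines several ranked result lists into one. It is an illustration of the general RRF formula, not Elasticsearch's internal code: the function name `rrf_fuse` and the sample document IDs are made up, and `rank_constant=60` is assumed to match the Elasticsearch default.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document receives 1 / (rank_constant + rank) from every list it
    appears in (rank starts at 1); the contributions are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the three retrievers in the request above.
bm25_hits = ["a", "b", "c"]   # match query
elser_hits = ["b", "a", "d"]  # text_expansion query
knn_hits = ["b", "c", "a"]    # kNN search

fused = rrf_fuse([bm25_hits, elser_hits, knn_hits])
fused_ids = [doc_id for doc_id, _ in fused]  # ["b", "a", "c", "d"]
```

Because RRF only looks at rank positions, the BM25, ELSER, and kNN scores never need to be normalized against each other, which is what makes it convenient for this kind of hybrid query.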