Using a Hybrid Model
MTEB provides a unified [mteb.HybridSearch][] wrapper that allows you to combine multiple retrievers and cross-encoders using different fusion strategies (e.g. Reciprocal Rank Fusion, Distribution-Based Score Fusion, or custom fusion functions).
You can define a hybrid model as follows:
import mteb
from mteb import HybridSearch
# Load individual sub-models (lexical/sparse and dense)
bm25 = mteb.get_model("mteb/baseline-bm25s")
dense = mteb.get_model("intfloat/multilingual-e5-small")
# Create the hybrid model combining both sub-models
hybrid_model = HybridSearch(
    models=[bm25, dense],
    fusion_strategy="rrf",
    weights=[0.5, 0.5],
)
# Evaluate the hybrid model on your selected tasks
tasks = mteb.get_tasks(tasks=["NFCorpus"])
results = mteb.evaluate(hybrid_model, tasks=tasks)
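To make the `"rrf"` strategy and the `weights` argument more concrete, here is a minimal, standalone sketch of weighted Reciprocal Rank Fusion. It does not use MTEB and is not the library's implementation: the function name `weighted_rrf`, the smoothing constant `k=60`, and the way weights multiply each retriever's contribution are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    weights:  one weight per ranking (defaults to equal weights).
    k:        smoothing constant (assumed here; 60 is a commonly used value).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes weight / (k + rank) for a document.
            fused[doc_id] += weight / (k + rank)

    # Sort by fused score (descending); ties are broken by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a lexical and a dense retriever.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d3", "d4"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Because RRF only looks at ranks, it needs no score normalization: documents that appear near the top of both lists (`d2`, `d3`) end up ahead of documents that only one retriever returned.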
Performance Comparison
Combining retrieval paradigms can improve on either sub-model alone, although the gain depends on the fusion strategy. The table below compares the NDCG@10 of the individual sub-models with their hybrid combinations using Reciprocal Rank Fusion (RRF), Distribution-Based Score Fusion (DBSF), and Relative Score Fusion (RSF):
| Task | mteb/baseline-bm25s | intfloat/multilingual-e5-small | hybrid using `rrf` | hybrid using `dbsf` | hybrid using `rsf` |
|---|---|---|---|---|---|
| NanoSciFactRetrieval | 0.710 | 0.725 | 0.754 | 0.538 | 0.767 |
| NanoNFCorpusRetrieval | 0.325 | 0.288 | 0.329 | 0.338 | 0.359 |
| NanoSCIDOCSRetrieval | 0.335 | 0.344 | 0.369 | 0.344 | 0.372 |
Note: Hybrid models combine the lexical mteb/baseline-bm25s and dense intfloat/multilingual-e5-small models using equal weights (0.5/0.5).
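Unlike RRF, DBSF and RSF fuse raw scores, so each retriever's scores must be normalized before they are combined. The sketch below illustrates one common reading of the two strategies: RSF as min-max normalization and DBSF as rescaling with mean ± 3 standard deviations as bounds. These definitions, the helper names, the use of the population standard deviation, and the toy scores are assumptions for illustration; MTEB's implementation may differ in detail.

```python
from statistics import mean, pstdev


def _rescale(scores, lo, hi):
    """Map scores linearly so that lo -> 0 and hi -> 1."""
    if hi == lo:
        # Degenerate case (all scores equal): use a neutral value.
        return {doc_id: 0.5 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def rsf_normalize(scores):
    """Min-max normalization over the observed scores (assumed RSF behaviour)."""
    values = list(scores.values())
    return _rescale(scores, min(values), max(values))


def dbsf_normalize(scores):
    """Use mean +/- 3 standard deviations as bounds (assumed DBSF behaviour)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return _rescale(scores, mu - 3 * sigma, mu + 3 * sigma)


def fuse_scores(score_maps, weights, normalize):
    """Weighted sum of normalized scores; missing documents contribute 0."""
    fused = {}
    for scores, weight in zip(score_maps, weights):
        for doc_id, s in normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores on very different scales.
bm25_scores = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
dense_scores = {"d2": 0.90, "d3": 0.80, "d4": 0.70}
weights = [0.5, 0.5]

rsf_ids = [d for d, _ in fuse_scores([bm25_scores, dense_scores], weights, rsf_normalize)]
dbsf_ids = [d for d, _ in fuse_scores([bm25_scores, dense_scores], weights, dbsf_normalize)]
print("rsf: ", rsf_ids)
print("dbsf:", dbsf_ids)
```

On this toy input the two normalizations order `d1` and `d3` differently, which gives some intuition for why the choice of fusion strategy can change the results on the same pair of sub-models.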