Using a Hybrid Model

MTEB provides a unified [mteb.HybridSearch][] wrapper that combines multiple retrievers and cross-encoders using a fusion strategy such as Reciprocal Rank Fusion (RRF), Distribution-Based Score Fusion (DBSF), or a custom fusion function.

You can define a hybrid model as follows:

import mteb
from mteb import HybridSearch

# Load individual sub-models (lexical/sparse and dense)
bm25 = mteb.get_model("mteb/baseline-bm25s")
dense = mteb.get_model("intfloat/multilingual-e5-small")

# Create the hybrid model combining both sub-models
hybrid_model = HybridSearch(
    models=[bm25, dense],
    fusion_strategy="rrf",
    weights=[0.5, 0.5],
)

# Evaluate the hybrid model on your selected tasks
tasks = mteb.get_tasks(tasks=["NFCorpus"])
results = mteb.evaluate(hybrid_model, tasks=tasks)
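
For intuition about what a rank-based strategy such as `rrf` computes, here is a minimal standalone sketch of weighted Reciprocal Rank Fusion. It is not MTEB's implementation: the function name `reciprocal_rank_fusion`, the smoothing constant `k=60`, the way weights are applied, and the document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists of document IDs with (weighted) Reciprocal Rank Fusion.

    rankings: list of lists, each ordered from most to least relevant.
    weights:  one weight per ranking (defaults to equal weights).
    k:        smoothing constant; 60 is a commonly used value (assumed here).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes weight / (k + rank) for a document
            fused[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from a lexical and a dense retriever
lexical_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d4", "d1"]

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking], weights=[0.5, 0.5])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because only ranks enter the formula, the raw score scales of the two retrievers do not need to be comparable. A document ranked well by both retrievers (`d2` here) rises above one ranked first by only a single retriever.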

Performance Comparison

To demonstrate the effectiveness of combining different retrieval paradigms, the table below compares the performance (NDCG@10) of individual sub-models against their hybrid combinations using Reciprocal Rank Fusion (RRF), Distribution-Based Score Fusion (DBSF), and Relative Score Fusion (RSF):

| Task | mteb/baseline-bm25s | intfloat/multilingual-e5-small | hybrid using `rrf` | hybrid using `dbsf` | hybrid using `rsf` |
|---|---|---|---|---|---|
| NanoSciFactRetrieval | 0.710 | 0.725 | 0.754 | 0.538 | 0.767 |
| NanoNFCorpusRetrieval | 0.325 | 0.288 | 0.329 | 0.338 | 0.359 |
| NanoSCIDOCSRetrieval | 0.335 | 0.344 | 0.369 | 0.344 | 0.372 |

Note: Hybrid models combine the lexical mteb/baseline-bm25s and dense intfloat/multilingual-e5-small models using equal weights (0.5/0.5).
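
Unlike RRF, score-based strategies must first bring each retriever's raw scores onto a comparable scale. The sketch below contrasts the two normalizations as they are commonly described: RSF rescales by the observed minimum and maximum, while DBSF uses the mean plus or minus three standard deviations as bounds. This is an illustrative assumption, not MTEB's code; the helper names, the handling of documents missing from one list (they contribute 0), and the example scores are all hypothetical.

```python
import statistics


def min_max_normalize(scores):
    """RSF-style: rescale scores to [0, 1] using the observed min and max."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def distribution_normalize(scores):
    """DBSF-style: rescale using mean +/- 3 standard deviations as bounds."""
    values = list(scores.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return {doc_id: 0.0 for doc_id in scores}
    lo, hi = mean - 3 * std, mean + 3 * std
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(score_lists, weights, normalize):
    """Weighted sum of normalized scores; missing documents contribute 0."""
    fused = {}
    for scores, weight in zip(score_lists, weights):
        for doc_id, s in normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores on very different scales (BM25-like vs. cosine-like)
lexical_scores = {"d1": 12.0, "d2": 6.0, "d3": 3.0}
dense_scores = {"d1": 0.80, "d2": 0.85, "d4": 0.70}

rsf_ranking = fuse_scores([lexical_scores, dense_scores], [0.5, 0.5], min_max_normalize)
dbsf_ranking = fuse_scores([lexical_scores, dense_scores], [0.5, 0.5], distribution_normalize)
```

Min-max scaling always maps the lowest-scoring document in each list to 0, whereas the distribution-based bounds depend on the spread of the scores, so the two strategies can weight the same retrievers quite differently on a given task.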