Available Benchmarks
BEIR
Zero-shot retrieval quality across a heterogeneous set of IR tasks and domains, providing a common framework for comparing NLP-based retrieval models.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| TRECCOVID | Retrieval | text | eng |
| NFCorpus | Retrieval | text | eng |
| NQ | Retrieval | text | eng |
| HotpotQA | Retrieval | text | eng |
| FiQA2018 | Retrieval | text | eng |
| ArguAna | Retrieval | text | eng |
| Touche2020 | Retrieval | text | eng |
| CQADupstackRetrieval | Retrieval | text | eng |
| QuoraRetrieval | Retrieval | text | eng |
| DBPedia | Retrieval | text | eng |
| SCIDOCS | Retrieval | text | eng |
| FEVER | Retrieval | text | eng |
| ClimateFEVER | Retrieval | text | eng |
| SciFact | Retrieval | text | eng |
| MSMARCO | Retrieval | text | eng |
Citation
@inproceedings{thakur2021beir,
author = {Nandan Thakur and Nils Reimers and Andreas R{\"u}ckl{\'e} and Abhishek Srivastava and Iryna Gurevych},
booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
title = {{BEIR}: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models},
url = {https://openreview.net/forum?id=wCu6T5xFjeJ},
year = {2021},
}
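The benchmark names used as headings on this page are assumed here to match the names accepted by the `mteb` Python package. The following is a minimal sketch, assuming `mteb` is installed and that `mteb.get_benchmark` accepts the heading text (for example `"BEIR"`); the helper names `load_benchmark_tasks` and `count_task_types` are hypothetical and exist only for illustration.

```python
from collections import Counter


def load_benchmark_tasks(name):
    """Return (task name, task type) pairs for a benchmark.

    Assumption: `mteb` is installed and `name` matches a heading on this
    page, e.g. "BEIR". The import is done lazily so the rest of this
    sketch can be used without `mteb` present.
    """
    import mteb  # lazy import; only needed when this function is called

    benchmark = mteb.get_benchmark(name)
    return [(task.metadata.name, task.metadata.type) for task in benchmark.tasks]


def count_task_types(pairs):
    """Count how many tasks of each type appear in (name, type) pairs."""
    return Counter(task_type for _, task_type in pairs)
```

Under these assumptions, `count_task_types(load_benchmark_tasks("BEIR"))` would be expected to report only `Retrieval` tasks, in line with the type column of the BEIR table.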
BEIR-NL
Zero-shot retrieval quality in Dutch across the BEIR task suite, created through automated translation of the original English benchmark.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| ArguAna-NL | Retrieval | text | nld |
| CQADupstack-NL | Retrieval | text | nld |
| FEVER-NL | Retrieval | text | nld |
| NQ-NL | Retrieval | text | nld |
| Touche2020-NL | Retrieval | text | nld |
| FiQA2018-NL | Retrieval | text | nld |
| Quora-NL | Retrieval | text | nld |
| HotpotQA-NL | Retrieval | text | nld |
| SCIDOCS-NL | Retrieval | text | nld |
| ClimateFEVER-NL | Retrieval | text | nld |
| mMARCO-NL | Retrieval | text | nld |
| SciFact-NL | Retrieval | text | nld |
| DBPedia-NL | Retrieval | text | nld |
| NFCorpus-NL | Retrieval | text | nld |
| TRECCOVID-NL | Retrieval | text | nld |
Citation
@misc{banar2024beirnlzeroshotinformationretrieval,
archiveprefix = {arXiv},
author = {Nikolay Banar and Ehsan Lotfi and Walter Daelemans},
eprint = {2412.08329},
primaryclass = {cs.CL},
title = {BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language},
url = {https://arxiv.org/abs/2412.08329},
year = {2024},
}
BRIGHT
Reasoning-intensive retrieval quality across real-world queries spanning diverse domains including economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BrightRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BRIGHT (long)
Reasoning-intensive retrieval quality across real-world queries spanning diverse domains, filtered to longer documents to stress-test models on extended contexts.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BrightLongRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BRIGHT(v1.1)
Reasoning-intensive retrieval quality across real-world queries spanning diverse domains including economics, psychology, mathematics, and coding. v1.1 restructures tasks into separate datasets and adds per-task prompts.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BrightBiologyRetrieval | Retrieval | text | eng |
| BrightEarthScienceRetrieval | Retrieval | text | eng |
| BrightEconomicsRetrieval | Retrieval | text | eng |
| BrightPsychologyRetrieval | Retrieval | text | eng |
| BrightRoboticsRetrieval | Retrieval | text | eng |
| BrightStackoverflowRetrieval | Retrieval | text | eng |
| BrightSustainableLivingRetrieval | Retrieval | text | eng |
| BrightPonyRetrieval | Retrieval | text | eng |
| BrightLeetcodeRetrieval | Retrieval | text | eng |
| BrightAopsRetrieval | Retrieval | text | eng |
| BrightTheoremQATheoremsRetrieval | Retrieval | text | eng |
| BrightTheoremQAQuestionsRetrieval | Retrieval | text | eng |
| BrightBiologyLongRetrieval | Retrieval | text | eng |
| BrightEarthScienceLongRetrieval | Retrieval | text | eng |
| BrightEconomicsLongRetrieval | Retrieval | text | eng |
| BrightPsychologyLongRetrieval | Retrieval | text | eng |
| BrightRoboticsLongRetrieval | Retrieval | text | eng |
| BrightStackoverflowLongRetrieval | Retrieval | text | eng |
| BrightSustainableLivingLongRetrieval | Retrieval | text | eng |
| BrightPonyLongRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BuiltBench(eng)
Text embedding quality in the built environment domain across clustering, retrieval, and reranking, spanning architecture, engineering, construction, and operations management.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BuiltBenchClusteringP2P | Clustering | text | eng |
| BuiltBenchClusteringS2S | Clustering | text | eng |
| BuiltBenchRetrieval | Retrieval | text | eng |
| BuiltBenchReranking | Reranking | text | eng |
Citation
@article{shahinmoghadam2024benchmarking,
author = {Shahinmoghadam, Mehrzad and Motamedi, Ali},
journal = {arXiv preprint arXiv:2411.12056},
title = {Benchmarking pre-trained text embedding models in aligning built asset information},
year = {2024},
}
ChemTEB
Chemical domain text embedding quality across bitext mining, classification, clustering, pair classification, and retrieval.
Tasks
Citation
@article{kasmaee2024chemteb,
author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
journal = {arXiv preprint arXiv:2412.00532},
title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
year = {2024},
}
ChemTEB(v1.1)
Chemical domain text embedding quality across bitext mining, classification, clustering, pair classification, and retrieval. v1.1 adds the ChemRxivRetrieval task.
Tasks
Citation
@article{kasmaee2024chemteb,
author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
journal = {arXiv preprint arXiv:2412.00532},
title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
year = {2024},
}
@article{kasmaee2025chembed,
author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Astaraki, Mahdi and Saloot, Mohammad Arshi and Sherck, Nicholas and Mahyar, Hamidreza and Samiee, Soheila},
journal = {arXiv preprint arXiv:2508.01643},
title = {Chembed: Enhancing chemical literature search through domain-specific text embeddings},
year = {2025},
}
CoIR
Code information retrieval across diverse programming languages and coding tasks, including code search, question answering, and text-to-SQL retrieval.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AppsRetrieval | Retrieval | text | eng, python |
| CodeFeedbackMT | Retrieval | text | eng |
| CodeFeedbackST | Retrieval | text | eng |
| CodeSearchNetCCRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
| CodeTransOceanContest | Retrieval | text | c++, python |
| CodeTransOceanDL | Retrieval | text | python |
| CosQA | Retrieval | text | eng, python |
| COIRCodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
| StackOverflowQA | Retrieval | text | eng |
| SyntheticText2SQL | Retrieval | text | eng, sql |
Citation
@misc{li2024coircomprehensivebenchmarkcode,
archiveprefix = {arXiv},
author = {Xiangyang Li and Kuicai Dong and Yi Quan Lee and Wei Xia and Yichun Yin and Hao Zhang and Yong Liu and Yasheng Wang and Ruiming Tang},
eprint = {2407.02883},
primaryclass = {cs.IR},
title = {CoIR: A Comprehensive Benchmark for Code Information Retrieval Models},
url = {https://arxiv.org/abs/2407.02883},
year = {2024},
}
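Some cells in the languages column are abbreviated, for example `go, java, javascript, php, python, ... (6)`. The sketch below parses such a cell under the assumption that the ellipsis marks omitted language codes and that the number in parentheses is the total count of languages for the task; this reading is inferred from the tables, not stated on the page. The function name `parse_languages` is hypothetical.

```python
import re

# Matches cells such as "go, java, javascript, php, python, ... (6)".
_TRUNCATED = re.compile(r"^(?P<shown>.*?),\s*\.\.\.\s*\((?P<total>\d+)\)$")


def parse_languages(cell):
    """Split a languages cell into (codes shown, assumed total count).

    Assumption: in a truncated cell the parenthesized number is the total
    number of languages, including the ones hidden behind the ellipsis.
    """
    cell = cell.strip()
    match = _TRUNCATED.match(cell)
    if match:
        shown = [code.strip() for code in match.group("shown").split(",")]
        return shown, int(match.group("total"))
    # Untruncated cell: every language code is listed explicitly.
    shown = [code.strip() for code in cell.split(",") if code.strip()]
    return shown, len(shown)
```

For an untruncated cell such as `eng, sql`, the function returns both codes and a total of 2, so truncated and full cells can be handled uniformly.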
CoREB(v1)
Code embedding and reranking quality across code-to-text, text-to-code, and code-to-code retrieval tasks, using counterfactually rewritten problems in five programming languages to limit training data contamination.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| CorebC2TRetrieval | Retrieval | text | c++, eng, go, java, python, ... (6) |
| CorebC2CRetrieval | Retrieval | text | c++, eng, go, java, python, ... (6) |
| CorebT2CRetrieval | Retrieval | text | c++, eng, go, java, python, ... (6) |
| CorebC2TReranking | Reranking | text | c++, eng, go, java, python, ... (6) |
| CorebC2CReranking | Reranking | text | c++, eng, go, java, python, ... (6) |
| CorebT2CReranking | Reranking | text | c++, eng, go, java, python, ... (6) |
Citation
@article{xue2026coreb,
author = {Xue, Siqiao and Liao, Zihan and Qin, Jin and Zhang, Ziyin and Mu, Yixiang and Zhou, Fan and Yu, Hang},
journal = {arXiv preprint arXiv:2605.04615},
title = {Beyond Retrieval: A Multitask Benchmark and Model for Code Search},
url = {https://arxiv.org/abs/2605.04615},
year = {2026},
}
CodeRAG
Code retrieval quality for retrieval-augmented generation, covering programming solutions, online tutorials, library documentation, and Stack Overflow posts.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| CodeRAGLibraryDocumentationSolutions | Reranking | text | python |
| CodeRAGOnlineTutorials | Reranking | text | python |
| CodeRAGProgrammingSolutions | Reranking | text | python |
| CodeRAGStackoverflowPosts | Reranking | text | python |
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
archiveprefix = {arXiv},
author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
eprint = {2406.14497},
primaryclass = {cs.SE},
title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
url = {https://arxiv.org/abs/2406.14497},
year = {2024},
}
Encodechka
Russian text embedding quality across paraphrase identification, sentiment analysis, toxicity classification, intent classification, natural language inference, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| RUParaPhraserSTS | STS | text | rus |
| SentiRuEval2016 | Classification | text | rus |
| RuToxicOKMLCUPClassification | Classification | text | rus |
| InappropriatenessClassificationv2 | Classification | text | rus |
| RuNLUIntentClassification | Classification | text | rus |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{dale_encodechka,
author = {Dale, David},
editor = {habr.com},
month = {June},
note = {[Online; posted 12-June-2022]},
title = {Russian rating of sentence encoders},
url = {https://habr.com/ru/articles/669674/},
year = {2022},
}
FollowIR
Instruction-following retrieval quality, measuring how well models retrieve relevant documents when given detailed natural language instructions alongside queries.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Robust04InstructionRetrieval | InstructionReranking | text | eng |
| News21InstructionRetrieval | InstructionReranking | text | eng |
| Core17InstructionRetrieval | InstructionReranking | text | eng |
Citation
@misc{weller2024followir,
archiveprefix = {arXiv},
author = {Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
eprint = {2403.15246},
primaryclass = {cs.IR},
title = {FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions},
year = {2024},
}
HUME(v1)
Text embedding performance benchmarked against human annotator scores across classification, clustering, reranking, and semantic similarity tasks, capturing where models exceed or fall short of human-level judgment.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| HUMEEmotionClassification | Classification | text | eng |
| HUMEToxicConversationsClassification | Classification | text | eng |
| HUMETweetSentimentExtractionClassification | Classification | text | eng |
| HUMEMultilingualSentimentClassification | Classification | text | ara, eng, nob, rus |
| HUMEArxivClusteringP2P | Clustering | text | eng |
| HUMERedditClusteringP2P | Clustering | text | eng |
| HUMEWikiCitiesClustering | Clustering | text | eng |
| HUMESIB200ClusteringS2S | Clustering | text | ara, dan, eng, fra, rus |
| HUMECore17InstructionReranking | Reranking | text | eng |
| HUMENews21InstructionReranking | Reranking | text | eng |
| HUMERobust04InstructionReranking | Reranking | text | eng |
| HUMEWikipediaRerankingMultilingual | Reranking | text | dan, eng, nob |
| HUMESICK-R | STS | text | eng |
| HUMESTS12 | STS | text | eng |
| HUMESTSBenchmark | STS | text | eng |
| HUMESTS22 | STS | text | ara, eng, fra, rus |
JMTEB(v2)
Japanese text embedding quality across clustering, classification, semantic similarity, retrieval, and reranking. v2 extends the benchmark to 28 datasets for a more comprehensive evaluation than MTEB(jpn, v1).
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| LivedoorNewsClustering.v2 | Clustering | text | jpn |
| MewsC16JaClustering | Clustering | text | jpn |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| JapaneseSentimentClassification | Classification | text | jpn |
| SIB200Classification | Classification | text | ace, acm, acq, aeb, afr, ... (197) |
| WRIMEClassification | Classification | text | jpn |
| JSTS | STS | text | jpn |
| JSICK | STS | text | jpn |
| JaqketRetrieval | Retrieval | text | jpn |
| MrTidyRetrieval | Retrieval | text | ara, ben, eng, fin, ind, ... (11) |
| JaGovFaqsRetrieval | Retrieval | text | jpn |
| NLPJournalTitleAbsRetrieval.V2 | Retrieval | text | jpn |
| NLPJournalTitleIntroRetrieval.V2 | Retrieval | text | jpn |
| NLPJournalAbsIntroRetrieval.V2 | Retrieval | text | jpn |
| NLPJournalAbsArticleRetrieval.V2 | Retrieval | text | jpn |
| JaCWIRRetrieval | Retrieval | text | jpn |
| MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| MintakaRetrieval | Retrieval | text | ara, deu, fra, hin, ita, ... (8) |
| MultiLongDocRetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
| ESCIReranking | Reranking | text | eng, jpn, spa |
| JQaRAReranking | Reranking | text | jpn |
| JaCWIRReranking | Reranking | text | jpn |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| MultiLongDocReranking | Reranking | text | ara, deu, eng, fra, hin, ... (13) |
Citation
@article{li2025jmteb,
author = {Li, Shengzhe and Ohagi, Masaya and Ri, Ryokan and Fukuchi, Akihiko and Shibata, Tomohide and Kawahara, Daisuke},
issue = {3},
journal = {Vol.2025-NL-265,No.3,1-15},
month = {sep},
title = {{JMTEB and JMTEB-lite: Japanese Massive Text Embedding Benchmark and Its Lightweight Version}},
year = {2025},
}
JMTEB-lite(v1)
Japanese text embedding quality across clustering, classification, semantic similarity, retrieval, and reranking, with heavy datasets optimized via hard negative pooling to enable faster evaluation while maintaining rankings consistent with JMTEB.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| LivedoorNewsClustering.v2 | Clustering | text | jpn |
| MewsC16JaClustering | Clustering | text | jpn |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| JapaneseSentimentClassification | Classification | text | jpn |
| SIB200Classification | Classification | text | ace, acm, acq, aeb, afr, ... (197) |
| WRIMEClassification | Classification | text | jpn |
| JSTS | STS | text | jpn |
| JSICK | STS | text | jpn |
| JaqketRetrievalLite | Retrieval | text | jpn |
| MrTyDiJaRetrievalLite | Retrieval | text | jpn |
| JaGovFaqsRetrieval | Retrieval | text | jpn |
| NLPJournalTitleAbsRetrieval.V2 | Retrieval | text | jpn |
| NLPJournalTitleIntroRetrieval.V2 | Retrieval | text | jpn |
| NLPJournalAbsIntroRetrieval.V2 | Retrieval | text | jpn |
| NLPJournalAbsArticleRetrieval.V2 | Retrieval | text | jpn |
| JaCWIRRetrievalLite | Retrieval | text | jpn |
| MIRACLJaRetrievalLite | Retrieval | text | jpn |
| MintakaRetrieval | Retrieval | text | ara, deu, fra, hin, ita, ... (8) |
| MultiLongDocRetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
| ESCIReranking | Reranking | text | eng, jpn, spa |
| JQaRARerankingLite | Reranking | text | jpn |
| JaCWIRRerankingLite | Reranking | text | jpn |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| MultiLongDocReranking | Reranking | text | ara, deu, eng, fra, hin, ... (13) |
Citation
@article{li2025jmteb,
author = {Li, Shengzhe and Ohagi, Masaya and Ri, Ryokan and Fukuchi, Akihiko and Shibata, Tomohide and Kawahara, Daisuke},
issue = {3},
journal = {Vol.2025-NL-265,No.3,1-15},
month = {sep},
title = {{JMTEB and JMTEB-lite: Japanese Massive Text Embedding Benchmark and Its Lightweight Version}},
year = {2025},
}
JinaVDR
Visual document retrieval over multilingual, layout-rich documents spanning medical, legal, financial, technical, and other domains.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| JinaVDRMedicalPrescriptionsRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRStanfordSlideRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRDonutVQAISynHMPRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRTableVQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRChartQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRTQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDROpenAINewsRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDREuropeanaDeNewsRetrieval | DocumentUnderstanding | text, image | deu |
| JinaVDREuropeanaEsNewsRetrieval | DocumentUnderstanding | text, image | spa |
| JinaVDREuropeanaItScansRetrieval | DocumentUnderstanding | text, image | ita |
| JinaVDREuropeanaNlLegalRetrieval | DocumentUnderstanding | text, image | nld |
| JinaVDRHindiGovVQARetrieval | DocumentUnderstanding | text, image | hin |
| JinaVDRAutomobileCatelogRetrieval | DocumentUnderstanding | text, image | jpn |
| JinaVDRBeveragesCatalogueRetrieval | DocumentUnderstanding | text, image | rus |
| JinaVDRRamensBenchmarkRetrieval | DocumentUnderstanding | text, image | jpn |
| JinaVDRJDocQARetrieval | DocumentUnderstanding | text, image | jpn |
| JinaVDRHungarianDocQARetrieval | DocumentUnderstanding | text, image | hun |
| JinaVDRArabicChartQARetrieval | DocumentUnderstanding | text, image | ara |
| JinaVDRArabicInfographicsVQARetrieval | DocumentUnderstanding | text, image | ara |
| JinaVDROWIDChartsRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRMPMQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRJina2024YearlyBookRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRWikimediaCommonsMapsRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRPlotQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRMMTabRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRCharXivOCRRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRStudentEnrollmentSyntheticRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRGitHubReadmeRetrieval | DocumentUnderstanding | text, image | ara, ben, deu, eng, fra, ... (17) |
| JinaVDRTweetStockSyntheticsRetrieval | DocumentUnderstanding | text, image | ara, deu, eng, fra, hin, ... (10) |
| JinaVDRAirbnbSyntheticRetrieval | DocumentUnderstanding | text, image | ara, deu, eng, fra, hin, ... (10) |
| JinaVDRShanghaiMasterPlanRetrieval | DocumentUnderstanding | text, image | zho |
| JinaVDRWikimediaCommonsDocumentsRetrieval | DocumentUnderstanding | text, image | ara, ben, deu, eng, fra, ... (20) |
| JinaVDREuropeanaFrNewsRetrieval | DocumentUnderstanding | text, image | fra |
| JinaVDRDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRDocQAAI | DocumentUnderstanding | text, image | eng |
| JinaVDRShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRTatQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRInfovqaRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRDocVQARetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRDocQAGovReportRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRTabFQuadRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
| JinaVDRArxivQARetrieval | DocumentUnderstanding | text, image | eng |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
KoViDoRe(v2)
Korean visual document retrieval across enterprise document domains including cybersecurity, economics, energy, and HR.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| KoVidore2CybersecurityRetrieval | DocumentUnderstanding | text, image | kor |
| KoVidore2EconomicRetrieval | DocumentUnderstanding | text, image | kor |
| KoVidore2EnergyRetrieval | DocumentUnderstanding | text, image | kor |
| KoVidore2HrRetrieval | DocumentUnderstanding | text, image | kor |
Citation
@misc{choi2026kovidorev2,
author = {Yongbin Choi},
note = {A benchmark for evaluating Korean vision document retrieval with multi-page reasoning queries in practical domains},
title = {KoViDoRe v2: a comprehensive evaluation of vision document retrieval for enterprise use-cases},
url = {https://github.com/whybe-choi/kovidore-data-generator},
year = {2026},
}
LMEB
Long-horizon memory retrieval quality across episodic, dialogue, semantic, and procedural retrieval tasks, measuring how well embedding models retrieve evidence in long-term memory scenarios.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| EPBench | Retrieval | text | eng |
| KnowMeBench | Retrieval | text | eng |
| LoCoMo | Retrieval | text | eng |
| LongMemEval | Retrieval | text | eng |
| REALTALK | Retrieval | text | eng |
| TMD | Retrieval | text | eng |
| MemBench | Retrieval | text | eng |
| ConvoMem | Retrieval | text | eng |
| QASPER | Retrieval | text | eng |
| NovelQA | Retrieval | text | eng |
| PeerQA | Retrieval | text | eng |
| CovidQA | Retrieval | text | eng |
| ESGReports | Retrieval | text | eng |
| LMEBMLDR | Retrieval | text | eng |
| LooGLE | Retrieval | text | eng |
| LMEB_SciFact | Retrieval | text | eng |
| Gorilla | Retrieval | text | eng |
| ToolBench | Retrieval | text | eng |
| ReMe | Retrieval | text | eng |
| ProceduralMemBench | Retrieval | text | eng |
| MemGovern | Retrieval | text | eng |
| DeepPlanning | Retrieval | text | eng |
Citation
@misc{zhao2026lmeb,
archiveprefix = {arXiv},
author = {Zhao, Xinping and Hu, Xinshuo and Xu, Jiaxin and Tang, Danyu and Zhang, Xin and Zhou, Mengjia and Zhong, Yan and Zhou, Yao and Shan, Zifei and Zhang, Meishan and Hu, Baotian and Zhang, Min},
eprint = {2603.12572},
primaryclass = {cs.CL},
title = {LMEB: Long-horizon Memory Embedding Benchmark},
url = {https://arxiv.org/abs/2603.12572},
year = {2026},
}
LongEmbed
Long-context retrieval quality across synthetic and real-world tasks featuring documents of varying length with dispersed target information.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| LEMBNarrativeQARetrieval | Retrieval | text | eng |
| LEMBNeedleRetrieval | Retrieval | text | eng |
| LEMBPasskeyRetrieval | Retrieval | text | eng |
| LEMBQMSumRetrieval | Retrieval | text | eng |
| LEMBSummScreenFDRetrieval | Retrieval | text | eng |
| LEMBWikimQARetrieval | Retrieval | text | eng |
Citation
@article{zhu2024longembed,
author = {Zhu, Dawei and Wang, Liang and Yang, Nan and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian},
journal = {arXiv preprint arXiv:2404.12096},
title = {LongEmbed: Extending Embedding Models for Long Context Retrieval},
year = {2024},
}
MAEB(beta)
Audio embedding quality across both audio-only and audio-text cross-modal tasks, spanning retrieval, classification, clustering, multilabel classification, pair classification, reranking, and zero-shot classification. Currently in beta pending peer review.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| ClothoT2ARetrieval | Any2AnyRetrieval | text, audio | eng |
| CommonVoiceMini21T2ARetrieval | Any2AnyRetrieval | text, audio | abk, afr, amh, ara, asm, ... (114) |
| FleursT2ARetrieval | Any2AnyRetrieval | text, audio | afr, amh, ara, asm, ast, ... (102) |
| GigaSpeechT2ARetrieval | Any2AnyRetrieval | text, audio | eng |
| JamAltArtistA2ARetrieval | Any2AnyRetrieval | audio | deu, eng, fra, spa |
| JamAltLyricA2TRetrieval | Any2AnyRetrieval | text, audio | deu, eng, fra, spa |
| MACST2ARetrieval | Any2AnyRetrieval | text, audio | eng |
| SpokenSQuADT2ARetrieval | Any2AnyRetrieval | text, audio | eng |
| UrbanSound8KT2ARetrieval | Any2AnyRetrieval | text, audio | zxx |
| BeijingOpera | AudioClassification | audio | zxx |
| BirdCLEF | AudioClassification | audio | zxx |
| CREMA_D | AudioClassification | audio | eng |
| CommonLanguageAgeDetection | AudioClassification | audio | eng |
| GTZANGenre | AudioClassification | audio | zxx |
| IEMOCAPGender | AudioClassification | audio | eng |
| MInDS14 | AudioClassification | audio | ces, deu, eng, fra, ita, ... (12) |
| MridinghamTonic | AudioClassification | audio | zxx |
| SIBFLEURS | AudioClassification | audio | afr, amh, arb, asm, ast, ... (101) |
| VoxCelebSA | AudioClassification | audio | eng |
| VoxPopuliLanguageID | AudioClassification | audio | deu, eng, fra, pol, spa |
| CREMA_DClustering | AudioClustering | audio | eng |
| VehicleSoundClustering | AudioClustering | audio | zxx |
| VoxPopuliGenderClustering | AudioClustering | audio | deu, eng, fra, pol, spa |
| CREMADPairClassification | AudioPairClassification | audio | eng |
| NMSQAPairClassification | AudioPairClassification | audio | eng |
| VoxPopuliAccentPairClassification | AudioPairClassification | audio | eng |
| GTZANAudioReranking | AudioReranking | audio | zxx |
| RavdessZeroshot | AudioZeroshotClassification | audio, text | eng |
| SpeechCommandsZeroshotv0.02 | AudioZeroshotClassification | audio, text | eng |
| FSD2019Kaggle | AudioMultilabelClassification | audio | eng |
Citation
@misc{assadi2026maebmassiveaudioembedding,
archiveprefix = {arXiv},
author = {Adnan El Assadi and Isaac Chung and Chenghao Xiao and Roman Solomatin and Animesh Jha and Rahul Chand and Silky Singh and Kaitlyn Wang and Ali Sartaz Khan and Marc Moussa Nasser and Sufen Fong and Pengfei He and Alan Xiao and Ayush Sunil Munot and Aditya Shrivastava and Artem Gazizov and Niklas Muennighoff and Kenneth Enevoldsen},
eprint = {2602.16008},
primaryclass = {cs.SD},
title = {MAEB: Massive Audio Embedding Benchmark},
url = {https://arxiv.org/abs/2602.16008},
year = {2026},
}
MAEB(beta, audio-only)
Audio-only embedding quality across classification, clustering, pair classification, reranking, and retrieval tasks. Currently in beta pending peer review.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| JamAltArtistA2ARetrieval | Any2AnyRetrieval | audio | deu, eng, fra, spa |
| BeijingOpera | AudioClassification | audio | zxx |
| BirdCLEF | AudioClassification | audio | zxx |
| CREMA_D | AudioClassification | audio | eng |
| CommonLanguageAgeDetection | AudioClassification | audio | eng |
| GTZANGenre | AudioClassification | audio | zxx |
| IEMOCAPGender | AudioClassification | audio | eng |
| MInDS14 | AudioClassification | audio | ces, deu, eng, fra, ita, ... (12) |
| MridinghamTonic | AudioClassification | audio | zxx |
| SIBFLEURS | AudioClassification | audio | afr, amh, arb, asm, ast, ... (101) |
| VoxCelebSA | AudioClassification | audio | eng |
| VoxPopuliLanguageID | AudioClassification | audio | deu, eng, fra, pol, spa |
| CREMA_DClustering | AudioClustering | audio | eng |
| VehicleSoundClustering | AudioClustering | audio | zxx |
| VoxPopuliGenderClustering | AudioClustering | audio | deu, eng, fra, pol, spa |
| CREMADPairClassification | AudioPairClassification | audio | eng |
| NMSQAPairClassification | AudioPairClassification | audio | eng |
| VoxPopuliAccentPairClassification | AudioPairClassification | audio | eng |
| GTZANAudioReranking | AudioReranking | audio | zxx |
Citation
@misc{assadi2026maebmassiveaudioembedding,
archiveprefix = {arXiv},
author = {Adnan El Assadi and Isaac Chung and Chenghao Xiao and Roman Solomatin and Animesh Jha and Rahul Chand and Silky Singh and Kaitlyn Wang and Ali Sartaz Khan and Marc Moussa Nasser and Sufen Fong and Pengfei He and Alan Xiao and Ayush Sunil Munot and Aditya Shrivastava and Artem Gazizov and Niklas Muennighoff and Kenneth Enevoldsen},
eprint = {2602.16008},
primaryclass = {cs.SD},
title = {MAEB: Massive Audio Embedding Benchmark},
url = {https://arxiv.org/abs/2602.16008},
year = {2026},
}
MIEB(Img)
Image-only embedding quality across retrieval, classification, clustering, and visual STS, excluding tasks that require a text encoder.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
| FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
| GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
| METI2IRetrieval | Any2AnyRetrieval | image | eng |
| NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
| RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
| SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
| SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
| StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
| Birdsnap | ImageClassification | image | eng |
| Caltech101 | ImageClassification | image | eng |
| CIFAR10 | ImageClassification | image | eng |
| CIFAR100 | ImageClassification | image | eng |
| Country211 | ImageClassification | image | eng |
| DTD | ImageClassification | image | eng |
| EuroSAT | ImageClassification | image | eng |
| FER2013 | ImageClassification | image | eng |
| FGVCAircraft | ImageClassification | image | eng |
| Food101Classification | ImageClassification | image | eng |
| GTSRB | ImageClassification | image | eng |
| Imagenet1k | ImageClassification | image | eng |
| MNIST | ImageClassification | image | eng |
| OxfordFlowersClassification | ImageClassification | image | eng |
| OxfordPets | ImageClassification | image | eng |
| PatchCamelyon | ImageClassification | image | eng |
| RESISC45 | ImageClassification | image | eng |
| StanfordCars | ImageClassification | image | eng |
| STL10 | ImageClassification | image | eng |
| SUN397 | ImageClassification | image | eng |
| UCF101 | ImageClassification | image | eng |
| CIFAR10Clustering | ImageClustering | image | eng |
| CIFAR100Clustering | ImageClustering | image | eng |
| ImageNetDog15Clustering | ImageClustering | image | eng |
| ImageNet10Clustering | ImageClustering | image | eng |
| TinyImageNetClustering | ImageClustering | image | eng |
| VOC2007 | ImageClassification | image | eng |
| STS12VisualSTS | VisualSTS(eng) | image | eng |
| STS13VisualSTS | VisualSTS(eng) | image | eng |
| STS14VisualSTS | VisualSTS(eng) | image | eng |
| STS15VisualSTS | VisualSTS(eng) | image | eng |
| STS16VisualSTS | VisualSTS(eng) | image | eng |
| STS17MultilingualVisualSTS | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
| STSBenchmarkMultilingualVisualSTS | VisualSTS(multi) | image | cmn, deu, eng, fra, ita, ... (10) |
Citation
@inproceedings{xiao2025mieb,
author = {Xiao, Chenghao and Chung, Isaac and Kerboua, Imene and Stirling, Jamie and Zhang, Xin and Kardos, M\'arton and Solomatin, Roman and Al Moubayed, Noura and Enevoldsen, Kenneth and Muennighoff, Niklas},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
pages = {22187-22198},
title = {MIEB: Massive Image Embedding Benchmark},
year = {2025},
}
MIEB(Multilingual)¶
Multilingual image embedding quality across 39 languages, spanning image classification (zero-shot and linear probing), clustering, retrieval, compositionality evaluation, document understanding, visual STS, and vision-centric QA tasks. Extends MIEB(eng) with three multilingual retrieval datasets and the multilingual portions of VisualSTS-b and VisualSTS-17.
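The relationship to MIEB(eng) can be checked by comparing task names. The following is a minimal sketch, not part of `mteb`: `added_tasks` is a hypothetical helper, and the two name lists are abbreviated samples copied from the task tables on this page rather than the full benchmarks.

```python
def added_tasks(base: list[str], extended: list[str]) -> list[str]:
    """Return the task names in `extended` that are missing from `base`, in order."""
    base_names = set(base)
    return [name for name in extended if name not in base_names]


# Abbreviated samples copied from the task tables on this page (not the full lists).
mieb_eng_sample = [
    "Birdsnap",
    "WebQAT2TRetrieval",
    "VisualSTS17Eng",
    "VisualSTS-b-Eng",
]
mieb_multilingual_sample = [
    "Birdsnap",
    "WebQAT2TRetrieval",
    "WITT2IRetrieval",
    "XFlickr30kCoT2IRetrieval",
    "XM3600T2IRetrieval",
    "VisualSTS17Eng",
    "VisualSTS-b-Eng",
    "VisualSTS17Multilingual",
    "VisualSTS-b-Multilingual",
]

# Tasks that appear only in the multilingual sample.
extra = added_tasks(mieb_eng_sample, mieb_multilingual_sample)
print(extra)
```

With `mteb` installed, the full name lists could be gathered from the benchmark objects returned by `mteb.get_benchmark(...)` instead of hand-copied samples; that path is not exercised here.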
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Birdsnap | ImageClassification | image | eng |
| Caltech101 | ImageClassification | image | eng |
| CIFAR10 | ImageClassification | image | eng |
| CIFAR100 | ImageClassification | image | eng |
| Country211 | ImageClassification | image | eng |
| DTD | ImageClassification | image | eng |
| EuroSAT | ImageClassification | image | eng |
| FER2013 | ImageClassification | image | eng |
| FGVCAircraft | ImageClassification | image | eng |
| Food101Classification | ImageClassification | image | eng |
| GTSRB | ImageClassification | image | eng |
| Imagenet1k | ImageClassification | image | eng |
| MNIST | ImageClassification | image | eng |
| OxfordFlowersClassification | ImageClassification | image | eng |
| OxfordPets | ImageClassification | image | eng |
| PatchCamelyon | ImageClassification | image | eng |
| RESISC45 | ImageClassification | image | eng |
| StanfordCars | ImageClassification | image | eng |
| STL10 | ImageClassification | image | eng |
| SUN397 | ImageClassification | image | eng |
| UCF101 | ImageClassification | image | eng |
| VOC2007 | ImageClassification | image | eng |
| CIFAR10Clustering | ImageClustering | image | eng |
| CIFAR100Clustering | ImageClustering | image | eng |
| ImageNetDog15Clustering | ImageClustering | image | eng |
| ImageNet10Clustering | ImageClustering | image | eng |
| TinyImageNetClustering | ImageClustering | image | eng |
| BirdsnapZeroShot | ZeroShotClassification | image, text | eng |
| Caltech101ZeroShot | ZeroShotClassification | text, image | eng |
| CIFAR10ZeroShot | ZeroShotClassification | text, image | eng |
| CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
| CLEVRZeroShot | ZeroShotClassification | text, image | eng |
| CLEVRCountZeroShot | ZeroShotClassification | text, image | eng |
| Country211ZeroShot | ZeroShotClassification | image, text | eng |
| DTDZeroShot | ZeroShotClassification | image, text | eng |
| EuroSATZeroShot | ZeroShotClassification | image, text | eng |
| FER2013ZeroShot | ZeroShotClassification | image, text | eng |
| FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
| Food101ZeroShot | ZeroShotClassification | text, image | eng |
| GTSRBZeroShot | ZeroShotClassification | image, text | eng |
| Imagenet1kZeroShot | ZeroShotClassification | image, text | eng |
| MNISTZeroShot | ZeroShotClassification | image, text | eng |
| OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
| PatchCamelyonZeroShot | ZeroShotClassification | image, text | eng |
| RenderedSST2 | ZeroShotClassification | text, image | eng |
| RESISC45ZeroShot | ZeroShotClassification | image, text | eng |
| StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
| STL10ZeroShot | ZeroShotClassification | image, text | eng |
| SUN397ZeroShot | ZeroShotClassification | image, text | eng |
| UCF101ZeroShot | ZeroShotClassification | image, text | eng |
| BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
| BLINKIT2TMultiChoice | VisionCentricQA | text, image | eng |
| CVBenchCount | VisionCentricQA | image, text | eng |
| CVBenchRelation | VisionCentricQA | text, image | eng |
| CVBenchDepth | VisionCentricQA | text, image | eng |
| CVBenchDistance | VisionCentricQA | text, image | eng |
| AROCocoOrder | Compositionality | text, image | eng |
| AROFlickrOrder | Compositionality | text, image | eng |
| AROVisualAttribution | Compositionality | text, image | eng |
| AROVisualRelation | Compositionality | text, image | eng |
| SugarCrepe | Compositionality | text, image | eng |
| Winoground | Compositionality | text, image | eng |
| ImageCoDe | Compositionality | text, image | eng |
| STS12VisualSTS | VisualSTS(eng) | image | eng |
| STS13VisualSTS | VisualSTS(eng) | image | eng |
| STS14VisualSTS | VisualSTS(eng) | image | eng |
| STS15VisualSTS | VisualSTS(eng) | image | eng |
| STS16VisualSTS | VisualSTS(eng) | image | eng |
| BLINKIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| BLINKIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
| EDIST2ITRetrieval | Any2AnyRetrieval | text, image | eng |
| Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| Fashion200kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| FashionIQIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| Flickr30kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| Flickr30kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
| GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
| GLDv2I2TRetrieval | Any2AnyRetrieval | text, image | eng |
| HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| HatefulMemesT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| ImageCoDeT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| InfoSeekIT2ITRetrieval | Any2AnyRetrieval | text, image | eng |
| InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| MemotionI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| MemotionT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| METI2IRetrieval | Any2AnyRetrieval | image | eng |
| MSCOCOI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| MSCOCOT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
| OVENIT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
| OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
| RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
| SciMMIRI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| SciMMIRT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
| SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
| StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
| TUBerlinT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
| VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
| VisualNewsT2IRetrieval | Any2AnyRetrieval | image, text | eng |
| VizWizIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
| WebQAT2TRetrieval | Any2AnyRetrieval | text | eng |
| WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |
| XFlickr30kCoT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | deu, eng, ind, jpn, rus, ... (8) |
| XM3600T2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, ben, ces, dan, deu, ... (38) |
| VisualSTS17Eng | VisualSTS(eng) | image | ara, deu, eng, fra, ita, ... (9) |
| VisualSTS-b-Eng | VisualSTS(eng) | image | cmn, deu, eng, fra, ita, ... (10) |
| VisualSTS17Multilingual | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
| VisualSTS-b-Multilingual | VisualSTS(multi) | image | cmn, deu, eng, fra, ita, ... (10) |
Citation
@inproceedings{xiao2025mieb,
author = {Xiao, Chenghao and Chung, Isaac and Kerboua, Imene and Stirling, Jamie and Zhang, Xin and Kardos, M\'arton and Solomatin, Roman and Al Moubayed, Noura and Enevoldsen, Kenneth and Muennighoff, Niklas},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
pages = {22187-22198},
title = {MIEB: Massive Image Embedding Benchmark},
year = {2025},
}
MIEB(eng)¶
English image embedding quality across image classification (zero-shot and linear probing), clustering, retrieval, compositionality evaluation, document understanding, visual STS, and vision-centric QA tasks.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Birdsnap | ImageClassification | image | eng |
| Caltech101 | ImageClassification | image | eng |
| CIFAR10 | ImageClassification | image | eng |
| CIFAR100 | ImageClassification | image | eng |
| Country211 | ImageClassification | image | eng |
| DTD | ImageClassification | image | eng |
| EuroSAT | ImageClassification | image | eng |
| FER2013 | ImageClassification | image | eng |
| FGVCAircraft | ImageClassification | image | eng |
| Food101Classification | ImageClassification | image | eng |
| GTSRB | ImageClassification | image | eng |
| Imagenet1k | ImageClassification | image | eng |
| MNIST | ImageClassification | image | eng |
| OxfordFlowersClassification | ImageClassification | image | eng |
| OxfordPets | ImageClassification | image | eng |
| PatchCamelyon | ImageClassification | image | eng |
| RESISC45 | ImageClassification | image | eng |
| StanfordCars | ImageClassification | image | eng |
| STL10 | ImageClassification | image | eng |
| SUN397 | ImageClassification | image | eng |
| UCF101 | ImageClassification | image | eng |
| VOC2007 | ImageClassification | image | eng |
| CIFAR10Clustering | ImageClustering | image | eng |
| CIFAR100Clustering | ImageClustering | image | eng |
| ImageNetDog15Clustering | ImageClustering | image | eng |
| ImageNet10Clustering | ImageClustering | image | eng |
| TinyImageNetClustering | ImageClustering | image | eng |
| BirdsnapZeroShot | ZeroShotClassification | image, text | eng |
| Caltech101ZeroShot | ZeroShotClassification | text, image | eng |
| CIFAR10ZeroShot | ZeroShotClassification | text, image | eng |
| CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
| CLEVRZeroShot | ZeroShotClassification | text, image | eng |
| CLEVRCountZeroShot | ZeroShotClassification | text, image | eng |
| Country211ZeroShot | ZeroShotClassification | image, text | eng |
| DTDZeroShot | ZeroShotClassification | image, text | eng |
| EuroSATZeroShot | ZeroShotClassification | image, text | eng |
| FER2013ZeroShot | ZeroShotClassification | image, text | eng |
| FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
| Food101ZeroShot | ZeroShotClassification | text, image | eng |
| GTSRBZeroShot | ZeroShotClassification | image, text | eng |
| Imagenet1kZeroShot | ZeroShotClassification | image, text | eng |
| MNISTZeroShot | ZeroShotClassification | image, text | eng |
| OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
| PatchCamelyonZeroShot | ZeroShotClassification | image, text | eng |
| RenderedSST2 | ZeroShotClassification | text, image | eng |
| RESISC45ZeroShot | ZeroShotClassification | image, text | eng |
| StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
| STL10ZeroShot | ZeroShotClassification | image, text | eng |
| SUN397ZeroShot | ZeroShotClassification | image, text | eng |
| UCF101ZeroShot | ZeroShotClassification | image, text | eng |
| BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
| BLINKIT2TMultiChoice | VisionCentricQA | text, image | eng |
| CVBenchCount | VisionCentricQA | image, text | eng |
| CVBenchRelation | VisionCentricQA | text, image | eng |
| CVBenchDepth | VisionCentricQA | text, image | eng |
| CVBenchDistance | VisionCentricQA | text, image | eng |
| AROCocoOrder | Compositionality | text, image | eng |
| AROFlickrOrder | Compositionality | text, image | eng |
| AROVisualAttribution | Compositionality | text, image | eng |
| AROVisualRelation | Compositionality | text, image | eng |
| SugarCrepe | Compositionality | text, image | eng |
| Winoground | Compositionality | text, image | eng |
| ImageCoDe | Compositionality | text, image | eng |
| STS12VisualSTS | VisualSTS(eng) | image | eng |
| STS13VisualSTS | VisualSTS(eng) | image | eng |
| STS14VisualSTS | VisualSTS(eng) | image | eng |
| STS15VisualSTS | VisualSTS(eng) | image | eng |
| STS16VisualSTS | VisualSTS(eng) | image | eng |
| BLINKIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| BLINKIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
| EDIST2ITRetrieval | Any2AnyRetrieval | text, image | eng |
| Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| Fashion200kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| FashionIQIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| Flickr30kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| Flickr30kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
| GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
| GLDv2I2TRetrieval | Any2AnyRetrieval | text, image | eng |
| HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| HatefulMemesT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| ImageCoDeT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| InfoSeekIT2ITRetrieval | Any2AnyRetrieval | text, image | eng |
| InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| MemotionI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| MemotionT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| METI2IRetrieval | Any2AnyRetrieval | image | eng |
| MSCOCOI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| MSCOCOT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
| OVENIT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
| OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
| ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
| RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
| RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
| SciMMIRI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| SciMMIRT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
| SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
| StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
| TUBerlinT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
| VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
| VisualNewsT2IRetrieval | Any2AnyRetrieval | image, text | eng |
| VizWizIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
| WebQAT2TRetrieval | Any2AnyRetrieval | text | eng |
| VisualSTS17Eng | VisualSTS(eng) | image | ara, deu, eng, fra, ita, ... (9) |
| VisualSTS-b-Eng | VisualSTS(eng) | image | cmn, deu, eng, fra, ita, ... (10) |
Citation
@inproceedings{xiao2025mieb,
author = {Xiao, Chenghao and Chung, Isaac and Kerboua, Imene and Stirling, Jamie and Zhang, Xin and Kardos, M\'arton and Solomatin, Roman and Al Moubayed, Noura and Enevoldsen, Kenneth and Muennighoff, Niklas},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
pages = {22187-22198},
title = {MIEB: Massive Image Embedding Benchmark},
year = {2025},
}
MIEB(lite)¶
Multilingual image embedding quality on a 51-task subset of MIEB(Multilingual) that covers the same task types, designed to run at a fraction of the cost while preserving relative model rankings.
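To see how a benchmark's tasks are spread over task types, the rows of a task table can be tallied by their `type` column. This is a minimal sketch, assuming the pipe-delimited four-column layout used on this page; `count_by_type` is a hypothetical helper, and the rows are a small sample taken from the MIEB(lite) table below, not the full table.

```python
from collections import Counter


def count_by_type(rows: list[str]) -> Counter:
    """Tally the `type` column of rows shaped `| name | type | modalities | languages |`."""
    counts: Counter = Counter()
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip the header row, the `|---|` separator row, and malformed rows.
        if len(cells) != 4 or cells[0] == "name" or set(cells[0]) <= {"-"}:
            continue
        counts[cells[1]] += 1
    return counts


# Sample rows copied from the MIEB(lite) table (assumption: same layout for all rows).
sample_rows = [
    "| name | type | modalities | languages |",
    "|---|---|---|---|",
    "| Country211 | ImageClassification | image | eng |",
    "| DTD | ImageClassification | image | eng |",
    "| TinyImageNetClustering | ImageClustering | image | eng |",
    "| WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |",
]

type_counts = count_by_type(sample_rows)
print(type_counts.most_common())
```

Running the same helper over the complete table of MIEB(lite) and of MIEB(Multilingual) gives a quick way to confirm that both cover the same set of task types.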
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Country211 | ImageClassification | image | eng |
| DTD | ImageClassification | image | eng |
| EuroSAT | ImageClassification | image | eng |
| GTSRB | ImageClassification | image | eng |
| OxfordPets | ImageClassification | image | eng |
| PatchCamelyon | ImageClassification | image | eng |
| RESISC45 | ImageClassification | image | eng |
| SUN397 | ImageClassification | image | eng |
| ImageNetDog15Clustering | ImageClustering | image | eng |
| TinyImageNetClustering | ImageClustering | image | eng |
| CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
| Country211ZeroShot | ZeroShotClassification | image, text | eng |
| FER2013ZeroShot | ZeroShotClassification | image, text | eng |
| FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
| Food101ZeroShot | ZeroShotClassification | text, image | eng |
| OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
| StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
| BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
| CVBenchCount | VisionCentricQA | image, text | eng |
| CVBenchRelation | VisionCentricQA | text, image | eng |
| CVBenchDepth | VisionCentricQA | text, image | eng |
| CVBenchDistance | VisionCentricQA | text, image | eng |
| AROCocoOrder | Compositionality | text, image | eng |
| AROFlickrOrder | Compositionality | text, image | eng |
| AROVisualAttribution | Compositionality | text, image | eng |
| AROVisualRelation | Compositionality | text, image | eng |
| Winoground | Compositionality | text, image | eng |
| ImageCoDe | Compositionality | text, image | eng |
| STS13VisualSTS | VisualSTS(eng) | image | eng |
| STS15VisualSTS | VisualSTS(eng) | image | eng |
| VisualSTS17Multilingual | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
| VisualSTS-b-Multilingual | VisualSTS(multi) | image | cmn, deu, eng, fra, ita, ... (10) |
| CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
| CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
| Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
| InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
| OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
| VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
| VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
| VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
| WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
| WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |
| XM3600T2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, ben, ces, dan, deu, ... (38) |
Citation
@inproceedings{xiao2025mieb,
author = {Xiao, Chenghao and Chung, Isaac and Kerboua, Imene and Stirling, Jamie and Zhang, Xin and Kardos, M\'arton and Solomatin, Roman and Al Moubayed, Noura and Enevoldsen, Kenneth and Muennighoff, Niklas},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
pages = {22187-22198},
title = {MIEB: Massive Image Embedding Benchmark},
year = {2025},
}
MINERSBitextMining¶
Multilingual bitext mining quality across diverse language pairs, using tasks drawn from the MINERS benchmark for evaluating semantic retrieval in multilingual settings.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BUCC | BitextMining | text | cmn, deu, eng, fra, rus |
| LinceMTBitextMining | BitextMining | text | eng, hin |
| NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
| NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
| NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
| PhincBitextMining | BitextMining | text | eng, hin |
| Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
Citation
@article{winata2024miners,
author = {Winata, Genta Indra and Zhang, Ruochen and Adelani, David Ifeoluwa},
journal = {arXiv preprint arXiv:2406.07424},
title = {MINERS: Multilingual Language Models as Semantic Retrievers},
year = {2024},
}
MTEB(Code, v1)¶
Code retrieval quality across a wide range of popular programming languages, covering code search, text-to-SQL, and code feedback tasks.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AppsRetrieval | Retrieval | text | eng, python |
| CodeEditSearchRetrieval | Retrieval | text | c, c++, go, java, javascript, ... (13) |
| CodeFeedbackMT | Retrieval | text | eng |
| CodeFeedbackST | Retrieval | text | eng |
| CodeSearchNetCCRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
| CodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
| CodeTransOceanContest | Retrieval | text | c++, python |
| CodeTransOceanDL | Retrieval | text | python |
| CosQA | Retrieval | text | eng, python |
| COIRCodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
| StackOverflowQA | Retrieval | text | eng |
| SyntheticText2SQL | Retrieval | text | eng, sql |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Europe, v1)¶
Text embedding quality across European languages, spanning bitext mining, classification, multilabel classification, clustering, pair classification, retrieval, instruction reranking, reranking, and semantic textual similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BornholmBitextMining | BitextMining | text | dan |
| BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
| BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
| DiaBlaBitextMining | BitextMining | text | eng, fra |
| FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
| NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
| NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
| BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
| CzechProductReviewSentimentClassification | Classification | text | ces |
| GreekLegalCodeClassification | Classification | text | ell |
| DBpediaClassification | Classification | text | eng |
| FinancialPhrasebankClassification | Classification | text | eng |
| PoemSentimentClassification | Classification | text | eng |
| ToxicChatClassification | Classification | text | eng |
| ToxicConversationsClassification | Classification | text | eng |
| EstonianValenceClassification | Classification | text | est |
| ItaCaseholdClassification | Classification | text | ita |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
| ScalaClassification | Classification | text | dan, nno, nob, swe |
| SwissJudgementClassification | Classification | text | deu, fra, ita |
| TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
| CBD | Classification | text | pol |
| PolEmo2.0-OUT | Classification | text | pol |
| CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
| DalajClassification | Classification | text | swe |
| WikiCitiesClustering | Clustering | text | eng |
| RomaniBibleClustering | Clustering | text | rom |
| BigPatentClustering.v2 | Clustering | text | eng |
| BiorxivClusteringP2P.v2 | Clustering | text | eng |
| AlloProfClusteringS2S.v2 | Clustering | text | fra |
| HALClusteringS2S.v2 | Clustering | text | fra |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
| StackOverflowQA | Retrieval | text | eng |
| TwitterHjerneRetrieval | Retrieval | text | dan |
| LegalQuAD | Retrieval | text | deu |
| ArguAna | Retrieval | text | eng |
| HagridRetrieval | Retrieval | text | eng |
| LegalBenchCorporateLobbying | Retrieval | text | eng |
| LEMBPasskeyRetrieval | Retrieval | text | eng |
| SCIDOCS | Retrieval | text | eng |
| SpartQA | Retrieval | text | eng |
| TempReasonL1 | Retrieval | text | eng |
| WinoGrande | Retrieval | text | eng |
| AlloprofRetrieval | Retrieval | text | fra |
| BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
| StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
| WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
| Core17InstructionRetrieval | InstructionReranking | text | eng |
| News21InstructionRetrieval | InstructionReranking | text | eng |
| Robust04InstructionRetrieval | InstructionReranking | text | eng |
| MalteseNewsClassification | MultilabelClassification | text | mlt |
| MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
| CTKFactsNLI | PairClassification | text | ces |
| SprintDuplicateQuestions | PairClassification | text | eng |
| OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
| RTE3 | PairClassification | text | deu, eng, fra, ita |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| PSC | PairClassification | text | pol |
| WebLINXCandidatesReranking | Reranking | text | eng |
| AlloprofReranking | Reranking | text | fra |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| SICK-R | STS | text | eng |
| STS12 | STS | text | eng |
| STS14 | STS | text | eng |
| STS15 | STS | text | eng |
| STSBenchmark | STS | text | eng |
| FinParaSTS | STS | text | fin |
| STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
| SICK-R-PL | STS | text | pol |
| STSES | STS | text | spa |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Indic, v1)¶
Text embedding quality across Indic languages spanning bitext mining, classification, clustering, pair classification, retrieval, reranking, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| IN22ConvBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
| IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| BengaliSentimentAnalysis | Classification | text | ben |
| GujaratiNewsClassification | Classification | text | guj |
| HindiDiscourseClassification | Classification | text | hin |
| SentimentAnalysisHindi | Classification | text | hin |
| MalayalamNewsClassification | Classification | text | mal |
| MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
| TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
| NepaliNewsClassification | Classification | text | nep |
| PunjabiNewsClassification | Classification | text | pan |
| SanskritShlokasClassification | Classification | text | san |
| UrduRomanSentimentClassification | Classification | text | urd |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
| XQuADRetrieval | Retrieval | text | arb, deu, ell, eng, hin, ... (12) |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Law, v1)¶
Legal document retrieval across case documents, statutes, legal Q&A, and legal summarization in multiple languages.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AILACasedocs | Retrieval | text | eng |
| AILAStatutes | Retrieval | text | eng |
| LegalSummarization | Retrieval | text | eng |
| GerDaLIRSmall | Retrieval | text | deu |
| LeCaRDv2 | Retrieval | text | zho |
| LegalBenchConsumerContractsQA | Retrieval | text | eng |
| LegalBenchCorporateLobbying | Retrieval | text | eng |
| LegalQuAD | Retrieval | text | deu |
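The task tables on this page share one row layout (`name | type | modalities | languages`), and long language lists are abbreviated as `... (N)`. The sketch below parses one such row in plain Python. It assumes that `N` is the total number of languages for the task, not the number left out; that reading matches the rows shown here but is not stated on the page. `TaskRow` and `parse_task_row` are hypothetical helper names, not part of any library.

```python
import re
from dataclasses import dataclass


@dataclass
class TaskRow:
    name: str
    task_type: str
    modalities: list[str]
    shown_languages: list[str]  # language codes actually printed in the row
    total_languages: int        # assumed meaning of the "(N)" suffix


# Matches the trailing "... (23)" abbreviation in the languages cell.
_TRUNCATED = re.compile(r"\.\.\.\s*\((\d+)\)\s*$")


def parse_task_row(line: str) -> TaskRow:
    """Parse one '| name | type | modalities | languages |' table row."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 4:
        raise ValueError(f"expected 4 cells, got {len(cells)}: {line!r}")
    name, task_type, modalities, languages = cells

    match = _TRUNCATED.search(languages)
    if match:
        # Drop the abbreviation before splitting the visible codes.
        languages = languages[: match.start()]
    codes = [code.strip() for code in languages.split(",") if code.strip()]
    total = int(match.group(1)) if match else len(codes)

    return TaskRow(
        name=name,
        task_type=task_type,
        modalities=[m.strip() for m in modalities.split(",") if m.strip()],
        shown_languages=codes,
        total_languages=total,
    )
```

Rows parsed this way can then be grouped by `task_type` or filtered by language code, with the caveat that an abbreviated row only exposes the first few codes.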
MTEB(Medical, v1)¶
Medical information retrieval across clinical, biomedical, and consumer health domains, spanning retrieval, reranking, and clustering tasks.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| CUREv1 | Retrieval | text | eng, fra, spa |
| NFCorpus | Retrieval | text | eng |
| TRECCOVID | Retrieval | text | eng |
| TRECCOVID-PL | Retrieval | text | pol |
| SciFact | Retrieval | text | eng |
| SciFact-PL | Retrieval | text | pol |
| MedicalQARetrieval | Retrieval | text | eng |
| PublicHealthQA | Retrieval | text | ara, eng, fra, kor, rus, ... (8) |
| MedrxivClusteringP2P.v2 | Clustering | text | eng |
| MedrxivClusteringS2S.v2 | Clustering | text | eng |
| CmedqaRetrieval | Retrieval | text | cmn |
| CMedQAv2-reranking | Reranking | text | cmn |
MTEB(Multilingual, v1)¶
Multilingual text embedding quality across 250+ languages spanning bitext mining, classification, clustering, retrieval, reranking, and semantic similarity. Superseded by MTEB(Multilingual, v2) after SNLHierarchicalClustering was removed from the Hugging Face Hub.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BornholmBitextMining | BitextMining | text | dan |
| BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
| BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
| DiaBlaBitextMining | BitextMining | text | eng, fra |
| FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
| IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
| IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
| NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
| NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
| NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
| NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
| NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
| Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
| BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
| CzechProductReviewSentimentClassification | Classification | text | ces |
| GreekLegalCodeClassification | Classification | text | ell |
| DBpediaClassification | Classification | text | eng |
| FinancialPhrasebankClassification | Classification | text | eng |
| PoemSentimentClassification | Classification | text | eng |
| ToxicConversationsClassification | Classification | text | eng |
| TweetTopicSingleClassification | Classification | text | eng |
| EstonianValenceClassification | Classification | text | est |
| FilipinoShopeeReviewsClassification | Classification | text | fil |
| GujaratiNewsClassification | Classification | text | guj |
| SentimentAnalysisHindi | Classification | text | hin |
| IndonesianIdClickbaitClassification | Classification | text | ind |
| ItaCaseholdClassification | Classification | text | ita |
| KorSarcasmClassification | Classification | text | kor |
| KurdishSentimentClassification | Classification | text | kur |
| MacedonianTweetSentimentClassification | Classification | text | mkd |
| AfriSentiClassification | Classification | text | amh, arq, ary, hau, ibo, ... (12) |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| CataloniaTweetClassification | Classification | text | cat, spa |
| CyrillicTurkicLangClassification | Classification | text | bak, chv, kaz, kir, krc, ... (9) |
| IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
| MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
| NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
| NusaParagraphEmotionClassification | Classification | text | bbc, bew, bug, jav, mad, ... (10) |
| NusaX-senti | Classification | text | ace, ban, bbc, bjn, bug, ... (12) |
| ScalaClassification | Classification | text | dan, nno, nob, swe |
| SwissJudgementClassification | Classification | text | deu, fra, ita |
| NepaliNewsClassification | Classification | text | nep |
| OdiaNewsClassification | Classification | text | ory |
| PunjabiNewsClassification | Classification | text | pan |
| PolEmo2.0-OUT | Classification | text | pol |
| PAC | Classification | text | pol |
| SinhalaNewsClassification | Classification | text | sin |
| CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
| SiswatiNewsClassification | Classification | text | ssw |
| SlovakMovieReviewSentimentClassification | Classification | text | slk |
| SwahiliNewsClassification | Classification | text | swa |
| DalajClassification | Classification | text | swe |
| TswanaNewsClassification | Classification | text | tsn |
| IsiZuluNewsClassification | Classification | text | zul |
| WikiCitiesClustering | Clustering | text | eng |
| MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
| RomaniBibleClustering | Clustering | text | rom |
| ArXivHierarchicalClusteringP2P | Clustering | text | eng |
| ArXivHierarchicalClusteringS2S | Clustering | text | eng |
| BigPatentClustering.v2 | Clustering | text | eng |
| BiorxivClusteringP2P.v2 | Clustering | text | eng |
| MedrxivClusteringP2P.v2 | Clustering | text | eng |
| StackExchangeClustering.v2 | Clustering | text | eng |
| AlloProfClusteringS2S.v2 | Clustering | text | fra |
| HALClusteringS2S.v2 | Clustering | text | fra |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
| PlscClusteringP2P.v2 | Clustering | text | pol |
| SwednClusteringP2P | Clustering | text | swe |
| CLSClusteringP2P.v2 | Clustering | text | cmn |
| StackOverflowQA | Retrieval | text | eng |
| TwitterHjerneRetrieval | Retrieval | text | dan |
| AILAStatutes | Retrieval | text | eng |
| ArguAna | Retrieval | text | eng |
| HagridRetrieval | Retrieval | text | eng |
| LegalBenchCorporateLobbying | Retrieval | text | eng |
| LEMBPasskeyRetrieval | Retrieval | text | eng |
| SCIDOCS | Retrieval | text | eng |
| SpartQA | Retrieval | text | eng |
| TempReasonL1 | Retrieval | text | eng |
| TRECCOVID | Retrieval | text | eng |
| WinoGrande | Retrieval | text | eng |
| BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
| MLQARetrieval | Retrieval | text | ara, deu, eng, hin, spa, ... (7) |
| StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
| WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
| CovidRetrieval | Retrieval | text | cmn |
| Core17InstructionRetrieval | InstructionReranking | text | eng |
| News21InstructionRetrieval | InstructionReranking | text | eng |
| Robust04InstructionRetrieval | InstructionReranking | text | eng |
| KorHateSpeechMLClassification | MultilabelClassification | text | kor |
| MalteseNewsClassification | MultilabelClassification | text | mlt |
| MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
| BrazilianToxicTweetsClassification | MultilabelClassification | text | por |
| CEDRClassification | MultilabelClassification | text | rus |
| CTKFactsNLI | PairClassification | text | ces |
| SprintDuplicateQuestions | PairClassification | text | eng |
| TwitterURLCorpus | PairClassification | text | eng |
| ArmenianParaphrasePC | PairClassification | text | hye |
| indonli | PairClassification | text | ind |
| OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
| PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
| RTE3 | PairClassification | text | deu, eng, fra, ita |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| PpcPC | PairClassification | text | pol |
| TERRa | PairClassification | text | rus |
| WebLINXCandidatesReranking | Reranking | text | eng |
| AlloprofReranking | Reranking | text | fra |
| VoyageMMarcoReranking | Reranking | text | jpn |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| RuBQReranking | Reranking | text | rus |
| T2Reranking | Reranking | text | cmn |
| GermanSTSBenchmark | STS | text | deu |
| SICK-R | STS | text | eng |
| STS12 | STS | text | eng |
| STS13 | STS | text | eng |
| STS14 | STS | text | eng |
| STS15 | STS | text | eng |
| STSBenchmark | STS | text | eng |
| FaroeseSTS | STS | text | fao |
| FinParaSTS | STS | text | fin |
| JSICK | STS | text | jpn |
| IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
| SemRel24STS | STS | text | afr, amh, arb, arq, ary, ... (12) |
| STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
| STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
| STSES | STS | text | spa |
| STSB | STS | text | cmn |
| MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| SNLHierarchicalClusteringP2P | Clustering | text | nob |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Multilingual, v2)¶
MMTEB measures multilingual text embedding quality across 250+ languages, spanning classification, clustering, retrieval, semantic similarity, and more, and is built from curated community contributions.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BornholmBitextMining | BitextMining | text | dan |
| BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
| BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
| DiaBlaBitextMining | BitextMining | text | eng, fra |
| FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
| IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
| IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
| NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
| NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
| NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
| NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
| NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
| Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
| BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
| CzechProductReviewSentimentClassification | Classification | text | ces |
| GreekLegalCodeClassification | Classification | text | ell |
| DBpediaClassification | Classification | text | eng |
| FinancialPhrasebankClassification | Classification | text | eng |
| PoemSentimentClassification | Classification | text | eng |
| ToxicConversationsClassification | Classification | text | eng |
| TweetTopicSingleClassification | Classification | text | eng |
| EstonianValenceClassification | Classification | text | est |
| FilipinoShopeeReviewsClassification | Classification | text | fil |
| GujaratiNewsClassification | Classification | text | guj |
| SentimentAnalysisHindi | Classification | text | hin |
| IndonesianIdClickbaitClassification | Classification | text | ind |
| ItaCaseholdClassification | Classification | text | ita |
| KorSarcasmClassification | Classification | text | kor |
| KurdishSentimentClassification | Classification | text | kur |
| MacedonianTweetSentimentClassification | Classification | text | mkd |
| AfriSentiClassification | Classification | text | amh, arq, ary, hau, ibo, ... (12) |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| CataloniaTweetClassification | Classification | text | cat, spa |
| CyrillicTurkicLangClassification | Classification | text | bak, chv, kaz, kir, krc, ... (9) |
| IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
| MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
| NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
| NusaParagraphEmotionClassification | Classification | text | bbc, bew, bug, jav, mad, ... (10) |
| NusaX-senti | Classification | text | ace, ban, bbc, bjn, bug, ... (12) |
| ScalaClassification | Classification | text | dan, nno, nob, swe |
| SwissJudgementClassification | Classification | text | deu, fra, ita |
| NepaliNewsClassification | Classification | text | nep |
| OdiaNewsClassification | Classification | text | ory |
| PunjabiNewsClassification | Classification | text | pan |
| PolEmo2.0-OUT | Classification | text | pol |
| PAC | Classification | text | pol |
| SinhalaNewsClassification | Classification | text | sin |
| CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
| SiswatiNewsClassification | Classification | text | ssw |
| SlovakMovieReviewSentimentClassification | Classification | text | slk |
| SwahiliNewsClassification | Classification | text | swa |
| DalajClassification | Classification | text | swe |
| TswanaNewsClassification | Classification | text | tsn |
| IsiZuluNewsClassification | Classification | text | zul |
| WikiCitiesClustering | Clustering | text | eng |
| MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
| RomaniBibleClustering | Clustering | text | rom |
| ArXivHierarchicalClusteringP2P | Clustering | text | eng |
| ArXivHierarchicalClusteringS2S | Clustering | text | eng |
| BigPatentClustering.v2 | Clustering | text | eng |
| BiorxivClusteringP2P.v2 | Clustering | text | eng |
| MedrxivClusteringP2P.v2 | Clustering | text | eng |
| StackExchangeClustering.v2 | Clustering | text | eng |
| AlloProfClusteringS2S.v2 | Clustering | text | fra |
| HALClusteringS2S.v2 | Clustering | text | fra |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
| PlscClusteringP2P.v2 | Clustering | text | pol |
| SwednClusteringP2P | Clustering | text | swe |
| CLSClusteringP2P.v2 | Clustering | text | cmn |
| StackOverflowQA | Retrieval | text | eng |
| TwitterHjerneRetrieval | Retrieval | text | dan |
| AILAStatutes | Retrieval | text | eng |
| ArguAna | Retrieval | text | eng |
| HagridRetrieval | Retrieval | text | eng |
| LegalBenchCorporateLobbying | Retrieval | text | eng |
| LEMBPasskeyRetrieval | Retrieval | text | eng |
| SCIDOCS | Retrieval | text | eng |
| SpartQA | Retrieval | text | eng |
| TempReasonL1 | Retrieval | text | eng |
| TRECCOVID | Retrieval | text | eng |
| WinoGrande | Retrieval | text | eng |
| BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
| MLQARetrieval | Retrieval | text | ara, deu, eng, hin, spa, ... (7) |
| StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
| WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
| CovidRetrieval | Retrieval | text | cmn |
| Core17InstructionRetrieval | InstructionReranking | text | eng |
| News21InstructionRetrieval | InstructionReranking | text | eng |
| Robust04InstructionRetrieval | InstructionReranking | text | eng |
| KorHateSpeechMLClassification | MultilabelClassification | text | kor |
| MalteseNewsClassification | MultilabelClassification | text | mlt |
| MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
| BrazilianToxicTweetsClassification | MultilabelClassification | text | por |
| CEDRClassification | MultilabelClassification | text | rus |
| CTKFactsNLI | PairClassification | text | ces |
| SprintDuplicateQuestions | PairClassification | text | eng |
| TwitterURLCorpus | PairClassification | text | eng |
| ArmenianParaphrasePC | PairClassification | text | hye |
| indonli | PairClassification | text | ind |
| OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
| PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
| RTE3 | PairClassification | text | deu, eng, fra, ita |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| PpcPC | PairClassification | text | pol |
| TERRa | PairClassification | text | rus |
| WebLINXCandidatesReranking | Reranking | text | eng |
| AlloprofReranking | Reranking | text | fra |
| VoyageMMarcoReranking | Reranking | text | jpn |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| RuBQReranking | Reranking | text | rus |
| T2Reranking | Reranking | text | cmn |
| GermanSTSBenchmark | STS | text | deu |
| SICK-R | STS | text | eng |
| STS12 | STS | text | eng |
| STS13 | STS | text | eng |
| STS14 | STS | text | eng |
| STS15 | STS | text | eng |
| STSBenchmark | STS | text | eng |
| FaroeseSTS | STS | text | fao |
| FinParaSTS | STS | text | fin |
| JSICK | STS | text | jpn |
| IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
| SemRel24STS | STS | text | afr, amh, arb, arq, ary, ... (12) |
| STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
| STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
| STSES | STS | text | spa |
| STSB | STS | text | cmn |
| MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Scandinavian, v1)¶
Scandinavian text embedding quality across Danish, Swedish, Norwegian Bokmål, and Norwegian Nynorsk, spanning classification, clustering, and retrieval, as well as bitext mining across dialects and written forms.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| BornholmBitextMining | BitextMining | text | dan |
| NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
| AngryTweetsClassification | Classification | text | dan |
| DanishPoliticalCommentsClassification | Classification | text | dan |
| DalajClassification | Classification | text | swe |
| DKHateClassification | Classification | text | dan |
| LccSentimentClassification | Classification | text | dan |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
| NoRecClassification | Classification | text | nob |
| NorwegianParliamentClassification | Classification | text | nob |
| ScalaClassification | Classification | text | dan, nno, nob, swe |
| SwedishSentimentClassification | Classification | text | swe |
| SweRecClassification | Classification | text | swe |
| DanFeverRetrieval | Retrieval | text | dan |
| NorQuadRetrieval | Retrieval | text | nob |
| SNLRetrieval | Retrieval | text | nob |
| SwednRetrieval | Retrieval | text | swe |
| SweFaqRetrieval | Retrieval | text | swe |
| TV2Nordretrieval | Retrieval | text | dan |
| TwitterHjerneRetrieval | Retrieval | text | dan |
| SNLHierarchicalClusteringS2S | Clustering | text | nob |
| SNLHierarchicalClusteringP2P | Clustering | text | nob |
| SwednClusteringP2P | Clustering | text | swe |
| SwednClusteringS2S | Clustering | text | swe |
| VGHierarchicalClusteringS2S | Clustering | text | nob |
| VGHierarchicalClusteringP2P | Clustering | text | nob |
Citation
@article{enevoldsenScandinavianEmbeddingBenchmarks2024,
author = {Enevoldsen, Kenneth and Kardos, Márton and Muennighoff, Niklas and Nielbo, Kristoffer},
language = {en},
month = feb,
shorttitle = {The {Scandinavian} {Embedding} {Benchmarks}},
title = {The {Scandinavian} {Embedding} {Benchmarks}: {Comprehensive} {Assessment} of {Multilingual} and {Monolingual} {Text} {Embedding}},
url = {https://openreview.net/forum?id=pJl_i7HIA72},
urldate = {2024-04-12},
year = {2024},
}
MTEB(cmn, v1)¶
Chinese text embedding quality across retrieval, reranking, pair classification, clustering, classification, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| T2Retrieval | Retrieval | text | cmn |
| MMarcoRetrieval | Retrieval | text | cmn |
| DuRetrieval | Retrieval | text | cmn |
| CovidRetrieval | Retrieval | text | cmn |
| CmedqaRetrieval | Retrieval | text | cmn |
| EcomRetrieval | Retrieval | text | cmn |
| MedicalRetrieval | Retrieval | text | cmn |
| VideoRetrieval | Retrieval | text | cmn |
| T2Reranking | Reranking | text | cmn |
| MMarcoReranking | Reranking | text | cmn |
| CMedQAv1-reranking | Reranking | text | cmn |
| CMedQAv2-reranking | Reranking | text | cmn |
| Ocnli | PairClassification | text | cmn |
| Cmnli | PairClassification | text | cmn |
| CLSClusteringS2S | Clustering | text | cmn |
| CLSClusteringP2P | Clustering | text | cmn |
| ThuNewsClusteringS2S | Clustering | text | cmn |
| ThuNewsClusteringP2P | Clustering | text | cmn |
| LCQMC | STS | text | cmn |
| PAWSX | STS | text | cmn |
| AFQMC | STS | text | cmn |
| QBQTC | STS | text | cmn |
| TNews | Classification | text | cmn |
| IFlyTek | Classification | text | cmn |
| Waimai | Classification | text | cmn |
| OnlineShopping | Classification | text | cmn |
| JDReview | Classification | text | cmn |
| MultilingualSentiment | Classification | text | cmn |
| ATEC | STS | text | cmn |
| BQ | STS | text | cmn |
| STSB | STS | text | cmn |
Citation
@misc{xiao2024cpackpackagedresourcesadvance,
archiveprefix = {arXiv},
author = {Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff and Defu Lian and Jian-Yun Nie},
eprint = {2309.07597},
primaryclass = {cs.CL},
title = {C-Pack: Packaged Resources To Advance General Chinese Embedding},
url = {https://arxiv.org/abs/2309.07597},
year = {2024},
}
MTEB(deu, v1)¶
German text embedding quality across classification, clustering, pair classification, reranking, retrieval, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
| MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| BlurbsClusteringP2P | Clustering | text | deu |
| BlurbsClusteringS2S | Clustering | text | deu |
| TenKGnadClusteringP2P | Clustering | text | deu |
| TenKGnadClusteringS2S | Clustering | text | deu |
| FalseFriendsGermanEnglish | PairClassification | text | deu |
| PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| GermanQuAD-Retrieval | Retrieval | text | deu |
| GermanDPR | Retrieval | text | deu |
| XMarket | Retrieval | text | deu, eng, spa |
| GerDaLIR | Retrieval | text | deu |
| GermanSTSBenchmark | STS | text | deu |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@misc{wehrli2024germantextembeddingclustering,
archiveprefix = {arXiv},
author = {Silvan Wehrli and Bert Arnrich and Christopher Irrgang},
eprint = {2401.02709},
primaryclass = {cs.CL},
title = {German Text Embedding Clustering Benchmark},
url = {https://arxiv.org/abs/2401.02709},
year = {2024},
}
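The languages column in the tables above appears to abbreviate long lists: it shows the first five codes, then `... (N)`, where N is the total number of languages the task covers. Shorter lists are written out in full. The sketch below parses one such cell under that assumption. The helper name `parse_languages` is hypothetical and is not part of any library.

```python
import re


def parse_languages(cell: str) -> tuple[list[str], int]:
    """Return (codes shown, total language count) for one 'languages' cell.

    Assumes the abbreviation pattern seen in the tables: a comma-separated
    list of codes, optionally followed by ', ... (N)' giving the full count.
    """
    match = re.fullmatch(r"(.*?),\s*\.\.\.\s*\((\d+)\)", cell.strip())
    if match:
        shown = [code.strip() for code in match.group(1).split(",")]
        return shown, int(match.group(2))
    # No abbreviation marker: every language is listed explicitly.
    shown = [code.strip() for code in cell.split(",")]
    return shown, len(shown)


abbreviated = parse_languages("deu, eng, fra, hin, spa, ... (6)")
complete = parse_languages("deu, fra, rus, spa")
print(abbreviated)
print(complete)
```

A task such as MTOPDomainClassification therefore covers six languages even though only five codes are displayed, so the visible codes alone are not enough to tell whether a given language is included.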
MTEB(eng, v1)¶
English text embedding quality across classification, clustering, retrieval, reranking, pair classification, and semantic similarity. We recommend using MTEB(eng, v2) instead, which resolves a known scoring bug, uses updated task versions, and removes common fine-tuning datasets such as MSMARCO for more comparable scores.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AmazonPolarityClassification | Classification | text | eng |
| AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
| ArguAna | Retrieval | text | eng |
| ArxivClusteringP2P | Clustering | text | eng |
| ArxivClusteringS2S | Clustering | text | eng |
| AskUbuntuDupQuestions | Reranking | text | eng |
| BIOSSES | STS | text | eng |
| Banking77Classification | Classification | text | eng |
| BiorxivClusteringP2P | Clustering | text | eng |
| BiorxivClusteringS2S | Clustering | text | eng |
| CQADupstackRetrieval | Retrieval | text | eng |
| ClimateFEVER | Retrieval | text | eng |
| DBPedia | Retrieval | text | eng |
| EmotionClassification | Classification | text | eng |
| FEVER | Retrieval | text | eng |
| FiQA2018 | Retrieval | text | eng |
| HotpotQA | Retrieval | text | eng |
| ImdbClassification | Classification | text | eng |
| MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MedrxivClusteringP2P | Clustering | text | eng |
| MedrxivClusteringS2S | Clustering | text | eng |
| MindSmallReranking | Reranking | text | eng |
| NFCorpus | Retrieval | text | eng |
| NQ | Retrieval | text | eng |
| QuoraRetrieval | Retrieval | text | eng |
| RedditClustering | Clustering | text | eng |
| RedditClusteringP2P | Clustering | text | eng |
| SCIDOCS | Retrieval | text | eng |
| SICK-R | STS | text | eng |
| STS12 | STS | text | eng |
| STS13 | STS | text | eng |
| STS14 | STS | text | eng |
| STS15 | STS | text | eng |
| STS16 | STS | text | eng |
| STSBenchmark | STS | text | eng |
| SciDocsRR | Reranking | text | eng |
| SciFact | Retrieval | text | eng |
| SprintDuplicateQuestions | PairClassification | text | eng |
| StackExchangeClustering | Clustering | text | eng |
| StackExchangeClusteringP2P | Clustering | text | eng |
| StackOverflowDupQuestions | Reranking | text | eng |
| SummEval | Summarization | text | eng |
| TRECCOVID | Retrieval | text | eng |
| Touche2020 | Retrieval | text | eng |
| ToxicConversationsClassification | Classification | text | eng |
| TweetSentimentExtractionClassification | Classification | text | eng |
| TwentyNewsgroupsClustering | Clustering | text | eng |
| TwitterSemEval2015 | PairClassification | text | eng |
| TwitterURLCorpus | PairClassification | text | eng |
| MSMARCO | Retrieval | text | eng |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{muennighoff2022mteb,
author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Loïc and Reimers, Nils},
doi = {10.48550/ARXIV.2210.07316},
journal = {arXiv preprint arXiv:2210.07316},
publisher = {arXiv},
title = {MTEB: Massive Text Embedding Benchmark},
url = {https://arxiv.org/abs/2210.07316},
year = {2022},
}
MTEB(eng, v2)¶
English text embedding quality across classification, clustering, retrieval, reranking, pair classification, and semantic similarity, prioritizing tasks not commonly used for fine-tuning to give a more realistic estimate of generalization performance. The original v1 leaderboard is available under MTEB(eng, v1).
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| ArguAna | Retrieval | text | eng |
| ArXivHierarchicalClusteringP2P | Clustering | text | eng |
| ArXivHierarchicalClusteringS2S | Clustering | text | eng |
| AskUbuntuDupQuestions | Reranking | text | eng |
| BIOSSES | STS | text | eng |
| Banking77Classification | Classification | text | eng |
| BiorxivClusteringP2P.v2 | Clustering | text | eng |
| CQADupstackGamingRetrieval | Retrieval | text | eng |
| CQADupstackUnixRetrieval | Retrieval | text | eng |
| ClimateFEVERHardNegatives | Retrieval | text | eng |
| FEVERHardNegatives | Retrieval | text | eng |
| FiQA2018 | Retrieval | text | eng |
| HotpotQAHardNegatives | Retrieval | text | eng |
| ImdbClassification | Classification | text | eng |
| MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MedrxivClusteringP2P.v2 | Clustering | text | eng |
| MedrxivClusteringS2S.v2 | Clustering | text | eng |
| MindSmallReranking | Reranking | text | eng |
| SCIDOCS | Retrieval | text | eng |
| SICK-R | STS | text | eng |
| STS12 | STS | text | eng |
| STS13 | STS | text | eng |
| STS14 | STS | text | eng |
| STS15 | STS | text | eng |
| STSBenchmark | STS | text | eng |
| SprintDuplicateQuestions | PairClassification | text | eng |
| StackExchangeClustering.v2 | Clustering | text | eng |
| StackExchangeClusteringP2P.v2 | Clustering | text | eng |
| TRECCOVID | Retrieval | text | eng |
| Touche2020Retrieval.v3 | Retrieval | text | eng |
| ToxicConversationsClassification | Classification | text | eng |
| TweetSentimentExtractionClassification | Classification | text | eng |
| TwentyNewsgroupsClustering.v2 | Clustering | text | eng |
| TwitterSemEval2015 | PairClassification | text | eng |
| TwitterURLCorpus | PairClassification | text | eng |
| SummEvalSummarization.v2 | Summarization | text | eng |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
| STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(fas, v1)¶
Persian text embedding quality across classification, clustering, pair classification, reranking, retrieval, semantic similarity, and summarization retrieval.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| PersianFoodSentimentClassification | Classification | text | fas |
| SynPerChatbotConvSAClassification | Classification | text | fas |
| SynPerChatbotConvSAToneChatbotClassification | Classification | text | fas |
| SynPerChatbotConvSAToneUserClassification | Classification | text | fas |
| SynPerChatbotSatisfactionLevelClassification | Classification | text | fas |
| SynPerChatbotRAGToneChatbotClassification | Classification | text | fas |
| SynPerChatbotRAGToneUserClassification | Classification | text | fas |
| SynPerChatbotToneChatbotClassification | Classification | text | fas |
| SynPerChatbotToneUserClassification | Classification | text | fas |
| SynPerTextToneClassification | Classification | text | fas |
| SIDClassification | Classification | text | fas |
| DeepSentiPers | Classification | text | fas |
| PersianTextEmotion | Classification | text | fas |
| SentimentDKSF | Classification | text | fas |
| NLPTwitterAnalysisClassification | Classification | text | fas |
| DigikalamagClassification | Classification | text | fas |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| BeytooteClustering | Clustering | text | fas |
| DigikalamagClustering | Clustering | text | fas |
| HamshahriClustring | Clustering | text | fas |
| NLPTwitterAnalysisClustering | Clustering | text | fas |
| SIDClustring | Clustering | text | fas |
| FarsTail | PairClassification | text | fas |
| CExaPPC | PairClassification | text | fas |
| SynPerChatbotRAGFAQPC | PairClassification | text | fas |
| FarsiParaphraseDetection | PairClassification | text | fas |
| SynPerTextKeywordsPC | PairClassification | text | fas |
| SynPerQAPC | PairClassification | text | fas |
| ParsinluEntail | PairClassification | text | fas |
| ParsinluQueryParaphPC | PairClassification | text | fas |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| SynPerQARetrieval | Retrieval | text | fas |
| SynPerChatbotTopicsRetrieval | Retrieval | text | fas |
| SynPerChatbotRAGTopicsRetrieval | Retrieval | text | fas |
| SynPerChatbotRAGFAQRetrieval | Retrieval | text | fas |
| PersianWebDocumentRetrieval | Retrieval | text | fas |
| WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
| MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| ClimateFEVER-Fa | Retrieval | text | fas |
| DBPedia-Fa | Retrieval | text | fas |
| HotpotQA-Fa | Retrieval | text | fas |
| MSMARCO-Fa | Retrieval | text | fas |
| NQ-Fa | Retrieval | text | fas |
| ArguAna-Fa | Retrieval | text | fas |
| CQADupstackRetrieval-Fa | Retrieval | text | fas |
| FiQA2018-Fa | Retrieval | text | fas |
| NFCorpus-Fa | Retrieval | text | fas |
| QuoraRetrieval-Fa | Retrieval | text | fas |
| SCIDOCS-Fa | Retrieval | text | fas |
| SciFact-Fa | Retrieval | text | fas |
| TRECCOVID-Fa | Retrieval | text | fas |
| Touche2020-Fa | Retrieval | text | fas |
| Farsick | STS | text | fas |
| SynPerSTS | STS | text | fas |
| Query2Query | STS | text | fas |
| SAMSumFa | BitextMining | text | fas |
| SynPerChatbotSumSRetrieval | BitextMining | text | fas |
| SynPerChatbotRAGSumSRetrieval | BitextMining | text | fas |
Citation
@article{zinvandi2025famteb,
author = {Zinvandi, Erfan and Alikhani, Morteza and Sarmadi, Mehran and Pourbahman, Zahra and Arvin, Sepehr and Kazemi, Reza and Amini, Arash},
journal = {arXiv preprint arXiv:2502.11571},
title = {FaMTEB: Massive Text Embedding Benchmark in Persian Language},
year = {2025},
}
MTEB(fas, v2)¶
Persian text embedding quality across classification, clustering, pair classification, reranking, retrieval, semantic similarity, and summarization retrieval. In v2, large datasets were optimized for accessibility, low-quality datasets were removed, and higher-quality data was added; see the main PR for details.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| PersianFoodSentimentClassification | Classification | text | fas |
| SynPerChatbotConvSAClassification | Classification | text | fas |
| SynPerChatbotConvSAToneChatbotClassification | Classification | text | fas |
| SynPerChatbotConvSAToneUserClassification | Classification | text | fas |
| SynPerChatbotSatisfactionLevelClassification | Classification | text | fas |
| SynPerTextToneClassification.v3 | Classification | text | fas |
| SIDClassification.v2 | Classification | text | fas |
| DeepSentiPers.v2 | Classification | text | fas |
| PersianTextEmotion.v2 | Classification | text | fas |
| NLPTwitterAnalysisClassification.v2 | Classification | text | fas |
| DigikalamagClassification | Classification | text | fas |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| StyleClassification | Classification | text | fas |
| PerShopDomainClassification | Classification | text | fas |
| PerShopIntentClassification | Classification | text | fas |
| BeytooteClustering | Clustering | text | fas |
| DigikalamagClustering | Clustering | text | fas |
| HamshahriClustring | Clustering | text | fas |
| NLPTwitterAnalysisClustering | Clustering | text | fas |
| SIDClustring | Clustering | text | fas |
| FarsTail | PairClassification | text | fas |
| SynPerChatbotRAGFAQPC | PairClassification | text | fas |
| FarsiParaphraseDetection | PairClassification | text | fas |
| SynPerTextKeywordsPC | PairClassification | text | fas |
| SynPerQAPC | PairClassification | text | fas |
| ParsinluEntail | PairClassification | text | fas |
| ParsinluQueryParaphPC | PairClassification | text | fas |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| SynPerQARetrieval | Retrieval | text | fas |
| SynPerChatbotRAGFAQRetrieval | Retrieval | text | fas |
| PersianWebDocumentRetrieval | Retrieval | text | fas |
| WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
| MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| HotpotQA-FaHardNegatives | Retrieval | text | fas |
| MSMARCO-FaHardNegatives | Retrieval | text | fas |
| NQ-FaHardNegatives | Retrieval | text | fas |
| ArguAna-Fa.v2 | Retrieval | text | fas |
| FiQA2018-Fa.v2 | Retrieval | text | fas |
| QuoraRetrieval-Fa.v2 | Retrieval | text | fas |
| SCIDOCS-Fa.v2 | Retrieval | text | fas |
| SciFact-Fa.v2 | Retrieval | text | fas |
| TRECCOVID-Fa.v2 | Retrieval | text | fas |
| FEVER-FaHardNegatives | Retrieval | text | fas |
| NeuCLIR2023RetrievalHardNegatives | Retrieval | text | fas, rus, zho |
| WebFAQRetrieval | Retrieval | text | ara, aze, ben, bul, cat, ... (51) |
| Farsick | STS | text | fas |
| SynPerSTS | STS | text | fas |
| SAMSumFa | BitextMining | text | fas |
| SynPerChatbotSumSRetrieval | BitextMining | text | fas |
| SynPerChatbotRAGSumSRetrieval | BitextMining | text | fas |
Citation
@article{zinvandi2025famteb,
author = {Zinvandi, Erfan and Alikhani, Morteza and Sarmadi, Mehran and Pourbahman, Zahra and Arvin, Sepehr and Kazemi, Reza and Amini, Arash},
journal = {arXiv preprint arXiv:2502.11571},
title = {FaMTEB: Massive Text Embedding Benchmark in Persian Language},
year = {2025},
}
MTEB(fra, v1)¶
French text embedding quality across classification, clustering, pair classification, reranking, retrieval, and semantic similarity, using high-quality native French datasets.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
| MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| AlloProfClusteringP2P | Clustering | text | fra |
| AlloProfClusteringS2S | Clustering | text | fra |
| HALClusteringS2S | Clustering | text | fra |
| MasakhaNEWSClusteringP2P | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
| MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
| MLSUMClusteringP2P | Clustering | text | deu, fra, rus, spa |
| MLSUMClusteringS2S | Clustering | text | deu, fra, rus, spa |
| PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
| AlloprofReranking | Reranking | text | fra |
| SyntecReranking | Reranking | text | fra |
| AlloprofRetrieval | Retrieval | text | fra |
| BSARDRetrieval | Retrieval | text | fra |
| MintakaRetrieval | Retrieval | text | ara, deu, fra, hin, ita, ... (8) |
| SyntecRetrieval | Retrieval | text | fra |
| XPQARetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
| SICKFr | STS | text | fra |
| STSBenchmarkMultilingualSTS | STS | text | cmn, deu, eng, fra, ita, ... (10) |
| SummEvalFr | Summarization | text | fra |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@misc{ciancone2024mtebfrenchresourcesfrenchsentence,
archiveprefix = {arXiv},
author = {Mathieu Ciancone and Imene Kerboua and Marion Schaeffer and Wissam Siblini},
eprint = {2405.20468},
primaryclass = {cs.CL},
title = {MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis},
url = {https://arxiv.org/abs/2405.20468},
year = {2024},
}
MTEB(jpn, v1)¶
Japanese text embedding quality across clustering, classification, semantic similarity, pair classification, retrieval, and reranking.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| LivedoorNewsClustering.v2 | Clustering | text | jpn |
| MewsC16JaClustering | Clustering | text | jpn |
| AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
| AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| JSTS | STS | text | jpn |
| JSICK | STS | text | jpn |
| PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
| JaqketRetrieval | Retrieval | text | jpn |
| MrTidyRetrieval | Retrieval | text | ara, ben, eng, fin, ind, ... (11) |
| JaGovFaqsRetrieval | Retrieval | text | jpn |
| NLPJournalTitleAbsRetrieval | Retrieval | text | jpn |
| NLPJournalAbsIntroRetrieval | Retrieval | text | jpn |
| NLPJournalTitleIntroRetrieval | Retrieval | text | jpn |
| ESCIReranking | Reranking | text | eng, jpn, spa |
MTEB(kor, v1)¶
Korean text embedding quality across classification, reranking, retrieval, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| KLUE-TC | Classification | text | kor |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| Ko-StrategyQA | Retrieval | text | kor |
| KLUE-STS | STS | text | kor |
| KorSTS | STS | text | kor |
MTEB(nld, v1)¶
Dutch text embedding quality across classification, clustering, pair classification, multilabel classification, reranking, retrieval, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| DutchBookReviewSentimentClassification.v2 | Classification | text | nld |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| SIB200Classification | Classification | text | ace, acm, acq, aeb, afr, ... (197) |
| MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
| VaccinChatNLClassification | Classification | text | nld |
| DutchColaClassification | Classification | text | nld |
| DutchGovernmentBiasClassification | Classification | text | nld |
| DutchSarcasticHeadlinesClassification | Classification | text | nld |
| DutchNewsArticlesClassification | Classification | text | nld |
| OpenTenderClassification | Classification | text | nld |
| IconclassClassification | Classification | text | nld |
| SICKNLPairClassification | PairClassification | text | nld |
| XLWICNLPairClassification | PairClassification | text | nld |
| CovidDisinformationNLMultiLabelClassification | MultilabelClassification | text | nld |
| MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
| VABBMultiLabelClassification | MultilabelClassification | text | nld |
| DutchNewsArticlesClusteringS2S | Clustering | text | nld |
| DutchNewsArticlesClusteringP2P | Clustering | text | nld |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| VABBClusteringS2S | Clustering | text | nld |
| VABBClusteringP2P | Clustering | text | nld |
| OpenTenderClusteringS2S | Clustering | text | nld |
| OpenTenderClusteringP2P | Clustering | text | nld |
| IconclassClusteringS2S | Clustering | text | nld |
| WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (18) |
| ArguAna-NL.v2 | Retrieval | text | nld |
| SCIDOCS-NL.v2 | Retrieval | text | nld |
| SciFact-NL.v2 | Retrieval | text | nld |
| NFCorpus-NL.v2 | Retrieval | text | nld |
| BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
| WebFAQRetrieval | Retrieval | text | ara, aze, ben, bul, cat, ... (51) |
| DutchNewsArticlesRetrieval | Retrieval | text | nld |
| bBSARDNLRetrieval | Retrieval | text | nld |
| LegalQANLRetrieval | Retrieval | text | nld |
| OpenTenderRetrieval | Retrieval | text | nld |
| VABBRetrieval | Retrieval | text | nld |
| WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
| SICK-NL-STS | STS | text | nld |
| STSBenchmarkMultilingualSTS | STS | text | cmn, deu, eng, fra, ita, ... (10) |
Citation
@misc{banar2025mtebnle5nlembeddingbenchmark,
archiveprefix = {arXiv},
author = {Nikolay Banar and Ehsan Lotfi and Jens Van Nooten and Cristina Arhiliuc and Marija Kliocaite and Walter Daelemans},
eprint = {2509.12340},
primaryclass = {cs.CL},
title = {MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch},
url = {https://arxiv.org/abs/2509.12340},
year = {2025},
}
MTEB(pol, v1)¶
Polish text embedding quality across classification, clustering, pair classification, retrieval, and semantic similarity, combining adapted community datasets with a novel Polish scientific literature corpus (PLSC).
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AllegroReviews | Classification | text | pol |
| CBD | Classification | text | pol |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| PolEmo2.0-IN | Classification | text | pol |
| PolEmo2.0-OUT | Classification | text | pol |
| PAC | Classification | text | pol |
| EightTagsClustering | Clustering | text | pol |
| PlscClusteringS2S | Clustering | text | pol |
| PlscClusteringP2P | Clustering | text | pol |
| CDSC-E | PairClassification | text | pol |
| PpcPC | PairClassification | text | pol |
| PSC | PairClassification | text | pol |
| SICK-E-PL | PairClassification | text | pol |
| CDSC-R | STS | text | pol |
| SICK-R-PL | STS | text | pol |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{poswiata2024plmteb,
author = {Rafał Poświata and Sławomir Dadas and Michał Perełkiewicz},
journal = {arXiv preprint arXiv:2405.10138},
title = {PL-MTEB: Polish Massive Text Embedding Benchmark},
year = {2024},
}
MTEB(por, v1)¶
Portuguese text embedding quality across semantic textual similarity, classification, reranking, and retrieval.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
| TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
| WebFAQRetrieval | Retrieval | text | ara, aze, ben, bul, cat, ... (51) |
MTEB(rus, v1)¶
Russian text embedding quality across classification, clustering, reranking, pair classification, retrieval, and semantic similarity, including novel Russian-specific tasks in each category.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| GeoreviewClassification | Classification | text | rus |
| HeadlineClassification | Classification | text | rus |
| InappropriatenessClassification | Classification | text | rus |
| KinopoiskClassification | Classification | text | rus |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| RuReviewsClassification | Classification | text | rus |
| RuSciBenchGRNTIClassification | Classification | text | rus |
| RuSciBenchOECDClassification | Classification | text | rus |
| GeoreviewClusteringP2P | Clustering | text | rus |
| RuSciBenchGRNTIClusteringP2P | Clustering | text | rus |
| RuSciBenchOECDClusteringP2P | Clustering | text | rus |
| CEDRClassification | MultilabelClassification | text | rus |
| SensitiveTopicsClassification | MultilabelClassification | text | rus |
| TERRa | PairClassification | text | rus |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| RuBQReranking | Reranking | text | rus |
| MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| RiaNewsRetrieval | Retrieval | text | rus |
| RuBQRetrieval | Retrieval | text | rus |
| RUParaPhraserSTS | STS | text | rus |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
| RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{snegirev2024russianfocusedembeddersexplorationrumteb,
archiveprefix = {arXiv},
author = {Artem Snegirev and Maria Tikhonova and Anna Maksimova and Alena Fenogenova and Alexander Abramov},
eprint = {2408.12503},
primaryclass = {cs.CL},
title = {The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design},
url = {https://arxiv.org/abs/2408.12503},
year = {2024},
}
MTEB(rus, v1.1)¶
Russian text embedding quality across classification, clustering, reranking, pair classification, retrieval, and semantic similarity. In v1.1, MIRACLRetrieval and RiaNewsRetrieval were replaced with their HardNegatives variants (v2), which include improved default prompts.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| GeoreviewClassification | Classification | text | rus |
| HeadlineClassification | Classification | text | rus |
| InappropriatenessClassification | Classification | text | rus |
| KinopoiskClassification | Classification | text | rus |
| MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
| RuReviewsClassification | Classification | text | rus |
| RuSciBenchGRNTIClassification | Classification | text | rus |
| RuSciBenchOECDClassification | Classification | text | rus |
| GeoreviewClusteringP2P | Clustering | text | rus |
| RuSciBenchGRNTIClusteringP2P | Clustering | text | rus |
| RuSciBenchOECDClusteringP2P | Clustering | text | rus |
| CEDRClassification | MultilabelClassification | text | rus |
| SensitiveTopicsClassification | MultilabelClassification | text | rus |
| TERRa | PairClassification | text | rus |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| RuBQReranking | Reranking | text | rus |
| MIRACLRetrievalHardNegatives.v2 | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| RiaNewsRetrievalHardNegatives.v2 | Retrieval | text | rus |
| RuBQRetrieval | Retrieval | text | rus |
| RUParaPhraserSTS | STS | text | rus |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
| RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{snegirev2024russianfocusedembeddersexplorationrumteb,
archiveprefix = {arXiv},
author = {Artem Snegirev and Maria Tikhonova and Anna Maksimova and Alena Fenogenova and Alexander Abramov},
eprint = {2408.12503},
primaryclass = {cs.CL},
title = {The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design},
url = {https://arxiv.org/abs/2408.12503},
year = {2024},
}
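The difference between MTEB(rus, v1) and MTEB(rus, v1.1) can be checked directly from the two task tables above. The sketch below copies only the Retrieval rows from each table and computes which names were dropped and which were introduced. The variable and function names are illustrative, not part of any library.

```python
# Retrieval task names copied from the MTEB(rus, v1) and MTEB(rus, v1.1) tables.
RUS_V1_RETRIEVAL = {
    "MIRACLRetrieval",
    "RiaNewsRetrieval",
    "RuBQRetrieval",
}
RUS_V1_1_RETRIEVAL = {
    "MIRACLRetrievalHardNegatives.v2",
    "RiaNewsRetrievalHardNegatives.v2",
    "RuBQRetrieval",
}


def diff_tasks(old: set[str], new: set[str]) -> tuple[list[str], list[str]]:
    """Return (tasks only in old, tasks only in new), each sorted by name."""
    return sorted(old - new), sorted(new - old)


removed, added = diff_tasks(RUS_V1_RETRIEVAL, RUS_V1_1_RETRIEVAL)
print("removed:", removed)
print("added:", added)
```

RuBQRetrieval appears in both versions, so it shows up in neither list. The same set comparison applies to any pair of benchmark versions on this page.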
MTEB(spa, v1)¶
Spanish text embedding quality across classification, clustering, pair classification, reranking, retrieval, and semantic similarity. For discussion on benchmark construction, see the original submission.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| SpanishNewsClassification.v2 | Classification | text | spa |
| SpanishSentimentClassification.v2 | Classification | text | spa |
| MLSUMClusteringP2P | Clustering | text | deu, fra, rus, spa |
| MLSUMClusteringS2S | Clustering | text | deu, fra, rus, spa |
| PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| MIRACLRetrievalHardNegatives.v2 | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| MintakaRetrieval | Retrieval | text | ara, deu, fra, hin, ita, ... (8) |
| SpanishPassageRetrievalS2P | Retrieval | text | spa |
| SpanishPassageRetrievalS2S | Retrieval | text | spa |
| XPQARetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
| STSES | STS | text | spa |
| STSBenchmarkMultilingualSTS | STS | text | cmn, deu, eng, fra, ita, ... (10) |
| STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
| STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
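The languages column abbreviates long lists: it prints the first few codes followed by `... (N)`, where N appears to be the total number of languages the task covers. A truncated cell can therefore hide the language a benchmark targets (for example, `spa` is not printed in the PawsXPairClassification row above). The following is a minimal sketch of how such a cell could be read; `parse_language_cell` and `may_cover` are hypothetical helper names, not part of any library, and the meaning of N is an assumption inferred from the tables.

```python
import re
from typing import NamedTuple


class LanguageCell(NamedTuple):
    listed: list[str]  # codes actually printed in the cell
    total: int         # assumed total number of languages for the task
    truncated: bool    # True when the cell ends with "... (N)"


# Matches the trailing ", ... (N)" marker used for long language lists.
_TRUNCATION = re.compile(r",\s*\.\.\.\s*\((\d+)\)\s*$")


def parse_language_cell(cell: str) -> LanguageCell:
    """Split a languages cell into its printed codes and total count."""
    cell = cell.strip()
    match = _TRUNCATION.search(cell)
    codes_part = cell[: match.start()] if match else cell
    listed = [code.strip() for code in codes_part.split(",") if code.strip()]
    # Without a truncation marker, the printed codes are the full list.
    total = int(match.group(1)) if match else len(listed)
    return LanguageCell(listed, total, match is not None)


def may_cover(cell: str, code: str) -> bool:
    """True if the code is printed, or the list is truncated and could hide it."""
    parsed = parse_language_cell(cell)
    return code in parsed.listed or parsed.truncated


paws = parse_language_cell("cmn, deu, eng, fra, jpn, ... (7)")
mlsum = parse_language_cell("deu, fra, rus, spa")
```

Here `may_cover` deliberately answers "possibly" for truncated cells, since the printed codes alone cannot rule a language out; the full language list has to come from the task's own metadata.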
MTEB(tha, v1)¶
Thai text embedding quality across classification, clustering, pair classification, reranking, and retrieval. Tasks are native Thai or high-quality human translations; machine-translated and cross-lingual tasks are excluded.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
| SIB200Classification | Classification | text | ace, acm, acq, aeb, afr, ... (197) |
| WisesightSentimentClassification.v2 | Classification | text | tha |
| SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
| XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
| MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
| MultiLongDocReranking | Reranking | text | ara, deu, eng, fra, hin, ... (13) |
| BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
| MIRACLRetrievalHardNegatives.v2 | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| MKQARetrieval | Retrieval | text | ara, dan, deu, eng, fin, ... (26) |
| MrTidyRetrieval | Retrieval | text | ara, ben, eng, fin, ind, ... (11) |
| MultiLongDocRetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
| WebFAQRetrieval | Retrieval | text | ara, aze, ben, bul, cat, ... (51) |
| XQuADRetrieval | Retrieval | text | arb, deu, ell, eng, hin, ... (12) |
MVEB(beta)¶
Audio-visual video embedding quality across retrieval, classification, clustering, pair classification, zero-shot classification, and video-centric QA, with tasks selected to maximize coverage of audio-video joint modality inputs.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AVMemeExamAT2VRetrieval | Any2AnyRetrieval | audio, text, video | eng |
| ActivityNetCaptionsT2VRetrieval | Any2AnyRetrieval | video, text | eng |
| AudioCapsAVVA2TRetrieval | Any2AnyRetrieval | audio, video, text | eng |
| AudioCapsAVVT2ARetrieval | Any2AnyRetrieval | video, text, audio | eng |
| MSVDT2VRetrieval | Any2AnyRetrieval | text, video | eng |
| VALOR32KT2VARetrieval | Any2AnyRetrieval | text, audio, video | eng |
| VATEXV2ARetrieval | Any2AnyRetrieval | video, audio | eng |
| VATEXVA2TRetrieval | Any2AnyRetrieval | audio, video, text | eng |
| VGGSoundAVA2VRetrieval | Any2AnyRetrieval | audio, video | eng |
| YouCook2T2VARetrieval | Any2AnyRetrieval | text, audio, video | eng |
| EgoSchemaVideoCentricQA | VideoCentricQA | video, text | eng |
| AVEDatasetClassification | VideoClassification | video, audio | eng |
| AVMemeAudioVideoClassification | VideoClassification | video, audio | bos, bre, deu, eng, fas, ... (16) |
| BreakfastClassification | VideoClassification | video | eng |
| Kinetics700VA | VideoClassification | video, audio | eng |
| RAVDESSAVClassification | VideoClassification | video, audio | eng |
| UCF101VideoAudioClassification | VideoClassification | video, audio | eng |
| MELDEmotionAudioVideoClustering | VideoClustering | video, audio | eng |
| MusicAVQACLSAudioVideoClustering | VideoClustering | video, audio | eng |
| HumanAnimalCartoonVAPairClassification | VideoPairClassification | video, audio | eng |
| MusicAVQAVAPairClassification | VideoPairClassification | video, audio | eng |
| HMDB51ZeroShot | VideoZeroshotClassification | video, text | eng |
| WorldSenseAudioVideoZeroShot | VideoZeroshotClassification | video, audio, text | eng |
MVEB(text, video, beta)¶
Text and video embedding quality across retrieval, classification, clustering, pair classification, zero-shot classification, and video-centric QA, for models without an audio encoder.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AVMemeExamT2VRetrieval | Any2AnyRetrieval | text, video | eng |
| ActivityNetCaptionsT2VRetrieval | Any2AnyRetrieval | video, text | eng |
| AudioCapsAVT2VRetrieval | Any2AnyRetrieval | text, video | eng |
| DiDeMoV2TRetrieval | Any2AnyRetrieval | video, text | eng |
| MSVDV2TRetrieval | Any2AnyRetrieval | video, text | eng |
| Panda70MT2VRetrieval | Any2AnyRetrieval | text, video | eng |
| VALOR32KT2VRetrieval | Any2AnyRetrieval | text, video | eng |
| VATEXT2VRetrieval | Any2AnyRetrieval | text, video | eng |
| OmniVideoBenchVideoCentricQA | VideoCentricQA | video, text | eng |
| AVMemeVideoClassification | VideoClassification | video | bos, bre, deu, eng, fas, ... (16) |
| BreakfastClassification | VideoClassification | video | eng |
| Kinetics700V | VideoClassification | video | eng |
| VGGSoundV | VideoClassification | video | eng |
| RAVDESSVideoClustering | VideoClustering | video | eng |
| HumanAnimalCartoonVPairClassification | VideoPairClassification | video | eng |
| Kinetics400ZeroShot | VideoZeroshotClassification | video, text | eng |
| MELDVideoZeroShot | VideoZeroshotClassification | video, text | eng |
| UCF101VideoZeroShotClassification | VideoZeroshotClassification | video, text | eng |
| WorldSenseVideoZeroShot | VideoZeroshotClassification | video, text | eng |
MVEB(video, beta)¶
Video-only embedding quality across classification and pair classification, for encoders without a text component. Retrieval, QA, and zero-shot tasks are excluded as they require a text encoder.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AVMemeVideoClassification | VideoClassification | video | bos, bre, deu, eng, fas, ... (16) |
| BreakfastClassification | VideoClassification | video | eng |
| HMDB51Classification | VideoClassification | video | eng |
| Kinetics600V | VideoClassification | video | eng |
| MELDVideoClassification | VideoClassification | video | eng |
| WorldSenseVideoClassification | VideoClassification | video | eng |
| HumanAnimalCartoonVPairClassification | VideoPairClassification | video | eng |
| MusicAVQAVPairClassification | VideoPairClassification | video | eng |
| RAVDESSAVVPairClassification | VideoPairClassification | video | eng |
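The split between the three MVEB variants follows from which modalities an encoder can embed: a task is usable only if every modality it lists is supported. The sketch below illustrates that rule on three rows copied from the MVEB tables. It is not how the benchmark itself selects tasks; `TaskRow`, `parse_row`, and `runnable_tasks` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRow:
    name: str
    task_type: str
    modalities: frozenset[str]


def parse_row(line: str) -> TaskRow:
    """Parse one '| name | type | modalities | languages |' table row."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    name, task_type, modalities = cells[0], cells[1], cells[2]
    return TaskRow(name, task_type, frozenset(m.strip() for m in modalities.split(",")))


def runnable_tasks(rows, supported):
    """Keep tasks whose every listed modality is supported by the encoder."""
    supported = frozenset(supported)
    return [row for row in rows if row.modalities <= supported]


rows = [
    parse_row(line)
    for line in (
        "| BreakfastClassification | VideoClassification | video | eng |",
        "| HMDB51ZeroShot | VideoZeroshotClassification | video, text | eng |",
        "| Kinetics700VA | VideoClassification | video, audio | eng |",
    )
]

# A video-only encoder cannot run tasks that also need text or audio.
video_only = [row.name for row in runnable_tasks(rows, {"video"})]
# An encoder with a text component additionally covers zero-shot tasks.
video_text = [row.name for row in runnable_tasks(rows, {"video", "text"})]
```

With these rows, a video-only encoder keeps only `BreakfastClassification`, which matches the stated reason for excluding retrieval, QA, and zero-shot tasks from this benchmark.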
NanoBEIR¶
Zero-shot retrieval quality using subsets of the BEIR datasets, designed for faster evaluation with reduced computational cost.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| NanoArguAnaRetrieval | Retrieval | text | eng |
| NanoClimateFeverRetrieval | Retrieval | text | eng |
| NanoDBPediaRetrieval | Retrieval | text | eng |
| NanoFEVERRetrieval | Retrieval | text | eng |
| NanoFiQA2018Retrieval | Retrieval | text | eng |
| NanoHotpotQARetrieval | Retrieval | text | eng |
| NanoMSMARCORetrieval | Retrieval | text | eng |
| NanoNFCorpusRetrieval | Retrieval | text | eng |
| NanoNQRetrieval | Retrieval | text | eng |
| NanoQuoraRetrieval | Retrieval | text | eng |
| NanoSCIDOCSRetrieval | Retrieval | text | eng |
| NanoSciFactRetrieval | Retrieval | text | eng |
| NanoTouche2020Retrieval | Retrieval | text | eng |
R2MED¶
Reasoning-driven medical retrieval quality across biology, bioinformatics, medical sciences, clinical, and treatment scenarios, requiring models to perform multi-step reasoning over medical literature.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| R2MEDBiologyRetrieval | Retrieval | text | eng |
| R2MEDBioinformaticsRetrieval | Retrieval | text | eng |
| R2MEDMedicalSciencesRetrieval | Retrieval | text | eng |
| R2MEDMedXpertQAExamRetrieval | Retrieval | text | eng |
| R2MEDMedQADiagRetrieval | Retrieval | text | eng |
| R2MEDPMCTreatmentRetrieval | Retrieval | text | eng |
| R2MEDPMCClinicalRetrieval | Retrieval | text | eng |
| R2MEDIIYiClinicalRetrieval | Retrieval | text | eng |
Citation
@article{li2025r2med,
author = {Li, Lei and Zhou, Xiao and Liu, Zheng},
journal = {arXiv preprint arXiv:2505.14558},
title = {R2MED: A Benchmark for Reasoning-Driven Medical Retrieval},
year = {2025},
}
RAR-b¶
Reasoning capabilities of retrieval models, framing commonsense, temporal, and domain-specific reasoning tasks as retrieval problems.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| ARCChallenge | Retrieval | text | eng |
| AlphaNLI | Retrieval | text | eng |
| HellaSwag | Retrieval | text | eng |
| WinoGrande | Retrieval | text | eng |
| PIQA | Retrieval | text | eng |
| SIQA | Retrieval | text | eng |
| Quail | Retrieval | text | eng |
| SpartQA | Retrieval | text | eng |
| TempReasonL1 | Retrieval | text | eng |
| TempReasonL2Pure | Retrieval | text | eng |
| TempReasonL2Fact | Retrieval | text | eng |
| TempReasonL2Context | Retrieval | text | eng |
| TempReasonL3Pure | Retrieval | text | eng |
| TempReasonL3Fact | Retrieval | text | eng |
| TempReasonL3Context | Retrieval | text | eng |
| RARbCode | Retrieval | text | eng |
| RARbMath | Retrieval | text | eng |
Citation
@article{xiao2024rar,
author = {Xiao, Chenghao and Hudson, G Thomas and Al Moubayed, Noura},
journal = {arXiv preprint arXiv:2404.06347},
title = {RAR-b: Reasoning as Retrieval Benchmark},
year = {2024},
}
RTEB(Code, beta)¶
Retrieval quality in the code domain across algorithmic problems, data science tasks, code evaluation, SQL retrieval, and multilingual code retrieval, with tasks representative of real-world production retrieval demands. A domain-specific subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AppsRetrieval | Retrieval | text | eng, python |
| DS1000Retrieval | Retrieval | text | eng, python |
| HumanEvalRetrieval | Retrieval | text | eng, python |
| MBPPRetrieval | Retrieval | text | eng, python |
| WikiSQLRetrieval | Retrieval | text | eng, sql |
| FreshStackRetrieval | Retrieval | text | eng, go, javascript, python |
| SWEbenchCodeRetrieval | Retrieval | text | eng, python |
| Code1Retrieval | Retrieval | text | eng |
| JapaneseCode1Retrieval | Retrieval | text | jpn |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(Health, beta)¶
Retrieval quality in the healthcare and medical domain across medical Q&A, healthcare information retrieval, and multilingual medical consultation, with tasks representative of real-world production retrieval demands. A domain-specific subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| ChatDoctorRetrieval | Retrieval | text | eng |
| CUREv1 | Retrieval | text | eng, fra, spa |
| EnglishHealthcare1Retrieval | Retrieval | text | eng |
| GermanHealthcare1Retrieval | Retrieval | text | deu |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(Law, beta)¶
Retrieval quality in the legal domain across case documents, statutes, legal summarization, and multilingual legal Q&A, with tasks representative of real-world production retrieval demands. A domain-specific subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AILACasedocs | Retrieval | text | eng |
| AILAStatutes | Retrieval | text | eng |
| LegalSummarization | Retrieval | text | eng |
| LegalQuAD | Retrieval | text | deu |
| FrenchLegal1Retrieval | Retrieval | text | fra |
| GermanLegal1Retrieval | Retrieval | text | deu |
| JapaneseLegal1Retrieval | Retrieval | text | jpn |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(beta)¶
Retrieval quality across specialized domains including legal, finance, code, and healthcare in multiple languages, with tasks representative of real-world production retrieval demands. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AILACasedocs | Retrieval | text | eng |
| AILAStatutes | Retrieval | text | eng |
| LegalSummarization | Retrieval | text | eng |
| LegalQuAD | Retrieval | text | deu |
| FinanceBenchRetrieval | Retrieval | text | eng |
| HC3FinanceRetrieval | Retrieval | text | eng |
| FinQARetrieval | Retrieval | text | eng |
| AppsRetrieval | Retrieval | text | eng, python |
| DS1000Retrieval | Retrieval | text | eng, python |
| HumanEvalRetrieval | Retrieval | text | eng, python |
| MBPPRetrieval | Retrieval | text | eng, python |
| WikiSQLRetrieval | Retrieval | text | eng, sql |
| FreshStackRetrieval | Retrieval | text | eng, go, javascript, python |
| SWEbenchCodeRetrieval | Retrieval | text | eng, python |
| ChatDoctorRetrieval | Retrieval | text | eng |
| CUREv1 | Retrieval | text | eng, fra, spa |
| MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
| Code1Retrieval | Retrieval | text | eng |
| JapaneseCode1Retrieval | Retrieval | text | jpn |
| EnglishFinance1Retrieval | Retrieval | text | eng |
| EnglishFinance2Retrieval | Retrieval | text | eng |
| EnglishFinance3Retrieval | Retrieval | text | eng |
| EnglishFinance4Retrieval | Retrieval | text | eng |
| EnglishHealthcare1Retrieval | Retrieval | text | eng |
| French1Retrieval | Retrieval | text | fra |
| FrenchLegal1Retrieval | Retrieval | text | fra |
| German1Retrieval | Retrieval | text | deu |
| GermanHealthcare1Retrieval | Retrieval | text | deu |
| GermanLegal1Retrieval | Retrieval | text | deu |
| JapaneseLegal1Retrieval | Retrieval | text | jpn |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(deu, beta)¶
Retrieval quality in German across legal, healthcare, and business domains, with tasks representative of real-world production retrieval demands. A German-language subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| LegalQuAD | Retrieval | text | deu |
| German1Retrieval | Retrieval | text | deu |
| GermanHealthcare1Retrieval | Retrieval | text | deu |
| GermanLegal1Retrieval | Retrieval | text | deu |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(eng, beta)¶
Retrieval quality in English across legal, finance, code, and healthcare domains, with tasks representative of real-world production retrieval demands. An English-only subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| AILACasedocs | Retrieval | text | eng |
| AILAStatutes | Retrieval | text | eng |
| LegalSummarization | Retrieval | text | eng |
| FinanceBenchRetrieval | Retrieval | text | eng |
| HC3FinanceRetrieval | Retrieval | text | eng |
| FinQARetrieval | Retrieval | text | eng |
| AppsRetrieval | Retrieval | text | eng, python |
| DS1000Retrieval | Retrieval | text | eng, python |
| HumanEvalRetrieval | Retrieval | text | eng, python |
| MBPPRetrieval | Retrieval | text | eng, python |
| WikiSQLRetrieval | Retrieval | text | eng, sql |
| FreshStackRetrieval | Retrieval | text | eng, go, javascript, python |
| SWEbenchCodeRetrieval | Retrieval | text | eng, python |
| ChatDoctorRetrieval | Retrieval | text | eng |
| Code1Retrieval | Retrieval | text | eng |
| EnglishFinance1Retrieval | Retrieval | text | eng |
| EnglishFinance2Retrieval | Retrieval | text | eng |
| EnglishFinance3Retrieval | Retrieval | text | eng |
| EnglishFinance4Retrieval | Retrieval | text | eng |
| EnglishHealthcare1Retrieval | Retrieval | text | eng |
| CUREv1 | Retrieval | text | eng, fra, spa |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(fin, beta)¶
Retrieval quality in the financial domain across finance benchmarks, Q&A, financial document retrieval, and corporate governance, with tasks representative of real-world production retrieval demands. A domain-specific subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| FinanceBenchRetrieval | Retrieval | text | eng |
| HC3FinanceRetrieval | Retrieval | text | eng |
| FinQARetrieval | Retrieval | text | eng |
| EnglishFinance1Retrieval | Retrieval | text | eng |
| EnglishFinance2Retrieval | Retrieval | text | eng |
| EnglishFinance3Retrieval | Retrieval | text | eng |
| EnglishFinance4Retrieval | Retrieval | text | eng |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(fra, beta)¶
Retrieval quality in French across legal and general knowledge domains, with tasks representative of real-world production retrieval demands. A French-language subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| CUREv1 | Retrieval | text | eng, fra, spa |
| French1Retrieval | Retrieval | text | fra |
| FrenchLegal1Retrieval | Retrieval | text | fra |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RTEB(jpn, beta)¶
Retrieval quality in Japanese across legal and code domains, with tasks representative of real-world production retrieval demands. A Japanese-language subset of RTEB. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Note: We have temporarily removed the 'Private' column. To read more about this decision, check out the announcement.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| JapaneseCode1Retrieval | Retrieval | text | jpn |
| JapaneseLegal1Retrieval | Retrieval | text | jpn |
Citation
@article{rteb2025,
author = {Liu, Frank and Enevoldsen, Kenneth and Solomatin, Roman and Chung, Isaac and Aarsen, Tom and Fődi, Zoltán},
title = {Introducing RTEB: A New Standard for Retrieval Evaluation},
year = {2025},
}
RuSciBench¶
Scientific text embedding quality in Russian and English across bitext mining, classification, retrieval, and regression tasks, using data sourced from eLibrary, Russia's largest electronic library of scientific publications.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| RuSciBenchBitextMining.v2 | BitextMining | text | eng, rus |
| RuSciBenchCoreRiscClassification | Classification | text | eng, rus |
| RuSciBenchGRNTIClassification.v2 | Classification | text | eng, rus |
| RuSciBenchOECDClassification.v2 | Classification | text | eng, rus |
| RuSciBenchPubTypeClassification | Classification | text | eng, rus |
| RuSciBenchCiteRetrieval | Retrieval | text | eng, rus |
| RuSciBenchCociteRetrieval | Retrieval | text | eng, rus |
| RuSciBenchCitedCountRegression | Regression | text | eng, rus |
| RuSciBenchYearPublRegression | Regression | text | eng, rus |
Citation
@article{vatolin2024ruscibench,
author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
doi = {10.1134/S1064562424602191},
issn = {1531-8362},
journal = {Doklady Mathematics},
month = {12},
number = {1},
pages = {S251--S260},
title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
url = {https://doi.org/10.1134/S1064562424602191},
volume = {110},
year = {2024},
}
VN-MTEB (vie, v1)¶
Vietnamese text embedding quality across retrieval, classification, pair classification, clustering, reranking, and semantic similarity.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| ArguAna-VN | Retrieval | text | vie |
| SciFact-VN | Retrieval | text | vie |
| ClimateFEVER-VN | Retrieval | text | vie |
| FEVER-VN | Retrieval | text | vie |
| DBPedia-VN | Retrieval | text | vie |
| NQ-VN | Retrieval | text | vie |
| HotpotQA-VN | Retrieval | text | vie |
| MSMARCO-VN | Retrieval | text | vie |
| TRECCOVID-VN | Retrieval | text | vie |
| FiQA2018-VN | Retrieval | text | vie |
| NFCorpus-VN | Retrieval | text | vie |
| SCIDOCS-VN | Retrieval | text | vie |
| Touche2020-VN | Retrieval | text | vie |
| Quora-VN | Retrieval | text | vie |
| CQADupstackAndroid-VN | Retrieval | text | vie |
| CQADupstackGis-VN | Retrieval | text | vie |
| CQADupstackMathematica-VN | Retrieval | text | vie |
| CQADupstackPhysics-VN | Retrieval | text | vie |
| CQADupstackProgrammers-VN | Retrieval | text | vie |
| CQADupstackStats-VN | Retrieval | text | vie |
| CQADupstackTex-VN | Retrieval | text | vie |
| CQADupstackUnix-VN | Retrieval | text | vie |
| CQADupstackWebmasters-VN | Retrieval | text | vie |
| CQADupstackWordpress-VN | Retrieval | text | vie |
| Banking77VNClassification | Classification | text | vie |
| EmotionVNClassification | Classification | text | vie |
| AmazonCounterfactualVNClassification | Classification | text | vie |
| MTOPDomainVNClassification | Classification | text | vie |
| TweetSentimentExtractionVNClassification | Classification | text | vie |
| ToxicConversationsVNClassification | Classification | text | vie |
| ImdbVNClassification | Classification | text | vie |
| MTOPIntentVNClassification | Classification | text | vie |
| MassiveScenarioVNClassification | Classification | text | vie |
| MassiveIntentVNClassification | Classification | text | vie |
| AmazonReviewsVNClassification | Classification | text | vie |
| AmazonPolarityVNClassification | Classification | text | vie |
| SprintDuplicateQuestions-VN | PairClassification | text | vie |
| TwitterSemEval2015-VN | PairClassification | text | vie |
| TwitterURLCorpus-VN | PairClassification | text | vie |
| TwentyNewsgroupsClustering-VN | Clustering | text | vie |
| RedditClusteringP2P-VN | Clustering | text | vie |
| StackExchangeClusteringP2P-VN | Clustering | text | vie |
| StackExchangeClustering-VN | Clustering | text | vie |
| RedditClustering-VN | Clustering | text | vie |
| SciDocsRR-VN | Reranking | text | vie |
| AskUbuntuDupQuestions-VN | Reranking | text | vie |
| StackOverflowDupQuestions-VN | Reranking | text | vie |
| BIOSSES-VN | STS | text | vie |
| SICK-R-VN | STS | text | vie |
| STSBenchmark-VN | STS | text | vie |
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
archiveprefix = {arXiv},
author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
eprint = {2507.21500},
primaryclass = {cs.CL},
title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
url = {https://arxiv.org/abs/2507.21500},
year = {2025},
}
ViDoRe(v1&v2)¶
Visual document retrieval across diverse document types and domains, combining the ViDoRe v1 and v2 task sets.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
| Vidore2ESGReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
| Vidore2EconomicsReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
| Vidore2BioMedicalLecturesRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
| Vidore2ESGReportsHLRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
ViDoRe(v1)¶
Visual document retrieval across diverse document types and domains, matching natural language queries to document page images.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
ViDoRe(v2)¶
Visual document retrieval across ESG reports, economics reports, biomedical lectures, and related enterprise document types, matching natural language queries to document page images.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Vidore2ESGReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
| Vidore2EconomicsReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
| Vidore2BioMedicalLecturesRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
| Vidore2ESGReportsHLRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
ViDoRe(v3)¶
Visual document retrieval across multi-modal enterprise documents spanning finance, industrial, computer science, pharmaceutical, and other professional domains. Includes both open and closed datasets; to submit results on private tasks, please open an issue.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Vidore3FinanceEnRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3IndustrialRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3ComputerScienceRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3PharmaceuticalsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3HrRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3FinanceFrRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3PhysicsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3EnergyRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3TelecomRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3NuclearRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
Citation
@article{loison2026vidorev3comprehensiveevaluation,
archiveprefix = {arXiv},
author = {António Loison and Quentin Macé and Antoine Edy and Victor Xing and Tom Balough and Gabriel Moreira and Bo Liu and Manuel Faysse and Céline Hudelot and Gautier Viaud},
eprint = {2601.08620},
primaryclass = {cs.AI},
title = {ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios},
url = {https://arxiv.org/abs/2601.08620},
year = {2026},
}
ViDoRe(v3.1)¶
Visual document retrieval across multi-modal enterprise documents spanning finance, industrial, computer science, pharmaceutical, and other professional domains. Includes both open and closed datasets; to submit results on private tasks, please open an issue. v3.1 adds markdown derived from OCR to support text-only and joint image-text baselines.
Tasks
| name | type | modalities | languages |
|---|---|---|---|
| Vidore3FinanceEnRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3IndustrialRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3ComputerScienceRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3PharmaceuticalsRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3HrRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3FinanceFrRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3PhysicsRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3EnergyRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3TelecomRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
| Vidore3NuclearRetrieval.v2 | DocumentUnderstanding | text, image | deu, eng, fra, ita, por, ... (6) |
Citation
@article{loison2026vidorev3comprehensiveevaluation,
archiveprefix = {arXiv},
author = {António Loison and Quentin Macé and Antoine Edy and Victor Xing and Tom Balough and Gabriel Moreira and Bo Liu and Manuel Faysse and Céline Hudelot and Gautier Viaud},
eprint = {2601.08620},
primaryclass = {cs.AI},
title = {ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios},
url = {https://arxiv.org/abs/2601.08620},
year = {2026},
}