Hybrid retrieval

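Hybrid retrieval is a family of information retrieval methods that combine lexical (sparse) signals with semantic (dense or late-interaction) signals to improve both the recall and the precision of search results. Hybrid approaches pair the strengths of exact term matching (BM25/TF-IDF) with those of vector proximity (bi-encoders, late-interaction multi-vector models). They rely on fusion methods that cope with scores on different scales (rank-based Reciprocal Rank Fusion, or score-based CombSUM/CombMNZ with normalization) and on cross-encoder reranking.^[1]^[2]^[3]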

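Definition and motivation

Hybrid retrieval runs two or more independent signal channels, in parallel or as a cascade, and then merges or reranks their results. The main motivations are: (i) overcoming term mismatch (synonyms, paraphrases); (ii) robustness to typos and morphological variation; (iii) retrieving specific codes and identifiers, where sparse models excel; and (iv) transfer to new domains and languages, where dense models provide semantic generalization.^[4]^[5]^[6]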

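Components of hybrid retrieval

Lexical (sparse)

Classical models: TF-IDF and BM25/BM25F are the standard baselines built on an inverted index. BM25 is theoretically grounded in the Probabilistic Relevance Framework and is widely used as the first ranking stage.^[7]
Learned sparse models:
- SPLADE / SPLADE++ / SPLADE-v3: neural sparse models that learn term expansion and term weighting through an MLM head with sparsity regularization. They show strong effectiveness and good transfer on BEIR.^[8]^[9]^[10]
- COIL / uniCOIL: contextualized inverted lists and their simplified variant, uniCOIL, which is compatible with a classical inverted index.^[11]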
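To make the lexical baseline concrete, the following is a minimal sketch of BM25 scoring over pre-tokenized documents. The function name `bm25_scores`, the parameter values `k1=0.9` and `b=0.4`, and the non-negative idf variant are assumptions chosen for illustration; production engines use an inverted index rather than a scan over all documents.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score each tokenized document against a tokenized query with BM25.

    query: list of terms; docs: list of term lists.
    k1 and b are illustrative values, not tuned for any collection.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative idf variant (assumption for this sketch).
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

Because the score depends on exact term overlap, a document that shares no term with the query scores zero, which is the term-mismatch gap that the semantic channel is meant to close.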

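Semantic (dense/late-interaction)

Bi-encoders (single vector): the query and the document are each encoded into a single vector, similarity is a dot product, and retrieval is a maximum inner product search (MIPS). Examples: DPR,^[12] ANCE,^[13] Contriever,^[14] GTR,^[15] E5.^[16]
Late interaction (multi-vector): models such as ColBERT/ColBERTv2 model token-level matches in a late interaction step. The trade-off is higher accuracy at the cost of larger indexes and higher latency, which engineering optimizations such as PLAID and WARP mitigate.^[17]^[18]^[19]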
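The difference between the two scoring styles can be sketched in a few lines. The function names are hypothetical and the vectors are plain Python lists standing in for encoder outputs; the late-interaction score follows the MaxSim idea used by ColBERT, in which each query token vector is matched to its best document token vector and the maxima are summed.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_vector_score(query_vec, doc_vec):
    """Bi-encoder scoring: one vector per query and per document."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction scoring: sum over query tokens of the best
    inner product against any document token vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

The multi-vector form stores one vector per document token, which is where the larger index size mentioned above comes from.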

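Hybridization and rank fusion schemes

Parallel retrieval and candidate merging: the sparse and dense candidate lists are retrieved independently, each with its own internal scores, and the rankings are then fused.^[20]
RRF (Reciprocal Rank Fusion): a method that is robust to incomparable ranking scores because it sums reciprocal ranks:

$\mathrm{RRF}(d) = \sum_{i = 1}^{m} \frac{1}{k + \mathrm{rank}_{i}(d)}$, where $m$ is the number of fused rankings, $\mathrm{rank}_{i}(d)$ is the rank of document $d$ in ranking $i$, and typically $k \approx 60$.^[21] Search engines such as Elasticsearch and OpenSearch support it as a built-in retriever or processor.^[22]^[23]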
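The RRF formula translates directly into code. The sketch below assumes each input ranking is a list of document ids ordered best first, with ranks starting at 1, and that a document missing from a ranking contributes nothing to the sum; the function name and the tie-breaking by document id are choices made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered from best to worst.
    Returns (doc_id, fused_score) pairs sorted by descending score.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document id to keep the output deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing the sparse list `["d1", "d2", "d3"]` with the dense list `["d3", "d1", "d4"]` places `d1` first, since it is ranked first and second, ahead of `d3`, which is ranked third and first.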

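CombSUM/CombMNZ and related functions: classical score-summing functions (with normalization where needed).^[24]^[25]^[26]
Weighted linear combination:

$S(d) = \alpha \cdot S_{\mathrm{sparse}}(d) + (1 - \alpha) \cdot S_{\mathrm{dense}}(d)$, $\alpha \in [0, 1]$. The weight $\alpha$ can be fixed or learned (per collection or per query).^[27]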
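A minimal sketch of the weighted linear combination follows. It assumes both channels have already been normalized to a comparable range such as [0, 1], and that a document missing from one channel receives a score of 0 there; both are assumptions of this example, and other conventions for missing documents exist.

```python
def linear_fusion(sparse, dense, alpha=0.5):
    """Combine per-document scores as alpha * sparse + (1 - alpha) * dense.

    sparse, dense: dicts mapping doc_id to an already normalized score.
    Returns (doc_id, fused_score) pairs sorted by descending score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    doc_ids = set(sparse) | set(dense)
    fused = {
        doc_id: alpha * sparse.get(doc_id, 0.0)
        + (1.0 - alpha) * dense.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Setting `alpha=1.0` reduces the result to the sparse ranking and `alpha=0.0` to the dense ranking, which makes the two channels easy to compare in an ablation.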

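Score normalization: with CombSUM/CombMNZ, min-max normalization or z-scores are commonly used to bring the scales into agreement.^[28] RRF, in contrast, depends only on ranks.
Dynamic/adaptive weighting: query routing, query features, and LTR models for selecting or weighting channels. Recent work reports that a simple learned convex combination outperforms RRF and is largely insensitive to the choice of normalization.^[29]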
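The score-based fusion functions named above can be sketched as follows, using min-max normalization per run. In this sketch CombSUM adds the normalized scores of a document across runs, and CombMNZ multiplies that sum by the number of runs that returned the document. The function names and the choice to map a constant-score run to 0.0 are assumptions of this example.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1]; a constant run maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def comb_sum(runs):
    """CombSUM: sum of normalized scores over all runs."""
    fused = {}
    for run in runs:
        for doc_id, score in min_max(run).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    return fused


def comb_mnz(runs):
    """CombMNZ: CombSUM times the number of runs containing the document."""
    sums = comb_sum(runs)
    hits = {doc_id: sum(doc_id in run for run in runs) for doc_id in sums}
    return {doc_id: sums[doc_id] * hits[doc_id] for doc_id in sums}
```

CombMNZ rewards documents that several channels agree on, so a document found by both the sparse and the dense retriever is boosted relative to CombSUM.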

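Reranking and multi-stage pipelines

Hybrid systems are usually built as retrieval → fusion → rerank. The following approaches are used for reranking:

Cross-encoders (BERT/T5): the most accurate but most expensive option. MonoBERT/MonoT5 reorder the top-N candidates.^[30]^[31]
Late interaction as a reranker: the ColBERT family also works as a reranker. Recent acceleration methods such as PLAID and WARP reduce latency while largely preserving quality.^[32]^[33]

The trade-off between quality and latency/cost matters most in RAG and under strict SLAs (see tail latency, p95/p99).^[34]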
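The retrieval → fusion → rerank structure can be sketched as a small orchestration function. Everything passed in is a hypothetical callable supplied by the caller: `retrievers` return ranked document ids, `fuse` merges rankings into one list, and `rerank_score` stands in for an expensive model such as a cross-encoder. Only the top `rerank_top_n` fused candidates are rescored, which is the usual way to bound reranking cost.

```python
def hybrid_search(query, retrievers, fuse, rerank_score, depth=100, rerank_top_n=10):
    """Run a retrieval -> fusion -> rerank pipeline.

    retrievers: callables (query, depth) -> ranked list of doc ids.
    fuse: callable (list of rankings) -> single ranked list of doc ids.
    rerank_score: callable (query, doc_id) -> float, higher is better.
    """
    # Stage 1: each channel retrieves its own candidate list.
    rankings = [retrieve(query, depth) for retrieve in retrievers]
    # Stage 2: merge the candidate lists into one ranking.
    fused = fuse(rankings)
    # Stage 3: rescore only the head; the tail keeps its fused order.
    head, tail = fused[:rerank_top_n], fused[rerank_top_n:]
    head = sorted(head, key=lambda doc_id: rerank_score(query, doc_id), reverse=True)
    return head + tail
```

Lowering `rerank_top_n` trades quality for latency, since the reranker is typically the most expensive stage per candidate.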

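Benchmark evaluation

BEIR: a unified suite of diverse collections and tasks used for zero-shot/out-of-domain evaluation of retrievers (e.g. TREC-COVID, NFCorpus, NQ, HotpotQA, FiQA-2018, DBPedia-entity, ArguAna, Webis-Touché-2020, FEVER/Climate-FEVER, Scidocs, SciFact, CQADupStack).^[35]
TREC Deep Learning / MS MARCO: the classic resources for training and evaluating retrievers and rerankers at large scale.^[36]^[37]^[38]
Metrics: quality is measured with nDCG@k, Recall@k, and MRR; performance with latency (p50/p95/p99) and QPS; operations with memory and cost (CPU/GPU, index).^[39]^[40]
Ablation studies: it is advisable to isolate the contribution of each channel and weight, and the sensitivity to parameters such as $k$ in RRF and the mixing weight $\alpha$. Robustness to paraphrasing and to out-of-domain (OOD) shift should also be evaluated.^[41]^[42]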
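The quality metrics listed above can be computed per query as in the following sketch. It assumes `qrels` maps document ids to graded relevance labels and uses a linear gain for nDCG; some tools use an exponential gain instead, so absolute values may differ from a given evaluation toolkit. MRR is the mean of `reciprocal_rank` over all queries.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking, qrels, k=10):
    """nDCG@k for one query; qrels maps doc_id -> relevance grade."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def recall_at_k(ranking, qrels, k=100):
    """Fraction of relevant documents found in the top k."""
    relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranking[:k])) / len(relevant)


def reciprocal_rank(ranking, qrels):
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0
```

Running these functions on the sparse-only, dense-only, and fused rankings of the same queries gives the per-channel comparison that an ablation study needs.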

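Engineering and production practice

Indexes and ANN: FAISS (Flat/HNSW/IVF-PQ), HNSW, ScaNN, and others for MIPS/cosine similarity.^[43]^[44]^[45]
IR stack: Lucene/Anserini/Pyserini for sparse, dense, and hybrid pipelines, with highly reproducible BEIR results.^[46]^[47]
Vector databases and search engines: Qdrant, Weaviate, pgvector/PostgreSQL, Vespa, and Elasticsearch/OpenSearch support hybrid search (BM25 or BM25F plus vector) with RRF or linear mixing, either natively or, in the case of pgvector, together with PostgreSQL full-text search.^[48]^[49]^[50]^[51]^[52]
RAG pattern: the architecture is retrieval → fusion → rerank → LLM context, subject to a token budget and with source tracking.^[53]
Index updates, deduplication, tokenization: it is important to keep tokenization and text preprocessing consistent between the BM25 index and the vectorizer, and to calibrate scores (normalization/scaling) before mixing them.^[54]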
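The final RAG step, filling the LLM context under a token budget while keeping track of sources, might look like the sketch below. The function name, the numbered-citation format, and the whitespace token counter are assumptions of this example; a real system would count tokens with the tokenizer of the target model. Exact duplicate passages are dropped, as a simple form of the deduplication mentioned above.

```python
def pack_context(passages, max_tokens, count_tokens=None):
    """Pack ranked (source_id, text) passages into one prompt context.

    Returns the context string and the list of source ids actually used,
    in citation order, so that answers can be traced back to documents.
    """
    if count_tokens is None:
        # Assumption: whitespace splitting as a crude stand-in for a tokenizer.
        count_tokens = lambda text: len(text.split())
    blocks, sources, seen_texts, used = [], [], set(), 0
    for source_id, text in passages:
        if text in seen_texts:
            continue  # drop exact duplicates
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # does not fit; a shorter passage further down may
        seen_texts.add(text)
        sources.append(source_id)
        blocks.append(f"[{len(sources)}] {text}")
        used += cost
    return "\n".join(blocks), sources
```

Skipping an oversized passage instead of stopping lets shorter, lower-ranked passages use the remaining budget; stopping at the first passage that does not fit is an equally valid policy when strict rank order matters more.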

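Limitations and open problems

Transfer and multilinguality: dense models (GTR/E5) improve transfer but remain sensitive to domain and language, whereas sparse models (SPLADE) are often more robust out of domain.^[55]^[56]
LLM integration and hallucination: hybrid retrieval reduces missing information and noise in the RAG context but cannot fully eliminate hallucination; strict reranking and source filtering are still needed.^[57]
Cost and privacy: storage, compression, and encryption of multi-vector indexes; on-premises stacks; evaluation of TCO (total cost of ownership).
Trends: HyDE, doc2query, and PRF (pseudo-relevance feedback) as document/query expansion;^[58]^[59] learned mixing weights (per-query $\alpha$); more efficient late-interaction engines (PLAID/WARP); long documents and multi-vector indexes.^[60]^[61]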
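As an illustration of query expansion, the sketch below implements a deliberately simplified form of pseudo-relevance feedback: the top documents of a first retrieval pass are assumed to be relevant, and their most frequent terms that are not already in the query are appended. This is not RM3 or any specific published method; the function name and the frequency-based term selection are assumptions of this example.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, n_terms=3):
    """Append the n_terms most frequent new terms from feedback documents.

    query_terms: list of terms; feedback_docs: tokenized top-ranked
    documents from a first retrieval pass, assumed to be relevant.
    """
    original = set(query_terms)
    counts = Counter(
        term for doc in feedback_docs for term in doc if term not in original
    )
    # Sort by descending count, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_terms]]
```

The expanded query is then issued again to the lexical channel, which can recover documents that use related vocabulary but none of the original query terms.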

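Comparison of methods

As of 10 September 2025 (example on the BEIR collection trec-covid; nDCG@10 / Recall@100):^[62]

Comparison of methods on *trec-covid*
Method	Type (sparse/dense/hybrid)	Idea/model	Fusion	Reranker	nDCG@10 / R@100	Latency (relative)	Source
BM25	sparse	Exact term matching (BM25, Probabilistic Relevance Framework)	-	-	0.595 / 0.109	Very low	^[63]^[64]
SPLADE++ (ED)	sparse (learned)	Sparse expansion and term weighting	-	-	0.727 / 0.128	Low to medium	^[65]^[66]
Contriever (MS MARCO FT)	dense	Bi-encoder trained with contrastive learning	-	-	0.596 / 0.091	Medium	^[67]^[68]
BGE-base-en-v1.5	dense	Strong general-purpose embedder	-	-	0.781 / 0.141	Medium	^[69]
Cohere embed-english-v3.0	dense	Commercial text embedding model	-	-	0.818 / 0.159	Medium	^[70]
BM25 + dense (e.g. BM25+BGE)	hybrid	Parallel retrieval + list merging	RRF (k≈60) or weighted mixing	Optional: MonoT5/ColBERT	(implementation-dependent; often exceeds the best single channel)	Medium	^[71]^[72]^[73]

Note: the last row only illustrates the scheme. Exact numbers depend on the choice of embedder, normalization, and fusion parameters (see the sources and the reproducible Pyserini scripts).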

References

Manning, C.D., Raghavan, P., Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. ISBN 978‑0521865715.
Robertson, S., Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3(4):333–389. DOI:10.1561/1500000019.
Lin, J. et al. (2021). Pyserini: A Python Toolkit for Reproducible IR. SIGIR.
Järvelin, K., Kekäläinen, J. (2002). Cumulated Gain‑Based Evaluation of IR Techniques. Information Retrieval 6:241–256. DOI:10.1023/A:1016043826386.
Dean, J., Barroso, L.A. (2013). The Tail at Scale. CACM 56(2):74–80. DOI:10.1145/2408776.2408794.

External links

Pyserini / Anserini: github.com/castorini/pyserini • github.com/castorini/anserini
FAISS: arXiv:1702.08734
Weaviate (Hybrid search): docs.weaviate.io/weaviate/search/hybrid
pgvector: github.com/pgvector/pgvector
Vespa (Hybrid search tutorial): docs.vespa.ai/en/tutorials/hybrid-search.html

Notes


[1] Robertson, S., Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4), 333–389. DOI:10.1561/1500000019.

[2] Cormack, G.V., Clarke, C.L.A., Büttcher, S. (2009). Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods. SIGIR 2009, 758–759. PDF.

[3] Bruch, S., Gai, S., Ingber, A. (2023). An Analysis of Fusion Functions for Hybrid Retrieval. ACM TOIS 42(1):1–35. DOI:10.1145/3596512 • arXiv:2210.11934.

[4] Manning, C.D., Raghavan, P., Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. ISBN 978-0521865715 (see the chapters on TF-IDF, evaluation, and the vocabulary mismatch problem).

[5] Izacard, G. et al. (2022). Unsupervised Dense Information Retrieval with Contrastive Learning (Contriever). TACL 10:1089–1108. arXiv:2112.09118.

[6] Wang, L. et al. (2022/2024). Text Embeddings by Weakly‑Supervised Contrastive Pre‑training (E5). arXiv:2212.03533.

[7] Robertson, S., Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. DOI:10.1561/1500000019.

[8] Formal, T., Piwowarski, B., Clinchant, S. (2021). SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. arXiv:2107.05720.

[9] Formal, T. et al. (2022). Making Sparse Neural IR Models More Effective. Findings of EMNLP. arXiv:2205.04733.

[10] Formal, T. et al. (2024). SPLADE‑v3: New baselines for SPLADE. arXiv:2403.06789.

[11] Lin, J., Ma, X. (2021). A Few Brief Notes on DeepImpact, COIL, and uniCOIL. arXiv:2106.14807.

[12] Karpukhin, V. et al. (2020). Dense Passage Retrieval for Open‑Domain QA. EMNLP. arXiv:2004.04906.

[13] Xiong, L. et al. (2021). Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (ANCE). ICLR. arXiv:2007.00808.

[14] Izacard, G. et al. (2022). TACL. arXiv:2112.09118.

[15] Ni, J. et al. (2021/2022). Large Dual Encoders Are Generalizable Retrievers (GTR). EMNLP. arXiv:2112.07899.

[16] Wang, L. et al. (2022/2024). arXiv:2212.03533.

[17] Khattab, O., Zaharia, M. (2020). ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. SIGIR. arXiv:2004.12832.

[18] Santhanam, K. et al. (2022). ColBERTv2 & PLAID. NAACL/ArXiv. arXiv:2112.01488; arXiv:2205.09707.

[19] Scheerer, J.L. et al. (2025). WARP: An Efficient Engine for Multi‑Vector Retrieval. arXiv:2501.17788.

[20] Lin, J. et al. (2021). Pyserini: A Python Toolkit for Reproducible IR with Sparse and Dense Representations. SIGIR. PDF.

[21] Cormack, G.V., Clarke, C.L.A., Büttcher, S. (2009). SIGIR. PDF.

[22] Elastic Docs. Reciprocal Rank Fusion. (accessed 2025-09-10). elastic.co/docs/.../reciprocal-rank-fusion.

[23] OpenSearch Docs. Score ranker processor (RRF). (accessed 2025-09-10). docs.opensearch.org/.../score-ranker-processor/.

[24] Fox, E.A., Shaw, J.A. (1994). Combination of Multiple Searches. TREC‑2, NIST SP 500‑215, 243–252. PDF.

[25] Lee, J.H. (1997). Analyses of Multiple Evidence Combination. SIGIR, 267–276. DOI:10.1145/258525.258587.

[26] Hsu, D.F., Taksa, I. (2005). Comparing Rank and Score Combination Methods for Data Fusion in IR. (Tech. report). PDF.

[27] Bruch, S., Gai, S., Ingber, A. (2023). TOIS. DOI:10.1145/3596512.

[28] Hsu, D.F., Taksa, I. (2005). See above.

[29] Bruch, S., Gai, S., Ingber, A. (2023). TOIS. DOI:10.1145/3596512.

[30] Nogueira, R., Cho, K. (2019). Passage Re‑ranking with BERT. arXiv:1901.04085.

[31] Nogueira, R., Jiang, Z., Lin, J. (2020). Document Ranking with a Pretrained Sequence‑to‑Sequence Model (MonoT5). Findings of EMNLP. arXiv:2003.06713.

[32] Santhanam, K. et al. (2022). arXiv:2205.09707.

[33] Scheerer, J.L. et al. (2025). arXiv:2501.17788.

[34] Dean, J., Barroso, L.A. (2013). The Tail at Scale. CACM 56(2):74–80. DOI:10.1145/2408776.2408794.

[35] Thakur, N. et al. (2021). BEIR: A Heterogeneous Benchmark for Zero‑shot Evaluation of IR Models. NeurIPS Datasets & Benchmarks. arXiv:2104.08663.

[36] Craswell, N. et al. (2020). Overview of the TREC 2019 Deep Learning Track. arXiv:2003.07820.

[37] Craswell, N. et al. (2021). Overview of the TREC 2020 Deep Learning Track. arXiv:2102.07662.

[38] Bajaj, P. et al. (2016). MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv:1611.09268.

[39] Järvelin, K., Kekäläinen, J. (2002). Cumulated Gain‑Based Evaluation of IR Techniques. Information Retrieval 6:241–256. DOI:10.1023/A:1016043826386.

[40] Dean, J., Barroso, L.A. (2013). CACM. DOI:10.1145/2408776.2408794.

[41] Bruch, S. et al. (2023). DOI:10.1145/3596512.

[42] Ni, J. et al. (2021/2022). arXiv:2112.07899.

[43] Johnson, J., Douze, M., Jégou, H. (2017). Billion‑scale Similarity Search with GPUs (FAISS). arXiv:1702.08734.

[44] Malkov, Y., Yashunin, D. (2020). HNSW. IEEE TPAMI 42(4):824–836. DOI:10.1109/TPAMI.2018.2889473.

[45] Guo, R. et al. (2020). ScaNN: Efficient Vector Similarity Search at Scale. arXiv:1908.10396.

[46] Yang, P., Fang, H., Lin, J. (2018). Anserini: Reproducible IR Research with Lucene. JDIQ 10(4):1–20. DOI:10.1145/3239571.

[47] Lin, J. et al. (2021). SIGIR. PDF.

[48] Qdrant Docs. Hybrid queries (RRF, DBSF). (accessed 2025-09-10). qdrant.tech/.../hybrid-queries/.

[49] Weaviate Docs. Hybrid search. (accessed 2025-09-10). docs.weaviate.io/weaviate/search/hybrid.

[50] pgvector GitHub. (accessed 2025-09-10). github.com/pgvector/pgvector.

[51] Vespa Docs. Hybrid Text Search Tutorial. (accessed 2025-09-10). docs.vespa.ai/.../hybrid-search.html.

[52] Elastic Docs. Reciprocal Rank Fusion. (accessed 2025-09-10). elastic.co/docs/.../rrf.

[53] Lewis, P. et al. (2020). Retrieval‑Augmented Generation for Knowledge‑Intensive NLP Tasks. NeurIPS. arXiv:2005.11401.

[54] Hsu, D.F., Taksa, I. (2005). See above.

[55] Ni, J. et al. (2021/2022). arXiv:2112.07899.

[56] Formal, T. et al. (2021, 2022, 2024). arXiv:2107.05720; 2205.04733; 2403.06789.

[57] Lewis, P. et al. (2020). arXiv:2005.11401.

[58] Gao, L. et al. (2023). Precise Zero‑Shot Dense Retrieval without Relevance Labels (HyDE). ACL. arXiv:2212.10496.

[59] Nogueira, R. et al. (2019). Document Expansion by Query Prediction. arXiv:1904.08375; docTTTTTquery. PDF.

[60] Santhanam, K. et al. (2022). arXiv:2205.09707.

[61] Scheerer, J.L. et al. (2025). arXiv:2501.17788.

[62] Pyserini BEIR Regressions (accessed 2025-09-10): trec-covid results for BM25/SPLADE/Contriever/BGE/Cohere. castorini.github.io/pyserini/2cr/beir.html.

[63] Robertson, S., Zaragoza, H. (2009). DOI:10.1561/1500000019.

[64] Pyserini BEIR. See the link above.

[65] Formal, T. et al. (2021, 2022). arXiv:2107.05720; 2205.04733.

[66] Pyserini BEIR.

[67] Izacard, G. et al. (2022). arXiv:2112.09118.

[68] Pyserini BEIR.

[69] Pyserini BEIR.

[70] Pyserini BEIR.

[71] Cormack et al. (2009). SIGIR. RRF.

[72] Bruch et al. (2023). TOIS.

[73] Elastic/OpenSearch RRF Docs.
