Open Access
Lecture Notes in Computer Science, pages 147-161
How Train–Test Leakage Affects Zero-Shot Retrieval
Publication type: Book Chapter
Publication date: 2022-10-31
Journal: Lecture Notes in Computer Science
Q2
SJR: 0.606
CiteScore: 2.6
Impact factor: —
ISSN: 03029743, 16113349, 18612075, 18612083
Abstract
Neural retrieval models are often trained on (subsets of) the millions of queries in the MS MARCO/ORCAS datasets and then tested on the 250 Robust04 queries or on other TREC benchmarks that often have only 50 queries. In such setups, many of the few test queries can be very similar to queries from the huge training data; in fact, 69% of the Robust04 queries have near-duplicates in MS MARCO/ORCAS. We investigate the impact of this unintended train–test leakage by training neural retrieval models on combinations of a fixed number of MS MARCO/ORCAS queries that are very similar to actual test queries and an increasing number of other queries. We find that leakage can improve effectiveness and even change the ranking of systems. However, these effects diminish as the share of leaked queries among all training instances becomes smaller and thus more realistic.
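The setup described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: near-duplicates are detected here with token-level Jaccard similarity and an arbitrary threshold of 0.8, the function names (`leaked_queries`, `training_mix`) are hypothetical, and the example queries are invented toy data rather than real MS MARCO/ORCAS or Robust04 queries.

```python
import random
import re


def tokens(query: str) -> frozenset:
    """Lowercase a query and split it into a set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", query.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leaked_queries(train, test, threshold=0.8):
    """Return training queries that are near-duplicates of any test query.

    Assumption: "near-duplicate" means token Jaccard >= threshold.
    """
    test_sets = [tokens(q) for q in test]
    return [q for q in train
            if any(jaccard(tokens(q), t) >= threshold for t in test_sets)]


def training_mix(leaked, others, n_other, seed=0):
    """Combine all leaked queries with n_other randomly sampled other queries."""
    rng = random.Random(seed)
    return list(leaked) + rng.sample(list(others), n_other)


if __name__ == "__main__":
    # Invented toy data for illustration only.
    test = ["international organized crime"]
    train = ["organized crime international", "how to bake bread",
             "weather in berlin", "python list sort", "capital of france"]

    leaked = leaked_queries(train, test)
    leaked_set = set(leaked)
    others = [q for q in train if q not in leaked_set]

    # Fixed number of leaked queries, increasing number of other queries.
    for n_other in (1, 2, 4):
        mix = training_mix(leaked, others, n_other)
        print(n_other, len(leaked) / len(mix))
```

With one leaked query held fixed, the leaked share of the training mix falls from 1/2 to 1/5 as other queries are added, which mirrors the abstract's point that the effect of leakage should be judged relative to the size of the whole training set.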
Top-30
Journals: Electronics (Switzerland), 1 publication (100%)
Publishers: MDPI, 1 publication (100%)
- We do not take into account publications without a DOI.
- Statistics recalculated only for publications connected to researchers, organizations and labs registered on the platform.
- Statistics recalculated weekly.
Cite this
GOST
Fröbe M., Akiki C., Potthast M., Hagen M. How Train–Test Leakage Affects Zero-Shot Retrieval // Lecture Notes in Computer Science. 2022. pp. 147-161.
RIS
TY  - CHAP
DO  - 10.1007/978-3-031-20643-6_11
UR  - https://doi.org/10.1007/978-3-031-20643-6_11
TI  - How Train–Test Leakage Affects Zero-Shot Retrieval
T2  - Lecture Notes in Computer Science
AU  - Fröbe, Maik
AU  - Akiki, Christopher
AU  - Potthast, Martin
AU  - Hagen, Matthias
PY  - 2022
DA  - 2022/10/31
PB  - Springer Nature
SP  - 147
EP  - 161
SN  - 0302-9743
SN  - 1611-3349
SN  - 1861-2075
SN  - 1861-2083
ER  -
BibTeX (up to 50 authors)
@incollection{Froebe_2022,
author = {Maik Fröbe and Christopher Akiki and Martin Potthast and Matthias Hagen},
title = {How Train–Test Leakage Affects Zero-Shot Retrieval},
booktitle = {Lecture Notes in Computer Science},
publisher = {Springer Nature},
year = {2022},
month = oct,
pages = {147--161},
doi = {10.1007/978-3-031-20643-6_11}
}