Volume 248, page 103393

The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns

Publication type: Journal Article
Publication date: 2026-03-01
Scimago: Q3
WoS: Q3
White level: БС2
SJR: 0.429
CiteScore: 4
Impact factor: 1.4
ISSN: 0167-6423, 1872-7964
Abstract
Tool support in software engineering often relies on relationships, regularities, patterns, or rules mined from other users’ code. Examples include approaches to bug prediction, code recommendation, and code autocompletion. Mining is typically performed on samples of code rather than on the entirety of available software projects. While sampling is crucial for scaling data analysis, it can affect the generalization of the mined patterns. This paper focuses on sampling software projects filtered for specific libraries and frameworks, and on mining patterns that connect different libraries. We call these inter-library patterns. We observe that limiting the sample to a specific library may hinder the generalization of inter-library patterns, posing a threat to their use or interpretation. Using a simulation and a real case study, we demonstrate this threat for different sampling methods. Our simulation shows that the implication of a pattern generalizes well only when sampling for the disjunction of both libraries involved in that implication. Additionally, we show that real empirical data sampled using the GitHub search API does not behave as expected from our simulation. This identifies a potential threat relevant to many studies that use the GitHub search API for studying inter-library patterns.
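
The sampling effect described in the abstract can be illustrated with a small toy calculation. The sketch below is not the authors' simulation: the two libraries "A" and "B", the project counts, and the use of rule confidence as the measure of an implication are all assumptions made up for illustration. It compares the confidence of A ⇒ B and of B ⇒ A in the full population with the values obtained after filtering projects for A only, for B only, and for the disjunction A or B.

```python
# Toy illustration only: the libraries "A" and "B" and all counts are invented.

def confidence(projects, antecedent, consequent):
    """Share of projects using `antecedent` that also use `consequent`."""
    with_antecedent = [p for p in projects if antecedent in p]
    if not with_antecedent:
        return None  # the rule cannot be evaluated on this sample
    hits = sum(consequent in p for p in with_antecedent)
    return hits / len(with_antecedent)

# Hypothetical population of 1000 projects, each a set of used libraries.
population = (
    [frozenset({"A", "B"})] * 100   # use both libraries
    + [frozenset({"A"})] * 100      # use only A
    + [frozenset({"B"})] * 300      # use only B
    + [frozenset()] * 500           # use neither
)

# Three filtered samples, plus the unfiltered population as the reference.
samples = {
    "population": population,
    "filter A": [p for p in population if "A" in p],
    "filter B": [p for p in population if "B" in p],
    "filter A or B": [p for p in population if p & {"A", "B"}],
}

# For each sample: (confidence of A => B, confidence of B => A).
results = {
    name: (confidence(sample, "A", "B"), confidence(sample, "B", "A"))
    for name, sample in samples.items()
}

for name, (a_to_b, b_to_a) in results.items():
    print(f"{name:14s} conf(A=>B)={a_to_b:.2f} conf(B=>A)={b_to_a:.2f}")
```

In this toy setting, filtering for a single library inflates the confidence of one direction to 1.0, because every sampled project contains that library, while the disjunctive filter reproduces both population values. Real data, as the abstract notes for the GitHub search API, need not behave this way.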

GOST citation
Pacheco Y., De Roover C., Härtel J. The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns // Science of Computer Programming. 2026. Vol. 248. p. 103393.
RIS
TY - JOUR
DO - 10.1016/j.scico.2025.103393
UR - https://linkinghub.elsevier.com/retrieve/pii/S0167642325001327
TI - The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns
T2 - Science of Computer Programming
AU - Pacheco, Yunior
AU - De Roover, Coen
AU - Härtel, Johannes
PY - 2026
DA - 2026/03/01
PB - Elsevier
SP - 103393
VL - 248
SN - 0167-6423
SN - 1872-7964
ER -
BibTeX
@article{2026_Pacheco,
author = {Yunior Pacheco and Coen De Roover and Johannes Härtel},
title = {The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns},
journal = {Science of Computer Programming},
year = {2026},
volume = {248},
publisher = {Elsevier},
month = {mar},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0167642325001327},
pages = {103393},
doi = {10.1016/j.scico.2025.103393}
}