The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns
Publication type: Journal Article
Publication date: 2026-03-01
scimago Q3
wos Q3
white level БС2
SJR: 0.429
CiteScore: 4
Impact factor: 1.4
ISSN: 01676423, 18727964
Abstract
Tool support in software engineering often relies on relationships, regularities, patterns, or rules mined from other users’ code. Examples include approaches to bug prediction, code recommendation, and code autocompletion. Mining is typically performed on samples of code rather than on the entirety of available software projects. While sampling is crucial for scaling data analysis, it can affect how well the mined patterns generalize. This paper focuses on sampling software projects filtered for specific libraries and frameworks, and on mining patterns that connect different libraries. We call these inter-library patterns. We observe that limiting the sample to a specific library may hinder the generalization of inter-library patterns, posing a threat to their use or interpretation. Using a simulation and a real case study, we show this threat for different sampling methods. Our simulation shows that the implication of a pattern generalizes well only when sampling for the disjunction of both libraries involved in that implication. Additionally, we show that real empirical data sampled using the GitHub search API does not behave as expected from our simulation. This identifies a potential threat relevant to many studies that use the GitHub search API for studying inter-library patterns.
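The effect described in the abstract can be illustrated with a small synthetic example. This is a minimal sketch and not the authors' simulation: the library names "A" and "B", the probabilities, the population size, and the use of rule confidence as the measure are all assumptions made for illustration. It compares the confidence of the implications A -> B and B -> A in a full synthetic population against samples filtered for A, for B, and for the disjunction "A or B".

```python
import random


def confidence(projects, antecedent, consequent):
    """Confidence of antecedent -> consequent, i.e. P(consequent | antecedent)."""
    with_antecedent = [p for p in projects if antecedent in p]
    if not with_antecedent:
        return float("nan")
    return sum(consequent in p for p in with_antecedent) / len(with_antecedent)


def make_population(n, p_a, p_b_given_a, p_b_given_not_a, seed=0):
    """Synthetic projects, each a set of used libraries (hypothetical parameters)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        libs = set()
        if rng.random() < p_a:
            libs.add("A")
        p_b = p_b_given_a if "A" in libs else p_b_given_not_a
        if rng.random() < p_b:
            libs.add("B")
        population.append(frozenset(libs))
    return population


population = make_population(10_000, p_a=0.2, p_b_given_a=0.6, p_b_given_not_a=0.1)

# Each sample keeps only the projects that match a library filter.
samples = {
    "population": population,
    "uses A": [p for p in population if "A" in p],
    "uses B": [p for p in population if "B" in p],
    "uses A or B": [p for p in population if p & {"A", "B"}],
}

# For every sample, measure both directions of the implication.
results = {
    name: (confidence(sample, "A", "B"), confidence(sample, "B", "A"))
    for name, sample in samples.items()
}

for name, (a_to_b, b_to_a) in results.items():
    print(f"{name:12s} conf(A->B)={a_to_b:.3f}  conf(B->A)={b_to_a:.3f}")
```

In this toy setting, filtering for a single library forces the confidence of the implication that points at that library to 1.0, because every sampled project contains it. The disjunction sample retains every project that uses A and every project that uses B, so both confidences match the population values. Real data retrieved through a search API need not behave this way, which is the discrepancy the abstract reports.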
Metrics
Total citations: 0
Cite (GOST)
Pacheco Y., De Roover C., Härtel J. The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns // Science of Computer Programming. 2026. Vol. 248. p. 103393.
Cite (RIS)
TY - JOUR
DO - 10.1016/j.scico.2025.103393
UR - https://linkinghub.elsevier.com/retrieve/pii/S0167642325001327
TI - The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns
T2 - Science of Computer Programming
AU - Pacheco, Yunior
AU - De Roover, Coen
AU - Härtel, Johannes
PY - 2026
DA - 2026/03/01
PB - Elsevier
SP - 103393
VL - 248
SN - 0167-6423
SN - 1872-7964
ER -
Cite (BibTeX)
@article{2026_Pacheco,
author = {Yunior Pacheco and Coen De Roover and Johannes Härtel},
title = {The Sampling Threat when Mining Generalizable Inter-Library Usage Patterns},
journal = {Science of Computer Programming},
year = {2026},
volume = {248},
publisher = {Elsevier},
month = {mar},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0167642325001327},
pages = {103393},
doi = {10.1016/j.scico.2025.103393}
}