Open Access
volume 15 issue 3 pages 1411

Feature Learning via Correlation Analysis for Effective Duplicate Detection

Publication type: Journal Article
Publication date: 2025-01-30
SCImago: Q2
WoS: Q2
SJR: 0.521
CiteScore: 5.5
Impact factor: 2.5
ISSN: 2076-3417
Abstract

With the growing reliance on software, the frequency of software bugs has increased significantly. To address these issues, users or developers typically submit bug reports, which developers analyze and resolve. However, many submitted bug reports are duplicates of previously reported issues, creating inefficiencies in the bug resolution process. To enhance developer productivity, an automatic method for detecting duplicate bug reports is essential. In this study, we present a novel approach for identifying duplicate and nonduplicate bug reports using feature learning through correlation analysis. Our method utilizes bug report features, including product and component information, extracted from bug repositories. The process begins with preprocessing the bug reports to ensure data quality. Next, a feature selection algorithm identifies relevant features, which are then used to train a machine learning model based on bidirectional encoder representations from transformers (BERT). The proposed model’s effectiveness was evaluated across multiple datasets: Apache, JDT, Platform, KDE, Core, Firefox, and Thunderbird. Our results show detection accuracies of 91.41%, 88.66%, 86.08%, 92.94%, 90.68%, 88.25%, and 91.62%, respectively. These outcomes represent a significant improvement of 32% to 41% compared to baseline models, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), convolutional LSTMs (CNN-LSTMs), Naive Bayes classifiers, and random forest classifiers. Our findings show that the proposed model is highly effective for duplicate bug report prediction and offers substantial advancements over existing methods. This approach has the potential to streamline bug management processes and improve overall software development efficiency.
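The feature selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation measure, threshold, or feature encoding the authors use, so everything here is an assumption: the sketch uses the Pearson coefficient, a hypothetical cutoff of 0.3, and hypothetical pairwise features (`same_product`, `same_component`, `same_priority`) on made-up data. The BERT-based classifier that would consume the selected features is omitted.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(rows, labels, threshold=0.3):
    """Return names of features whose |correlation| with the label meets the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    selected = []
    for name in rows[0]:
        column = [row[name] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            selected.append(name)
    return selected


# Hypothetical pairwise features for four bug-report pairs (1 = match, 0 = no match).
pairs = [
    {"same_product": 1, "same_component": 1, "same_priority": 0},
    {"same_product": 1, "same_component": 1, "same_priority": 1},
    {"same_product": 0, "same_component": 0, "same_priority": 0},
    {"same_product": 1, "same_component": 0, "same_priority": 1},
]
is_duplicate = [1, 1, 0, 0]  # made-up labels: 1 = duplicate pair

selected = select_features(pairs, is_duplicate)
print(selected)
```

On this toy data, `same_priority` is uncorrelated with the label and is dropped, while the product and component features are kept. In a fuller pipeline, the retained fields would be combined with the report text before being passed to the classifier.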

Cite this
GOST
Yang G., Ji J., Kim T. Feature Learning via Correlation Analysis for Effective Duplicate Detection // Applied Sciences (Switzerland). 2025. Vol. 15. No. 3. p. 1411.
RIS
TY - JOUR
DO - 10.3390/app15031411
UR - https://www.mdpi.com/2076-3417/15/3/1411
TI - Feature Learning via Correlation Analysis for Effective Duplicate Detection
T2 - Applied Sciences (Switzerland)
AU - Yang, Geunseok
AU - Ji, Jinfeng
AU - Kim, Taemin
PY - 2025
DA - 2025/01/30
PB - MDPI
SP - 1411
IS - 3
VL - 15
SN - 2076-3417
ER -
BibTeX
@article{2025_Yang,
author = {Geunseok Yang and Jinfeng Ji and Taemin Kim},
title = {Feature Learning via Correlation Analysis for Effective Duplicate Detection},
journal = {Applied Sciences (Switzerland)},
year = {2025},
volume = {15},
publisher = {MDPI},
month = {jan},
url = {https://www.mdpi.com/2076-3417/15/3/1411},
number = {3},
pages = {1411},
doi = {10.3390/app15031411}
}
MLA
Yang, Geunseok, et al. “Feature Learning via Correlation Analysis for Effective Duplicate Detection.” Applied Sciences (Switzerland), vol. 15, no. 3, Jan. 2025, p. 1411. https://www.mdpi.com/2076-3417/15/3/1411.