Open Access

Applied Sciences (Switzerland), volume 15, issue 3, p. 1411

Feature Learning via Correlation Analysis for Effective Duplicate Detection

Geunseok Yang ¹

Jinfeng Ji ²

Taemin Kim ³

Affiliations:

¹ Department of Computer Applied Mathematics (Computer System Institute), Hankyong National University, Anseong-si 17579, Republic of Korea

² Department of Computer Applied Mathematics, Hankyong National University, Anseong-si 17579, Republic of Korea

³ Department of Computer Engineering, Kyungnam University, Changwon-si 51767, Republic of Korea

Publication type: Journal Article

Publication date: 2025-01-30

Publisher: MDPI

Journal: Applied Sciences (Switzerland)

SCImago quartile: Q2

Web of Science quartile: Q2

SJR: 0.521

CiteScore: 5.5

Impact factor: 2.5

ISSN: 2076-3417 (electronic)

DOI: 10.3390/app15031411

Abstract

With the growing reliance on software, the frequency of software bugs has increased significantly. To address these issues, users or developers typically submit bug reports, which developers analyze and resolve. However, many submitted bug reports duplicate previously reported issues, creating inefficiencies in the bug resolution process. To enhance developer productivity, an automatic method for detecting duplicate bug reports is essential. In this study, we present a novel approach for distinguishing duplicate from nonduplicate bug reports using feature learning through correlation analysis. Our method uses bug report features, including product and component information, extracted from bug repositories. The process begins with preprocessing the bug reports to ensure data quality. Next, a feature selection algorithm identifies relevant features, which are then used to train a machine learning model based on bidirectional encoder representations from transformers (BERT). The proposed model's effectiveness was evaluated on seven datasets: Apache, JDT, Platform, KDE, Core, Firefox, and Thunderbird. The model achieved detection accuracies of 91.41%, 88.66%, 86.08%, 92.94%, 90.68%, 88.25%, and 91.62%, respectively. These outcomes represent a significant improvement of 32% to 41% compared to baseline models, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), convolutional LSTMs (CNN-LSTMs), Naive Bayes classifiers, and random forest classifiers. These findings show that the proposed model is highly effective for duplicate bug report prediction and offers substantial advancements over existing methods. This approach has the potential to streamline bug management processes and improve overall software development efficiency.
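The abstract does not spell out how the correlation analysis selects features. The following is a minimal sketch, assuming that each candidate feature of a bug-report pair is scored by its Pearson correlation with the duplicate/nonduplicate label, and that features whose absolute correlation meets a threshold are kept. The feature names (`same_product`, `same_component`, `same_os`), the 0.5 threshold, and the toy data are hypothetical and are not taken from the paper. With a binary label, the Pearson coefficient is the point-biserial correlation.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # r is undefined for a constant column; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    features: Dict[str, List[float]],
    labels: List[int],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[str]:
    """Return feature names with |r| >= threshold, strongest first."""
    scores = {name: pearson(values, labels) for name, values in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: abs(scores[name]), reverse=True)


# Hypothetical toy data: one row per bug-report pair, label 1 = duplicate.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "same_product": [1, 1, 1, 1, 0, 0],    # r ~ 0.71
    "same_component": [1, 1, 1, 0, 0, 0],  # r = 1.0
    "same_os": [1, 0, 1, 0, 1, 0],         # r ~ 0.33
}

print(select_features(features, labels))  # ['same_component', 'same_product']
```

This filter scores each feature independently and captures only linear association. As a result, two redundant features that both correlate with the label are both kept. The abstract does not say how the selected features are then combined with the report text when training the BERT-based model.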

Cite this

GOST

Yang G., Ji J., Kim T. Feature Learning via Correlation Analysis for Effective Duplicate Detection // Applied Sciences (Switzerland). 2025. Vol. 15. No. 3. p. 1411.

RIS

TY  - JOUR
DO  - 10.3390/app15031411
UR  - https://www.mdpi.com/2076-3417/15/3/1411
TI  - Feature Learning via Correlation Analysis for Effective Duplicate Detection
T2  - Applied Sciences (Switzerland)
AU  - Yang, Geunseok
AU  - Ji, Jinfeng
AU  - Kim, Taemin
PY  - 2025
DA  - 2025/01/30
PB  - MDPI
SP  - 1411
IS  - 3
VL  - 15
SN  - 2076-3417
ER  - 

BibTeX

@article{2025_Yang,
  author = {Geunseok Yang and Jinfeng Ji and Taemin Kim},
  title = {Feature Learning via Correlation Analysis for Effective Duplicate Detection},
  journal = {Applied Sciences (Switzerland)},
  year = {2025},
  volume = {15},
  publisher = {MDPI},
  month = {jan},
  url = {https://www.mdpi.com/2076-3417/15/3/1411},
  number = {3},
  pages = {1411},
  doi = {10.3390/app15031411}
}

MLA

Yang, Geunseok, et al. “Feature Learning via Correlation Analysis for Effective Duplicate Detection.” Applied Sciences (Switzerland), vol. 15, no. 3, Jan. 2025, p. 1411. https://www.mdpi.com/2076-3417/15/3/1411.
