Semantic Code Clone Detection Based on Community Detection

Publication type: Journal Article
Publication date: 2024-07-26
SCImago: Q3
WoS: Q4
SJR: 0.206
CiteScore: 1.8
Impact factor: 0.6
ISSN: 0218-1940, 1793-6403
Abstract

Semantic code clone detection aims to find code snippets that differ structurally or syntactically but are semantically identical. It plays an important role in software reuse and code compression. Many existing studies achieve good performance on non-semantic clones, but detecting semantic clones remains challenging. Recently, several works have used trees or graphs, such as the Abstract Syntax Tree (AST), Control Flow Graph (CFG) or Program Dependency Graph (PDG), to extract semantic information from source code. To reduce the complexity of trees and graphs, some studies transform them into node sequences; however, this transformation loses some semantic information. To address this issue, we propose a novel high-performance method that uses community detection to extract features of the AST while preserving its semantic information. First, starting from the AST of the source code, we apply community detection to split the AST into subtrees that capture the underlying semantic information of different code blocks, and we use centrality analysis to quantify that semantic information as the weight of each AST node. Then, the AST is converted into a sequence of weighted tokens, and a Siamese neural network model measures the similarity of token sequences to detect semantic code clones. Finally, to evaluate our approach, we conduct experiments on two standard benchmark datasets, Google Code Jam (GCJ) and BigCloneBench (BCB). Experimental results show that our model outperforms eight publicly available state-of-the-art methods in detecting code clones. It is also five times faster than the tree-based method (ASTNN).
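
The first two steps described in the abstract (splitting the AST into communities and weighting nodes by centrality) can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: it parses Python code with the standard `ast` module, uses a seeded label-propagation routine as the community detection algorithm, and uses degree centrality over the whole tree as the node weight. The function names `build_ast_graph`, `label_propagation` and `weighted_tokens` are hypothetical, and the Siamese network step is not covered.

```python
import ast
import random
from collections import Counter, defaultdict


def build_ast_graph(source):
    """Parse `source` and return (labels, adj): node-type names in
    pre-order and an undirected adjacency map over node indices."""
    labels, adj = [], defaultdict(set)

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        adj[idx]  # make sure isolated nodes also get an entry
        if parent is not None:
            adj[idx].add(parent)
            adj[parent].add(idx)
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, adj


def label_propagation(adj, seed=0, max_rounds=50):
    """Assumed community detection: asynchronous label propagation.
    Each node repeatedly adopts the most frequent label among its
    neighbours; ties are broken at random with a fixed seed."""
    rng = random.Random(seed)
    community = {n: n for n in adj}
    nodes = sorted(adj)
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for n in nodes:
            counts = Counter(community[m] for m in adj[n])
            if not counts:
                continue
            best = max(counts.values())
            top = sorted(l for l, c in counts.items() if c == best)
            if community[n] in top:
                continue  # current label is already a majority label
            community[n] = rng.choice(top)
            changed = True
        if not changed:
            break
    return community


def weighted_tokens(source):
    """Return a pre-order list of (token, community_id, weight).
    The weight is degree centrality, an assumed stand-in for the
    paper's unspecified centrality analysis."""
    labels, adj = build_ast_graph(source)
    community = label_propagation(adj)
    n = len(labels)
    return [(labels[i], community[i], len(adj[i]) / max(n - 1, 1))
            for i in range(n)]


if __name__ == "__main__":
    for token, comm, weight in weighted_tokens("def add(a, b):\n    return a + b\n"):
        print(f"{token:<12} community={comm:<3} weight={weight:.3f}")
```

The resulting list of weighted tokens is the kind of input that a sequence-similarity model could then consume; which community detection algorithm and centrality measure the authors actually use would need to be checked against the full paper.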

Cite this
GOST
Wan Z. et al. Semantic Code Clone Detection Based on Community Detection // International Journal of Software Engineering and Knowledge Engineering. 2024. Vol. 34. No. 10. pp. 1661-1692.
GOST (all authors)
Wan Z., Xie C., Lv Q., Fan Y. Semantic Code Clone Detection Based on Community Detection // International Journal of Software Engineering and Knowledge Engineering. 2024. Vol. 34. No. 10. pp. 1661-1692.
RIS
TY - JOUR
DO - 10.1142/s0218194024500323
UR - https://www.worldscientific.com/doi/10.1142/S0218194024500323
TI - Semantic Code Clone Detection Based on Community Detection
T2 - International Journal of Software Engineering and Knowledge Engineering
AU - Wan, Zexuan
AU - Xie, Chunli
AU - Lv, Quanrun
AU - Fan, Yasheng
PY - 2024
DA - 2024/07/26
PB - World Scientific
SP - 1661
EP - 1692
IS - 10
VL - 34
SN - 0218-1940
SN - 1793-6403
ER -
BibTeX
@article{2024_Wan,
author = {Zexuan Wan and Chunli Xie and Quanrun Lv and Yasheng Fan},
title = {Semantic Code Clone Detection Based on Community Detection},
journal = {International Journal of Software Engineering and Knowledge Engineering},
year = {2024},
volume = {34},
publisher = {World Scientific},
month = {jul},
url = {https://www.worldscientific.com/doi/10.1142/S0218194024500323},
number = {10},
pages = {1661--1692},
doi = {10.1142/s0218194024500323}
}
MLA
Wan, Zexuan, et al. “Semantic Code Clone Detection Based on Community Detection.” International Journal of Software Engineering and Knowledge Engineering, vol. 34, no. 10, Jul. 2024, pp. 1661-1692. https://www.worldscientific.com/doi/10.1142/S0218194024500323.