IRaDT: LLVM IR as Target for Efficient Neural Decompilation

Publication type: Journal Article
Publication date: 2024-10-17
Scimago: Q3
WoS: Q4
SJR: 0.206
CiteScore: 1.8
Impact factor: 0.6
ISSN: 0218-1940, 1793-6403
Abstract

Decompilation is a widely used reverse-engineering technique that aims to restore binary code to human-readable high-level language code. However, the output of traditional decompilers is often hard to read. With advances in language models, several learning-based decompilation methods have emerged. Nevertheless, because language models are probabilistic, the correctness of their output cannot be guaranteed, and engineers must analyze it further to identify the functionality of the code. Inspired by compiler toolchains, we propose a novel approach that enhances the effectiveness of language models in decompilation tasks by fusing traditional rule-based methods with learning-based techniques. Specifically, we present IRaDT, a pre-trained sequence-to-sequence model tailored to refine decompilation outputs at the intermediate representation (IR) level. Through this hybridization, we aim to address the limitations of existing methods and achieve more accurate and robust decompilation. We construct a diverse decompilation dataset targeting IR and evaluate IRaDT on it. The experimental results indicate that IRaDT improves the readability of IR while ensuring its compilability, achieving a 74% improvement over RetDec and a 93% improvement over ChatGPT.
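
The hybrid idea in the abstract (a rule-based lifter produces IR, a learned model refines it, and the result should still compile) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `hybrid_decompile`, the three callables, the toy stand-ins, and the fall-back-to-raw-IR policy are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DecompileResult:
    ir: str        # IR text that is finally returned
    refined: bool  # True if the model's output was accepted


def hybrid_decompile(
    binary: bytes,
    lift: Callable[[bytes], str],      # hypothetical rule-based lifter (binary -> IR)
    refine: Callable[[str], str],      # hypothetical learned refiner (IR -> IR)
    compiles: Callable[[str], bool],   # hypothetical compilability check
) -> DecompileResult:
    """Refine lifted IR with a model, keeping the raw IR if refinement breaks it."""
    raw_ir = lift(binary)
    candidate = refine(raw_ir)
    # Assumed policy: accept the model output only when it still compiles.
    if compiles(candidate):
        return DecompileResult(candidate, True)
    return DecompileResult(raw_ir, False)


# Toy stand-ins so the sketch runs without any external tools.
def toy_lift(binary: bytes) -> str:
    return "define i32 @f(i32 %arg1) { ret i32 %arg1 }"


def toy_refine(ir: str) -> str:
    # Pretend the model only renames a value to something more readable.
    return ir.replace("%arg1", "%count")


def toy_compiles(ir: str) -> bool:
    # Crude placeholder check, not a real IR verifier.
    return ir.startswith("define") and ir.rstrip().endswith("}")


result = hybrid_decompile(b"\x90", toy_lift, toy_refine, toy_compiles)
print(result.refined, result.ir)
```

In a real setting, the placeholder check would be replaced by an actual IR parser or compiler invocation; the sketch only shows where such a gate could sit.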

GOST
Li Y., Xu T., Wang C. IRaDT: LLVM IR as Target for Efficient Neural Decompilation // International Journal of Software Engineering and Knowledge Engineering. 2024. Vol. 34. No. 12. pp. 1971-1992.
RIS
TY  - JOUR
DO  - 10.1142/s0218194024500463
UR  - https://www.worldscientific.com/doi/10.1142/S0218194024500463
TI  - IRaDT: LLVM IR as Target for Efficient Neural Decompilation
T2  - International Journal of Software Engineering and Knowledge Engineering
AU  - Li, Yuzhang
AU  - Xu, Tao
AU  - Wang, Chunlu
PY  - 2024
DA  - 2024/10/17
PB  - World Scientific
SP  - 1971
EP  - 1992
IS  - 12
VL  - 34
SN  - 0218-1940
SN  - 1793-6403
ER  - 
BibTeX
@article{2024_Li,
author = {Yuzhang Li and Tao Xu and Chunlu Wang},
title = {IRaDT: LLVM IR as Target for Efficient Neural Decompilation},
journal = {International Journal of Software Engineering and Knowledge Engineering},
year = {2024},
volume = {34},
publisher = {World Scientific},
month = {oct},
url = {https://www.worldscientific.com/doi/10.1142/S0218194024500463},
number = {12},
pages = {1971--1992},
doi = {10.1142/s0218194024500463}
}
MLA
Li, Yuzhang, et al. “IRaDT: LLVM IR as Target for Efficient Neural Decompilation.” International Journal of Software Engineering and Knowledge Engineering, vol. 34, no. 12, Oct. 2024, pp. 1971-1992. https://www.worldscientific.com/doi/10.1142/S0218194024500463.