Open Access

Applied Sciences (Switzerland), volume 15, issue 6, pages 2978

Deep Defense Against Mal-Doc: Utilizing Transformer and SeqGAN for Detecting and Classifying Document Type Malware

Gati Lother Martin ¹, Sang-Min Lee ², Jonghyun Kim ³, Young-Seob Jeong ⁴, Ah Reum Kang ⁵, Jiyoung Woo ¹

¹ Department of Future Convergence Technology, Soonchunhyang University, Asan 31538, Republic of Korea

² Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea

³ Department of Information Security, Sejong University, Seoul 05006, Republic of Korea

⁴ Department of Computer Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea

⁵ Department of Information Security, PaiChai University, Daejeon 35345, Republic of Korea

Publication type: Journal Article

Publication date: 2025-03-10

Publisher: MDPI

Journal: Applied Sciences (Switzerland)

scimago Q2

SJR: 0.508

CiteScore: 5.3

Impact factor: 2.5

ISSN: 2076-3417 (Electronic)

DOI: 10.3390/app15062978

Abstract

Non-executable malware is increasingly prevalent and poses a major threat to users, including public institutions and corporations. Although malware detection has been studied extensively, document-type malware has received far less attention than executable files. The proposed model addresses this gap by detecting document-type malware and classifying it into families using script code: the tags used to write documents and the script languages used to execute malicious functions. This script code offers insight into how the malware was constructed and how it operates on the victim’s system. Our approach also leverages language models. First, we develop MalCode2Vec, which learns associations within source code and represents them as numeric vectors. Next, we design a Transformer-based model for document malware detection and family classification. Detection is conducted at both the stream and file levels. To address class imbalance among malware families, we use a generative adversarial network (SeqGAN) to generate malware samples. Our experiments focus on the Hangul (Korean) word processor, a tool notably used by North Korea in attacks targeting the South Korean government.
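The preprocessing that such a pipeline implies can be illustrated with a minimal Python sketch. It is not the authors' MalCode2Vec or Transformer implementation: the tokenizer pattern, the `<pad>`/`<unk>` vocabulary entries, the helper names, and in particular the rule that a file is flagged when any of its streams scores above a threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical tokenizer pattern: tag openers such as "<script" or "</script",
# identifiers, integers, and any other single non-space character.
TOKEN_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*|[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(script):
    """Split script code (including markup tags) into a list of string tokens."""
    return TOKEN_RE.findall(script)


def build_vocab(token_lists, min_count=1):
    """Map each sufficiently frequent token to an integer ID.

    IDs 0 and 1 are reserved for padding and unknown tokens (an assumed convention).
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to a fixed-length ID sequence, truncating or padding as needed."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def file_level_verdict(stream_scores, threshold=0.5):
    """Assumed aggregation rule: flag the file if any stream score reaches the threshold."""
    return max(stream_scores, default=0.0) >= threshold
```

In this sketch, the ID sequences returned by `encode` would be the input to an embedding layer and a sequence classifier that scores each stream, and `file_level_verdict` would combine those per-stream scores into one decision per document. Other aggregation rules, such as averaging the scores, are equally plausible.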

