Open Access
Volume 15, issue 8, page 4243

Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data

Serhii Semenov 1
Magdalena Krupska-Klimczak 1
Roman Czapla 1
Beata Krzaczek 1
Svitlana Gavrylenko 2
Vadym Poltoratskyi 2
Zozulia Vladislav 2
2 Department of “Computer Engineering and Programming”, National Technical University «Kharkiv Polytechnic Institute», 61000 Kharkiv, Ukraine
Publication type: Journal Article
Publication date: 2025-04-11
SCImago: Q2
WoS: Q2
SJR: 0.521
CiteScore: 5.5
Impact factor: 2.5
ISSN: 2076-3417
Abstract

This paper examines traditional machine learning algorithms, neural networks, and the benefits of ensemble models for network intrusion detection. Data preprocessing methods that improve the quality of classification models are considered. To balance the classes, undersampling, oversampling, and combined (over- plus undersampling) algorithms are explored, and a procedure for reducing feature correlation is proposed. Classification models based on SVM, KNN, Naive Bayes, and Perceptron, as well as on meta-algorithms such as Bagging, Random Forest, AdaBoost, and Gradient Boosting, are investigated. The settings of the base classifiers and the meta-algorithm parameters are optimized. The best result was obtained with an ensemble classifier based on the Random Forest algorithm. On this basis, an intrusion detection method based on the preprocessing of highly correlated and imbalanced data is proposed. The scientific novelty of the results lies in the integrated use of the developed procedure for reducing feature correlation, the SMOTEENN data balancing method, the selection of an appropriate classifier, and the fine-tuning of its parameters. Together, these procedures and methods resulted in a higher F1 score, reduced training time, and faster recognition. This allows the method to be recommended for practical use to improve the quality of network intrusion detection.
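
The abstract does not spell out the correlation-reduction procedure, so the following is only a minimal sketch of one common approach, not the authors' method: a greedy filter that keeps a feature only if its absolute Pearson correlation with every previously kept feature stays at or below a threshold. The function names, the 0.9 threshold, and the toy feature names are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter over a dict {feature_name: values}.

    A feature is kept only if |r| with every already-kept feature
    is <= threshold. The threshold value is a hypothetical default.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical feature names (not taken from the paper).
features = {
    "duration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "src_bytes": [2.1, 3.9, 6.2, 8.0, 9.9],  # roughly 2 * duration
    "dst_bytes": [5.0, 1.0, 4.0, 2.0, 3.0],
}

selected = drop_correlated(features, threshold=0.9)
print(selected)  # ['duration', 'dst_bytes']
```

In a full pipeline of the kind the abstract describes, a filter like this would run before class balancing (for example with the SMOTEENN resampler from the imbalanced-learn library) and before training the Random Forest classifier. The order in which features are visited determines which member of a correlated pair survives, so a real implementation would likely rank features first.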

GOST
Semenov S. et al. Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data // Applied Sciences (Switzerland). 2025. Vol. 15. No. 8. p. 4243.
GOST (all authors)
Semenov S., Krupska-Klimczak M., Czapla R., Krzaczek B., Gavrylenko S., Poltoratskyi V., Vladislav Z. Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data // Applied Sciences (Switzerland). 2025. Vol. 15. No. 8. p. 4243.
RIS
TY - JOUR
DO - 10.3390/app15084243
UR - https://www.mdpi.com/2076-3417/15/8/4243
TI - Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data
T2 - Applied Sciences (Switzerland)
AU - Semenov, Serhii
AU - Krupska-Klimczak, Magdalena
AU - Czapla, Roman
AU - Krzaczek, Beata
AU - Gavrylenko, Svitlana
AU - Poltoratskyi, Vadym
AU - Vladislav, Zozulia
PY - 2025
DA - 2025/04/11
PB - MDPI
SP - 4243
IS - 8
VL - 15
SN - 2076-3417
ER -
BibTeX
@article{2025_Semenov,
author = {Serhii Semenov and Magdalena Krupska-Klimczak and Roman Czapla and Beata Krzaczek and Svitlana Gavrylenko and Vadym Poltoratskyi and Zozulia Vladislav},
title = {Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data},
journal = {Applied Sciences (Switzerland)},
year = {2025},
volume = {15},
publisher = {MDPI},
month = {apr},
url = {https://www.mdpi.com/2076-3417/15/8/4243},
number = {8},
pages = {4243},
doi = {10.3390/app15084243}
}
MLA
Semenov, Serhii, et al. “Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data.” Applied Sciences (Switzerland), vol. 15, no. 8, Apr. 2025, p. 4243. https://www.mdpi.com/2076-3417/15/8/4243.