Intrusion Detection Method Based on Preprocessing of Highly Correlated and Imbalanced Data
This paper examines traditional machine learning algorithms, neural networks, and the benefits of utilizing ensemble models. Data preprocessing methods for improving the quality of classification models are considered. To balance the classes, Undersampling, Oversampling, and their combination (Over + Undersampling) algorithms are explored. A procedure for reducing feature correlation is proposed. Classification models based on meta-algorithms such as SVM, KNN Naive Bayes, Perceptron, Bagging, Random Forest, AdaBoost, and Gradient Boosting have been thoroughly investigated. The settings of the base classifiers and meta-algorithm parameters have been optimized. The best result was obtained by using an ensemble classifier based on the Random Forest algorithm. Thus, an intrusion detection method based on the preprocessing of highly correlated and imbalanced data has been proposed. The scientific novelty of the obtained results lies in the integrated use of the developed procedure for reducing feature correlation, the application of the SMOTEENN data balancing method, the selection of an appropriate classifier, and the fine tuning of its parameters. The integration of these procedures and methods resulted in a higher F1 score, reduced training time, and faster recognition speed for the model. This allows us to recommend this method for practical use to improve the quality of network intrusion detection.