Chan, Jia Lin (2022) Evaluating oversampling techniques for network intrusion detection data. Final Year Project, UTAR.
Abstract
In this digital era, the amount of information exchanged over networks has increased exponentially due to technological advancement, and cyberattacks have increased in tandem with the rapid expansion of digitalisation worldwide. Implementing an intrusion detection system (IDS) is one approach to addressing this network security problem. Many network intrusion data sets have been introduced and used as benchmarks to train predictive models and evaluate IDSs. However, the imbalanced class distribution in these data sets has become a significant challenge in building classification models, leading to low intrusion detection rates (DR). This research identified four imbalanced network intrusion detection data sets with low detection rates for their minority attack classes: UNSW-NB15, NSL-KDD, CICIDS2017 and CICDDoS2019. Five oversampling techniques were then applied to the minority attack classes in these data sets: Random Oversampling (ROS), SMOTE, Borderline-SMOTE (BSMOTE), ADASYN and K-Means SMOTE (KMSMOTE). Finally, three classification models (Gaussian Bayes, Logistic Regression and Decision Tree) were built on the data sets, and their performance was compared. The analysis shows that the best-performing oversampling method differs by data set: KMSMOTE performs best on UNSW-NB15, ROS on NSL-KDD, and SMOTE on CICIDS2017 and CICDDoS2019, with SMOTE having the most top-performing occurrences across all data sets. In general, oversampling increases the DR for the minority attack classes, with increments ranging from 11.93% on CICDDoS2019 to a maximum of 20.02% on NSL-KDD.
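The abstract names the oversampling techniques without describing them. As a rough illustration, and not the project's own code, the sketch below shows two of them in plain Python: ROS, which duplicates existing minority samples, and basic SMOTE, which creates synthetic samples by interpolating between a minority sample and one of its k nearest minority neighbours. It also computes a per-class detection rate, assumed here to mean TP / (TP + FN), i.e. recall for one attack class. The function names, toy data and class labels are hypothetical; a real experiment would more likely rely on a library such as imbalanced-learn, which provides `RandomOverSampler`, `SMOTE`, `BorderlineSMOTE`, `ADASYN` and `KMeansSMOTE`.

```python
import math
import random


def random_oversample(minority, target_size, rng):
    """ROS: append randomly chosen duplicates of existing minority samples
    until the class holds `target_size` samples (hypothetical helper)."""
    extra = [list(rng.choice(minority)) for _ in range(target_size - len(minority))]
    return [list(x) for x in minority] + extra


def smote(minority, target_size, k, rng):
    """Basic SMOTE sketch: each synthetic sample lies on the line segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance). Hypothetical helper, not a library API."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (x for j, x in enumerate(minority) if j != i),
            key=lambda x: math.dist(base, x),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # new point: base + gap * (neighbour - base), feature by feature
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return [list(x) for x in minority] + synthetic


def detection_rate(y_true, y_pred, cls):
    """Assumed DR definition: TP / (TP + FN) for class `cls` (its recall)."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    if not preds_for_cls:
        return 0.0
    return sum(1 for p in preds_for_cls if p == cls) / len(preds_for_cls)


if __name__ == "__main__":
    rng = random.Random(0)
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]  # toy 2-feature minority class
    print(len(random_oversample(minority, 10, rng)))  # 10
    print(len(smote(minority, 10, k=2, rng=rng)))  # 10
    y_true = ["normal", "dos", "u2r", "u2r"]  # hypothetical labels
    y_pred = ["normal", "dos", "u2r", "normal"]
    print(detection_rate(y_true, y_pred, "u2r"))  # 0.5
```

Borderline-SMOTE, ADASYN and K-Means SMOTE differ from this basic SMOTE mainly in which minority samples are used as base points and how many synthetic samples each one receives; the interpolation step is broadly the same.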