Machine Learning Supported Network Attack Detection with a Novel Method Based on Random Forest-Based Feature Fusion


Çinar C., Doğru İ. A., Atacak İ.

2024 17th International Conference on Information Security and Cryptology (ISCTürkiye), Ankara, Türkiye, 16 October - 17 December 2024, pp. 1-6

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/isctrkiye64784.2024.10779248
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Page Numbers: pp. 1-6
  • Gazi University Affiliated: Yes

Abstract

Innovative techniques, including artificial intelligence and machine learning (ML), are now employed in attack detection systems across numerous domains to identify attacks more quickly and accurately. In this study, we propose a novel ML-based method that uses Random Forest (RF)-based feature fusion (RFBFF) to detect anomalies in computer networks. The method integrates two basic procedures: feature fusion and classification. Unlike traditional feature selection, feature fusion uses all features to create the new dataset, so the impact of each feature on the outcome is incorporated. The feature fusion procedure generates a reduced dataset by combining similar features based on RF-derived gain scores, while the classification procedure uses this refined dataset to label the attack status in the network using various ML algorithms. Specifically, Decision Tree (DT), K-Nearest Neighbor (k-NN), Additive Logistic Regression (LogitBoost), and Naïve Bayes (NB) are employed as classification algorithms. The effectiveness of the proposed method in network attack detection is validated by comparing it with ML methods based on RF-based feature selection (RFBFS). For a robust comparison and to assess the model's performance under realistic conditions, the NSL-KDD and UNSW-NB15 datasets were chosen as benchmarks; both contain data packet examples of normal and attack traffic that can be encountered on the internet, labeled for binary classification. Experimental results show that the proposed approach performs well in detecting network attacks, achieving an accuracy of 0.995 on the NSL-KDD dataset and 0.924 on the UNSW-NB15 dataset. Leveraging the gain scores of all features, even relatively low ones, was found to positively impact the model's overall performance.
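
The contrast between fusion and selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how "similar" features are grouped or how a group is combined, so the sketch assumes that features are ranked by gain score, split into contiguous rank groups, and collapsed into a score-weighted average per group. The gain scores are assumed to have been computed beforehand by a Random Forest, and the names fuse_features and select_features are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return column indices ordered from highest to lowest gain score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_features(rows: Sequence[Sequence[float]],
                    scores: Sequence[float],
                    k: int) -> List[List[float]]:
    """Selection baseline: keep the k top-scoring columns, discard the rest."""
    keep = rank_by_score(scores)[:k]
    return [[row[j] for j in keep] for row in rows]


def fuse_features(rows: Sequence[Sequence[float]],
                  scores: Sequence[float],
                  n_groups: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Fusion (assumed rule): every column contributes to one fused feature.

    Columns are ranked by score, split into n_groups contiguous rank
    groups of near-equal size, and each group is replaced by the
    score-weighted average of its columns.
    """
    if not 1 <= n_groups <= len(scores):
        raise ValueError("n_groups must be between 1 and the feature count")

    order = rank_by_score(scores)
    base, extra = divmod(len(order), n_groups)
    groups: List[List[int]] = []
    start = 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # earlier groups absorb the remainder
        groups.append(order[start:start + size])
        start += size

    fused: List[List[float]] = []
    for row in rows:
        new_row = []
        for group in groups:
            total = sum(scores[j] for j in group)
            if total > 0:
                value = sum(scores[j] * row[j] for j in group) / total
            else:
                # All scores in the group are zero: fall back to a plain mean.
                value = sum(row[j] for j in group) / len(group)
            new_row.append(value)
        fused.append(new_row)
    return fused, groups


if __name__ == "__main__":
    # Toy data: one sample with four features and made-up gain scores.
    sample = [[1.0, 2.0, 3.0, 4.0]]
    gain = [0.4, 0.1, 0.3, 0.2]
    print(select_features(sample, gain, k=2))
    print(fuse_features(sample, gain, n_groups=2))
```

Both functions reduce four columns to two, but selection drops the two lowest-scoring columns entirely, whereas the fused output still depends on every input column. That is the property the abstract credits for the method's performance.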