STL-HDL: A new hybrid network intrusion detection system for imbalanced dataset on big data environment


COMPUTERS & SECURITY, vol.110, 2021 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 110
  • Publication Date: 2021
  • Doi Number: 10.1016/j.cose.2021.102435
  • Journal Name: COMPUTERS & SECURITY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, PASCAL, ABI/INFORM, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Computer & Applied Sciences, Criminal Justice Abstracts, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Big data, Intrusion detection, Deep learning, Machine learning, Classification, Apache Spark, Imbalanced data, SMOTE, Tomek-Links, DEEP LEARNING APPROACH
  • Gazi University Affiliated: Yes


The ability to process large amounts of data in real time with big data analytics tools offers many advantages for intrusion detection systems. Deep learning approaches have also been increasingly used in big data analysis and intrusion detection in recent years. This study proposes a new classification-based network attack detection system for network flow traffic that generates big data. The proposed system uses a Hybrid Deep Learning (HDL) network that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). In addition, a data imbalance processing step called STL, which combines the Synthetic Minority Oversampling Technique (SMOTE) with Tomek-Links sampling, is used to reduce the effect of data imbalance on system performance. The study was carried out with PySpark, which provides Python support on the Apache Spark platform, in the Google Colab environment. The model was evaluated for multiclass classification on the CIDDS-001 data set and for binary classification on the UNSW-NB15 data set. Nine machine learning and deep learning algorithms were compared with the proposed method. The results were evaluated in terms of Accuracy, F-Measure, Precision, Recall, the ROC curve and the Precision-Recall curve. The proposed method reached 99.83% accuracy in multiclass classification and 99.17% accuracy in binary classification. According to these results, the proposed method performs well compared with current methods in detecting network attacks in imbalanced data sets. (c) 2021 Elsevier Ltd. All rights reserved.
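
The abstract names SMOTE and Tomek-Links as the two parts of the imbalance-handling step but does not give implementation details. The following is a minimal pure-Python sketch of how the two techniques can be chained for a two-class problem, not the authors' PySpark implementation. The function names (`smote`, `tomek_links`, `smote_tomek`), the neighbour count `k=3`, the choice to oversample until the classes are equal in size, and the choice to drop both members of each Tomek link are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different labels."""
    n = len(X)
    nn = [
        min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    return [
        (i, j) for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def smote_tomek(X, y, minority_label, seed=0):
    """Oversample the minority class to the majority size (assumption),
    then remove both members of every Tomek link (assumption)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, seed=seed)
    X_all = list(X) + new_points
    y_all = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X_all, y_all) for i in pair}
    keep = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in keep], [y_all[i] for i in keep]


# Toy data (hypothetical): eight majority points, three minority points.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (0.5, 0), (1, 0.5)]
minority = [(5, 5), (5, 6), (6, 5)]
X_res, y_res = smote_tomek(majority + minority, [0] * 8 + [1] * 3, minority_label=1)
print(y_res.count(0), y_res.count(1))
```

On this toy data the two classes are well separated, so the Tomek-Links pass finds nothing to remove and only the oversampling changes the class counts. On real flow data with overlapping classes, the cleaning pass would remove borderline pairs. A multiclass data set such as CIDDS-001 would need the oversampling applied per minority class, which this sketch does not cover.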