A new classification method for encrypted internet traffic using machine learning


Ugurlu M., DOĞRU İ. A., ARSLAN R. S.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.29, no.5, pp.2450-2468, 2021 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 29 Issue: 5
  • Publication Date: 2021
  • DOI Number: 10.3906/elk-2011-31
  • Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.2450-2468
  • Keywords: Internet traffic classification, traffic identification, machine learning, cyber security
  • Affiliated with Gazi University: Yes

Abstract

The worldwide rate of internet usage is over 62%, and this rate is increasing day by day. With this increase, ensuring the confidentiality of the information carried in internet traffic becomes important, and encryption algorithms and protocols are used for this purpose. Encryption, while beneficial for normal users, is also used by attackers to hide their activity. By using encrypted traffic, cyber attackers or hackers can bypass security measures such as IDS/IPS and antivirus systems. Since payload analysis cannot be performed without decrypting the traffic, existing commercial security solutions fall short in this situation. This study aims to classify network traffic by analysing the outgoing and incoming data in encrypted traffic using the extreme gradient boosting (XGBoost), decision tree, and random forest classification methods. Thus, without decryption, packets passing through encrypted traffic can be classified using metadata such as size and duration, and precautions can be taken against attacks. The ISCX VPN-NonVPN dataset was used to test the proposed model. With the resulting framework, encrypted traffic was classified with a high success rate; the XGBoost classification method achieved 94.53% success.
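
The idea of describing a flow by metadata such as size and duration, without touching the payload, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Packet` record, the `flow_features` helper, and the particular features chosen are hypothetical and are not taken from the paper, whose exact feature set is not given in the abstract.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Packet:
    """Hypothetical packet metadata record; no payload is needed."""
    timestamp: float  # seconds since the start of the capture
    size: int         # packet length in bytes
    outgoing: bool    # True if sent by the client, False if received


def flow_features(packets: List[Packet]) -> Dict[str, float]:
    """Summarise one flow using only packet sizes, directions and timing."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    out_sizes = [p.size for p in ordered if p.outgoing]
    in_sizes = [p.size for p in ordered if not p.outgoing]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "duration": ordered[-1].timestamp - ordered[0].timestamp,
        "packet_count": float(len(ordered)),
        "bytes_out": float(sum(out_sizes)),
        "bytes_in": float(sum(in_sizes)),
        "mean_size": float(mean(p.size for p in ordered)),
        "mean_gap": float(mean(gaps)) if gaps else 0.0,
    }


# Illustrative flow of four packets (made-up values, not from the dataset).
example_flow = [
    Packet(timestamp=0.0, size=100, outgoing=True),
    Packet(timestamp=0.5, size=1500, outgoing=False),
    Packet(timestamp=1.0, size=200, outgoing=True),
    Packet(timestamp=2.0, size=1000, outgoing=False),
]
features = flow_features(example_flow)
```

Feature dictionaries of this kind could then be arranged as rows of a table and passed to a classifier such as XGBoost, a decision tree, or a random forest, which are the methods named in the abstract.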