Malware Detection Using Memory Analysis Data in Big Data Environment


DENER M., Ok G., ORMAN A.

APPLIED SCIENCES-BASEL, vol.12, no.17, 2022 (SCI-Expanded) identifier identifier

  • Publication Type: Article / Article
  • Volume: 12 Issue: 17
  • Publication Date: 2022
  • Doi Number: 10.3390/app12178604
  • Journal Name: APPLIED SCIENCES-BASEL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: malware memory analysis, big data, machine learning, deep learning, Apache spark, classification, MACHINE LEARNING TECHNIQUES
  • Gazi University Affiliated: Yes

Abstract

Malware is a significant threat that has grown with the spread of technology. This makes detecting malware a critical issue. Static and dynamic methods are widely used in the detection of malware. However, traditional static and dynamic malware detection methods may fall short in advanced malware detection. Data obtained through memory analysis can provide important insights into the behavior and patterns of malware. This is because malwares leave various traces on memories. For this reason, the memory analysis method is one of the issues that should be studied in malware detection. In this study, the use of memory data in malware detection is suggested. Malware detection was carried out by using various deep learning and machine learning approaches in a big data environment with memory data. This study was carried out with Pyspark on Apache Spark big data platform in Google Colaboratory. Experiments were performed on the balanced CIC-MalMem-2022 dataset. Binary classification was made using Random Forest, Decision Tree, Gradient Boosted Tree, Logistic Regression, Naive Bayes, Linear Vector Support Machine, Multilayer Perceptron, Deep Feed Forward Neural Network, and Long Short-Term Memory algorithms. The performances of the algorithms used have been compared. The results were evaluated using the Accuracy, F1-score, Precision, Recall, and AUC performance metrics. As a result, the most successful malware detection was obtained with the Logistic Regression algorithm, with an accuracy level of 99.97% in malware detection by memory analysis. Gradient Boosted Tree follows the Logistic Regression algorithm with 99.94% accuracy. The Naive Bayes algorithm showed the lowest performance in malware analysis with memory data, with an accuracy of 98.41%. In addition, many of the algorithms used have achieved very successful results. According to the results obtained, the data obtained from memory analysis is very useful in detecting malware. In addition, deep learning and machine learning approaches were trained with memory datasets and achieved very successful results in malware detection.