Exploratory data analysis, time series analysis, crime type prediction, and trend forecasting in crime data using machine learning, deep learning, and statistical methods


İlgün E. G., Dener M.

Neural Computing and Applications, vol. 37, no. 18, pp. 11773-11798, 2025 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 37, Issue: 18
  • Publication Date: 2025
  • DOI: 10.1007/s00521-025-11094-9
  • Journal Name: Neural Computing and Applications
  • Journal Indexed In: Scopus, Compendex, Index Islamicus, INSPEC, zbMATH
  • Pages: pp. 11773-11798
  • Keywords: Crime data, Crime-type prediction, Time series analysis, Trend forecasting, Visualization
  • Gazi University Affiliation: Yes

Abstract

Criminal activity is a critical obstacle to socioeconomic development and must be controlled. However, control methods based on human surveillance are prone to error and raise legal concerns, which calls for more robust alternatives. This study aims to contribute to strategies for reducing and preventing crime by helping allocate police resources to the right locations at the right time. To this end, crime datasets from three of the largest metropolitan cities in the USA (San Francisco, Chicago, and Philadelphia) were subjected to comprehensive preprocessing and exploratory data analysis. The analysis identified the safest and most dangerous months, days, and hours in terms of crime frequency, the most common crime types, and the police districts with the highest crime rates. Crime-type prediction models were developed using machine learning algorithms, including XGBoost, CatBoost, random forest (RF), decision tree (DT), multilayer perceptron (MLP), K-nearest neighbors (KNN), Gaussian Naive Bayes (GNB), and logistic regression (LR). Additionally, time series analyses were conducted for 10, 22, and 22 police districts in the three datasets, respectively, using deep learning models such as long short-term memory (LSTM) and bidirectional long short-term memory (BLSTM) and statistical methods such as Holt–Winters exponential smoothing (HWES), Prophet, and seasonal autoregressive integrated moving average (SARIMA). The primary objective was to accurately predict future high-crime hot spots. Furthermore, crime trends for the next 5 years were forecast using the statistical models that achieved the lowest root-mean-squared error (RMSE).
By combining traditional machine learning methods, deep learning approaches, and statistical techniques, this study analyzed criminal incidents from various perspectives, including crime-type prediction, regional crime prediction, trend forecasting, and exploratory data analysis. The results obtained are expected to contribute to the development of proactive policing strategies.
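The model-selection step the abstract describes (hold out recent observations, score candidate forecasters by RMSE, then forecast ahead with the winner) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' pipeline: the paper used HWES, Prophet, SARIMA, LSTM, and BLSTM, whereas the sketch swaps in two simple stand-ins (a seasonal naive forecaster and Holt's linear exponential smoothing). The function names, the synthetic monthly series, the 12-month holdout, and the smoothing parameters `alpha` and `beta` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a forecaster takes a history and a horizon
# and returns `horizon` predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 12) -> List[float]:
    """Stand-in baseline (not from the paper): repeat the last observed season forward."""
    last_season = list(history[-season:])
    return [last_season[i % season] for i in range(horizon)]


def holt_linear(history: Sequence[float], horizon: int,
                alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Holt's linear (double) exponential smoothing; alpha and beta are assumed, not tuned."""
    level, trend = history[0], history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extrapolates the final level along the final trend.
    return [level + (h + 1) * trend for h in range(horizon)]


def select_best(series: Sequence[float], holdout: int,
                models: Dict[str, Forecaster]) -> Tuple[str, Dict[str, float]]:
    """Fit each model on all but the last `holdout` points; pick the lowest RMSE."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: rmse(test, model(train, holdout)) for name, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


MODELS: Dict[str, Forecaster] = {
    "seasonal_naive": seasonal_naive,
    "holt_linear": holt_linear,
}

# Synthetic monthly incident counts for one district (illustrative, not from the paper).
pattern = [10, 12, 15, 20, 25, 30, 32, 31, 26, 20, 15, 11]
district_series = pattern * 4  # four years of monthly data

best_name, rmse_scores = select_best(district_series, holdout=12, models=MODELS)
# Refit the winner on the full series and forecast 5 years (60 months) ahead.
five_year_forecast = MODELS[best_name](district_series, 60)
print(best_name, rmse_scores)
print(five_year_forecast[:12])
```

Other forecasters, such as the HWES, SARIMA, or Prophet models named in the abstract, could be wrapped as functions with the same `(history, horizon)` signature and added to `MODELS`; the selection and forecasting logic would stay unchanged. Running the selection separately for each police district mirrors the per-district analysis the abstract describes.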