A Hybrid Deep Learning Model Based on Local and Global Features for Amazon Product Reviews: An Optimal ALBERT-Cascade CNN Approach


Creative Commons License

Abbas I. M., Atacak İ., Toklu S., Barışçı N., Doğru İ. A.

Applied Sciences, vol.16, no.1, pp.1-22, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 1
  • Publication Date: 2025
  • DOI: 10.3390/app16010025
  • Journal Name: Applied Sciences
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED)
  • Pages: pp.1-22
  • Open Archive Collection: AVESİS Open Access Collection
  • Gazi University Affiliated: Yes

Abstract

Natural Language Processing (NLP) has become a valuable topic in both technology and business because, as digital information spreads, it helps turn data into useful information. Nevertheless, its use faces difficulties, including the complexity of language and the quality of the data. To address these challenges, this study first performed a series of ablation experiments on 14 models built from different combinations of Deep Learning (DL) components, including A Lite BERT (ALBERT) together with Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), a Max Pooling layer, and an attention mechanism. Based on the performance results obtained from these models, an ALBERT-cascaded CNN hybrid model is then proposed as an effective method for overcoming these challenges. The proposed model uses a transformer architecture that, with its parallel processing capability, handles both word and subword tokenization and creates contextualized word embeddings. Before classification, two 1-D CNN blocks extract local and global features to improve model performance. The model was optimized with the Optuna hyperparameter optimization framework. The proposed model was evaluated on the Amazon Fashion 2023 dataset under 5-fold cross-validation. The experimental results show that the proposed hybrid model achieves average scores of 0.9308 (accuracy), 0.9296 (F1 score), 0.9412 (precision), 0.9182 (recall), and 0.9797 (AUC) on the validation dataset, and 0.9313, 0.9305, 0.9414, 0.9199, and 0.9800, respectively, on the test dataset. In addition, comparisons with models from studies that use similar datasets support the experimental results and indicate that the proposed model is a competitive approach for solving problems encountered in the NLP field.
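
The evaluation protocol described above (5-fold cross-validation with accuracy, precision, recall, and F1 averaged over folds) can be sketched in plain Python. This is a minimal illustration and not the authors' code: the function names, the contiguous fold split, the binary 0/1 labels, and the `predict_fn` callback are all assumptions made for the example, and AUC is left out because it needs predicted scores rather than hard labels.

```python
from statistics import mean


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous, near-equal folds.

    Assumption: the data has already been shuffled; the abstract does not
    say how the folds were drawn.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def cross_validate(labels, predict_fn, k=5):
    """Average per-fold metrics.

    predict_fn(train_idx, val_idx) is a hypothetical callback that trains a
    model on train_idx and returns hard predictions for val_idx.
    """
    per_fold = []
    for train_idx, val_idx in k_fold_indices(len(labels), k):
        y_true = [labels[i] for i in val_idx]
        y_pred = predict_fn(train_idx, val_idx)
        per_fold.append(binary_metrics(y_true, y_pred))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


# Consistency check on the averages reported in the abstract.
val_f1 = f1_from(precision=0.9412, recall=0.9182)   # about 0.9296
test_f1 = f1_from(precision=0.9414, recall=0.9199)  # about 0.9305
```

The last two lines show that the reported F1 averages agree, to four decimals, with the harmonic mean of the reported precision and recall averages. An average of per-fold F1 scores need not equal the F1 of the averaged precision and recall, so this is only a plausibility check on the reported figures.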