Deobfuscating Iraqi Arabic Leetspeak for Hate Speech Detection Using AraBERT and Hierarchical Attention Network (HAN)


Marzoog D., ÇAKIR H.

Electronics (Switzerland), vol. 14, no. 21, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 14 Issue: 21
  • Publication Date: 2025
  • DOI: 10.3390/electronics14214318
  • Journal Name: Electronics (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: Arabic Natural Language Processing, deep learning, hate speech detection, Hierarchical Attention Network (HAN), Iraqi Arabic, leetspeak normalization, text deobfuscation
  • Gazi University Affiliated: Yes

Abstract

The widespread use of leetspeak and dialectal Arabic on social media poses a critical challenge to automated hate speech detection systems. Existing Arabic NLP models, largely trained on Modern Standard Arabic (MSA), struggle with obfuscated, noisy, and dialect-specific text, leading to poor generalization in real-world scenarios. This study introduces a Hybrid AraBERT–Hierarchical Attention Network (HAN) framework for deobfuscating Iraqi Arabic leetspeak and accurately classifying hate speech. The proposed model employs a custom normalization pipeline that converts digits, symbols, and Latin-script substitutions (e.g., "3يب" → "عيب") into canonical Arabic forms, thereby enhancing tokenization and embedding quality. AraBERT provides deep contextualized representations optimized for Arabic morphology, while HAN hierarchically aggregates and attends to critical words and sentences to improve interpretability and semantic focus. Experimental evaluation on an Iraqi Arabic social media dataset demonstrates that the proposed model achieves 97% accuracy, 96% precision, 96% recall, 96% F1-score, and 0.98 ROC–AUC, outperforming standalone AraBERT and HAN models by up to 6% in F1-score and 4% in AUC. Ablation studies confirm the important role of the normalization stage (F1 = 0.91 without it) and the contribution of hierarchical attention in balancing precision and recall. Robustness testing under controlled perturbations (including character substitutions, symbol obfuscations, typographical noise, and class imbalance) shows performance retention above 91% F1, validating the framework’s noise tolerance and generalization capability. Comparative analysis with state-of-the-art approaches such as DRNNs, arHateDetector, and ensemble BERT systems further highlights the hybrid model’s effectiveness in handling noisy, dialectal, and adversarial text.
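The normalization step described above maps leetspeak digits and symbols back to canonical Arabic letters before tokenization. Below is a minimal Python sketch of such a pipeline; only the "3" → "ع" mapping is confirmed by the paper's example ("3يب" → "عيب"), while the remaining substitutions, the Arabic-context heuristic, and the symbol-stripping step are assumptions based on common Arabizi conventions, not the authors' exact rules.

```python
import re

# Digit-to-letter substitution table: "3" -> "ع" is confirmed by the paper's
# example; the other entries are assumed common Arabizi conventions.
LEET_MAP = {
    "3": "ع",  # confirmed: "3يب" -> "عيب"
    "7": "ح",  # assumed
    "2": "ء",  # assumed
    "5": "خ",  # assumed
    "6": "ط",  # assumed
    "9": "ص",  # assumed
}

def normalize_leetspeak(text: str) -> str:
    """Convert leetspeak substitutions back to canonical Arabic forms."""
    for leet, arabic in LEET_MAP.items():
        # Only substitute digits adjacent to Arabic letters, so standalone
        # numerals (dates, counts) survive -- a heuristic, not the paper's rule.
        pattern = (rf"(?<=[\u0621-\u064A]){re.escape(leet)}"
                   rf"|{re.escape(leet)}(?=[\u0621-\u064A])")
        text = re.sub(pattern, arabic, text)
    # Strip decorative symbols often inserted to obfuscate words (assumed step).
    return re.sub(r"[_*~^|]+", "", text)

print(normalize_leetspeak("3يب"))  # -> "عيب", the paper's own example
```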
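The hybrid architecture itself, as described, feeds AraBERT's contextual token embeddings into HAN-style word-level and sentence-level attention. A minimal PyTorch sketch follows; the checkpoint name, the binary classification head, and all dimensions are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class AttentionPool(nn.Module):
    """Additive attention pooling, used at both HAN levels."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, h: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, dim); mask: (batch, seq), 1 for real positions.
        scores = self.context(torch.tanh(self.proj(h))).squeeze(-1)
        scores = scores.masked_fill(mask == 0, -1e9)  # -1e9 (not -inf) avoids NaNs on empty rows
        weights = torch.softmax(scores, dim=-1)
        return (weights.unsqueeze(-1) * h).sum(dim=1)  # attention-weighted sum

class AraBertHAN(nn.Module):
    """AraBERT token encoder + hierarchical (word -> sentence -> document) attention."""
    def __init__(self, model_name: str = "aubmindlab/bert-base-arabertv2",
                 n_classes: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        dim = self.encoder.config.hidden_size
        self.word_attn = AttentionPool(dim)   # tokens -> sentence vector
        self.sent_attn = AttentionPool(dim)   # sentences -> document vector
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, input_ids: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # input_ids, attention_mask: (batch, n_sents, seq_len), one token row per sentence.
        b, s, t = input_ids.shape
        flat_ids = input_ids.view(b * s, t)
        flat_mask = attention_mask.view(b * s, t)
        tokens = self.encoder(flat_ids, attention_mask=flat_mask).last_hidden_state
        sent_vecs = self.word_attn(tokens, flat_mask).view(b, s, -1)
        # A sentence counts as present if it has at least one real token.
        sent_mask = attention_mask.view(b, s, t).sum(dim=-1).clamp(max=1)
        doc_vec = self.sent_attn(sent_vecs, sent_mask)
        return self.classifier(doc_vec)  # logits: (batch, n_classes)
```

The two attention levels mirror the abstract's claim that the model "hierarchically aggregates and attends to critical words and sentences": the softmax weights in AttentionPool are the per-word and per-sentence importance scores that make the classifier's decisions inspectable.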
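The robustness evaluation applies controlled perturbations to the test text. The abstract does not specify the exact perturbation operators or rates, so the sketch below is a plausible stand-in covering three of the four named categories (character substitutions, symbol obfuscations, typographical noise); class imbalance is a dataset-level resampling step and is omitted here.

```python
import random

# Reverse substitutions for re-obfuscating normalized text (assumed set).
REVERSE_LEET = {"ع": "3", "ح": "7", "ء": "2", "خ": "5"}

def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly corrupt a fraction `rate` of characters for robustness testing."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)                    # leave the character untouched
        elif ch in REVERSE_LEET:
            out.append(REVERSE_LEET[ch])      # character substitution
        else:
            # Symbol obfuscation (insert "*" or "_") or typographical noise
            # (drop or duplicate the character) -- operators are assumptions.
            out.append(rng.choice([ch + "*", "_" + ch, "", ch + ch]))
    return "".join(out)
```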