A Novel Approach to SPAM Detection in Social Networks-Light-ANFIS: Integrating Gradient-Based One-Sided Sampling and Random Forest-Based Feature Clustering Techniques with Adaptive Neuro-Fuzzy Inference Systems

Çıtlak, Oğuzhan; ATACAK, İSMAİL; DOĞRU, İBRAHİM

doi:10.3390/app151810049

Çıtlak O., ATACAK İ., DOĞRU İ. A.

Applied Sciences (Switzerland), vol. 15, no. 18, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 15 Issue: 18
Publication Date: 2025
DOI: 10.3390/app151810049
Journal Name: Applied Sciences (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
Keywords: adaptive neuro-fuzzy inference system, gradient-based one-side sampling, RF-based clustering, social networks, spam detection
Gazi University Affiliated: Yes

Abstract

With today’s technological advancements and the widespread use of the Internet, social networking platforms that allow users to interact with each other are growing rapidly. The popular social network X (formerly Twitter) has become a target for malicious actors, and spam is one of its biggest challenges. The filters such platforms employ to protect users struggle to keep up with evolving spam techniques, the diverse behaviors of platform users, the dynamic tactics of spam accounts, and the need to keep spam detection algorithms updated. The literature shows that many effective solutions rely on computationally expensive methods that are limited by dataset constraints. This study addresses the spam challenges of social networks by proposing a novel detection framework, Light-ANFIS, which combines an adaptive neuro-fuzzy inference system (ANFIS) with gradient-based one-side sampling (GOSS) and random forest-based feature clustering (RFBFC) techniques. The proposed approach employs RFBFC for efficient feature reduction, yielding an ANFIS model with fewer inputs. This reduced ANFIS structure allows a simpler system configuration with fewer parameters, and the lower dimensionality enables faster ANFIS training. GOSS further accelerates ANFIS training by reducing the sample size without sacrificing accuracy. The proposed Light-ANFIS architecture was evaluated on three datasets: two public benchmarks and one custom dataset. To demonstrate the impact of GOSS, Light-ANFIS was benchmarked against RFBFC-ANFIS, which relies solely on RFBFC. Experiments comparing the training durations of the Light-ANFIS and RFBFC-ANFIS architectures showed that GOSS improved training time efficiency by 38.77% (Dataset 1), 40.86% (Dataset 2), and 38.79% (Dataset 3). The Light-ANFIS architecture also achieved strong results in terms of accuracy, precision, recall, F1-score, and AUC.
In the order accuracy, precision, recall, F1-score, and AUC, the proposed architecture obtained scores of 0.98748, 0.98821, 0.99091, 0.98956, and 0.98664 on Dataset 1; 0.98225, 0.97412, 0.99043, 0.98221, and 0.98233 on Dataset 2; and 0.98552, 0.98915, 0.98720, 0.98818, and 0.98503 on Dataset 3. Compared with methods from studies in the literature that use similar datasets and methodologies, the Light-ANFIS architecture was observed to outperform existing approaches. On Dataset 1 and Dataset 3, it even achieved slightly better confusion-matrix-based metrics than current deep learning (DL)-based hybrid and fusion methods, which are known as high-performance complex models in this field. Additionally, the proposed model not only exhibits high performance but also has a simpler configuration than structurally equivalent models, giving it a competitive edge. This makes it a valuable tool for safeguarding social media users from harmful content.
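
To illustrate how GOSS can reduce the sample size while retaining the most informative samples, the following is a minimal Python sketch of the general GOSS idea as it is commonly described for gradient boosting: keep the samples with the largest absolute gradients, randomly sample from the rest, and up-weight the sampled small-gradient samples to compensate. It is not the authors' implementation. The function name `goss_sample`, the parameters `top_rate` and `other_rate`, and their default values are assumptions chosen for illustration; the abstract does not state how GOSS is coupled to ANFIS training or which rates were used.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Select a training subset by gradient-based one-side sampling.

    Hypothetical helper for illustration only.
    Returns (indices, weights): the chosen sample indices and the weight
    to apply to each chosen sample's gradient or loss contribution.
    """
    if not (0 < top_rate < 1 and 0 < other_rate <= 1 - top_rate):
        raise ValueError("need 0 < top_rate < 1 and 0 < other_rate <= 1 - top_rate")

    n = len(gradients)
    n_top = int(top_rate * n)      # samples kept for their large gradients
    n_other = int(other_rate * n)  # samples drawn at random from the rest

    # Rank sample indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_idx = order[:n_top]
    rest = order[n_top:]

    # Randomly pick small-gradient samples without replacement.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Up-weight the sampled small-gradient samples so that their total
    # contribution approximates that of the full small-gradient set.
    amplify = (1.0 - top_rate) / other_rate

    indices = top_idx + sampled
    weights = [1.0] * len(top_idx) + [amplify] * len(sampled)
    return indices, weights
```

With the assumed defaults, 1000 samples would be reduced to 300 (200 large-gradient samples plus 100 randomly drawn ones), which is the kind of sample reduction that can shorten each training pass. The actual rates and the resulting speed-up depend on the configuration used in the paper.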