FUSEPOP: A Multi-Modal Fusion with Mutual Information Weighting and Stacked Ensemble for Social Media Popularity Prediction

ŞENCAN, ÖMER; ATACAK, İSMAİL; DOĞRU, İBRAHİM; TOKLU, SİNAN; Bar, Necaattin; Kılıç, Kazım

doi:10.3390/app16094160

FUSEPOP: A Multi-Modal Fusion with Mutual Information Weighting and Stacked Ensemble for Social Media Popularity Prediction

ŞENCAN Ö. A., ATACAK İ., DOĞRU İ. A., TOKLU S., Bar N., Kılıç K.

Applied Sciences (Switzerland), cilt.16, sa.9, 2026 (SCI-Expanded, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 16 Sayı: 9
Basım Tarihi: 2026
Doi Numarası: 10.3390/app16094160
Dergi Adı: Applied Sciences (Switzerland)
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Anahtar Kelimeler: mutual information fusion, popularity detection, social media, stacked ensemble
Gazi Üniversitesi Adresli: Evet

Özet

Short-form video content has gained importance as a popular form of digital media due to the rising popularity of social media platforms and the decreasing attention spans of consumers. However, a major obstacle to popularity detection in short-form content is the heterogeneous nature of the data, encompassing textual, visual, and metadata components. To tackle this challenge, we propose FUSEPOP, a robust multi-modal architecture. The proposed framework utilizes ResNet-50 for visual feature extraction and XLM-RoBERTa for encoding multilingual textual information. FUSEPOP employs a mutual information-based modality weighting mechanism with logarithmic smoothing and a 0.7 weight ceiling to balance contributions from each input stream. Furthermore, FUSEPOP implements a robust stacked generalization strategy trained via stratified 5-fold cross-validation. This approach utilizes a logistic regression meta-learner to dynamically synthesize predictions from random forest, XGBoost, and a neural network-based classifier. Experimental results show that this architecture significantly outperforms benchmark models, achieving an accuracy of 0.980 and an average F1-score of 0.964 on the feature configuration selected for this study, and remains competitive on a literature-aligned alternative configuration. These findings confirm that the proposed model successfully detects popularity on short-form social media content.