TwitterBulletin: An Intelligent and Real-Time Automated News Categorization Tool for Twitter


DEMİRCİ M. S., SAĞIROĞLU Ş.

Journal of Universal Computer Science, vol.28, no.4, pp.345-377, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 28 Issue: 4
  • Publication Date: 2022
  • DOI: 10.3897/jucs.69377
  • Journal Name: Journal of Universal Computer Science
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Directory of Open Access Journals
  • Pages: pp.345-377
  • Keywords: artificial intelligence, deep learning, news classification, news dataset, news topic modelling, social media, Twitter
  • Affiliated with Gazi University: Yes

Abstract

Social media platforms have become major news sources thanks to their immense popularity and rapid dissemination of information. For news organizations and journalists, using these platforms is essential for tracking and discovering news in the age of digital journalism. However, the abundance of meaningless data and the lack of organization on these platforms make it difficult for journalists to reach valuable news. In this paper, we create what is, to the best of our knowledge, the first public dataset containing a large number of real-world Turkish news tweets belonging to different news categories. We propose an artificial intelligence-based two-step approach that helps journalists access news shared by various sources on social media, grouped under relevant categories such as politics (elections, riots, etc.) and health (pandemic, COVID-19, etc.), via a single platform, reducing the chance of overlooking needed information. In the first step, we propose a novel machine learning-based model for collecting and categorizing news posts on social media. We implement several traditional machine learning and deep learning algorithms and evaluate their classification performance in terms of accuracy, precision, recall, and F1 score. In the second step, we develop a software tool, named TwitterBulletin, which automatically retrieves Turkish news tweets and groups them under news categories in real time using the CNN classifier, which achieved the best performance in the first step. The results show that the overall accuracy of TwitterBulletin is reasonably high and satisfactory despite the challenge of classifying short tweets.
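
As an illustration of the four evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed for a multi-class news classifier. It is not the authors' evaluation code. The function name `classification_metrics`, the choice of macro averaging, and the sample labels are assumptions made for this example; the abstract does not state which averaging scheme the paper uses.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per category.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Made-up category labels for six tweets (not data from the paper).
y_true = ["politics", "politics", "health", "health", "sports", "sports"]
y_pred = ["politics", "health", "health", "health", "sports", "politics"]
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Macro averaging weights every category equally, which matters when some news categories have far fewer tweets than others; a weighted or micro average would give different numbers on an imbalanced dataset.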