Detection of Turkish Fake News from Tweets with BERT Models


Creative Commons License

Koru G. K., ULUYOL Ç.

IEEE Access, cilt.12, ss.14918-14931, 2024 (SCI-Expanded) identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 12
  • Basım Tarihi: 2024
  • Doi Numarası: 10.1109/access.2024.3354165
  • Dergi Adı: IEEE Access
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Sayfa Sayıları: ss.14918-14931
  • Anahtar Kelimeler: BERT, deep learning, ensemble learning, Fake news, generated news
  • Gazi Üniversitesi Adresli: Evet

Özet

As the number of people using social networks increases, more people are using social media platforms to meet their news needs. Users think that it is easier to follow the agenda by accessing news, especially on Twitter, rather than newspaper news pages. However, fake news is increasingly appearing on social media, and it is not always possible for people to obtain correct news from partial news pages or short Twitter posts. Understanding whether the news shared on Twitter is true or not is an important problem. Detecting fake tweets is of great importance in Turkish as well as in any language. In this study, fake news obtained from verification platforms on Twitter and real news obtained from the Twitter accounts of mainstream newspapers were labeled and, preprocessed using the Zemberek natural language processing tool developed for the Turkish language, and a dataset named TR_FaRe_News was created. Then, the TR_FaRe_News dataset was explored using ensemble methods and BoW, TF-IDF, and Word2Vec vectorization methods for fake news detection. Then a pre-trained BERT deep learning model was fine-tuned, and variations of the model extended with Bi-LSTM and Convolutional Neural Network (CNN) layers with the frozen and unfrozen parameters methods were explored. The performance evaluation was conducted using seven comparable datasets, namely BuzzFeedNews, GossipCop, ISOT, LIAR, Twitter15, and Twitter16, including an LLM-generated fake news dataset. Analyzing Turkish tweets and using fake news datasets generated by LLM is considered an important contribution. Accuracy values between 90 and 94% were obtained with the BERT and BERTurk + CNN models with 94% accuracy.