Personality traits prediction model from Turkish contents with semantic structures


Kosan M. A., KARACAN H., Urgen B. A.

Neural Computing and Applications, vol.35, no.23, pp.17147-17165, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 35 Issue: 23
  • Publication Date: 2023
  • DOI Number: 10.1007/s00521-023-08603-z
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
  • Pages: pp.17147-17165
  • Keywords: Turkish Twitter content, Personality prediction model, Personality dataset, Preprocessing
  • Gazi University Affiliated: Yes

Abstract

Users' personality traits can provide various clues about them in online environments. These clues can be used in areas such as law enforcement, advertising, recruitment, and e-commerce applications. This study aims to create a dataset and a prediction model for predicting the personality traits of Internet users who produce Turkish content. The main contribution of the study is a personality traits dataset composed of Turkish Twitter content. In addition, the comparisons of preprocessing steps, vectorization methods, and deep learning models carried out within the proposed prediction system are expected to contribute to both current applications and future studies in the relevant literature. On the Turkish personality traits dataset, the Bidirectional Encoder Representations from Transformers (BERT) vectorization method and the stemming preprocessing step were observed to achieve high success. Previous studies reported lower success rates for these processes on English datasets. The results also show that the Bidirectional Long Short-Term Memory (BiLSTM) deep learning method is more successful than the other methods on both the Turkish dataset and English datasets.