Sentiment Analysis for Turkish Unstructured Data by Machine Translation


2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, United States Of America, 11 December 2020

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/bigdata50022.2020.9377784
  • City: Atlanta, GA
  • Country: United States Of America
  • Keywords: sentiment analysis, unstructured data, machine translation, machine learning
  • Gazi University Affiliated: Yes


Popular online platforms such as social media, blogs, and newspapers generate vast amounts of unstructured data every second. Sentiment Analysis (SA) is an efficient technique for identifying and extracting subjective information from unstructured data, enabling businesses to understand how users feel about their products or services. However, unstructured data is harder to analyze than structured data. In particular, the performance of SA techniques decreases with the structural complexity of the language. SA techniques are most widely applied to English, since it is used worldwide and is structurally better suited to SA. By contrast, the structural difficulties and complexities of Turkish degrade the performance of SA studies compared to English. This study aims to overcome this difficulty by first translating Turkish texts into English with machine translation and then performing sentiment analysis on the English texts. To demonstrate the effect of machine translation, experiments are conducted on two different data sets, and results are reported comparatively for Turkish as the original language and English as the translated language. The data sets in both languages are classified with six machine learning methods: Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, and Artificial Neural Network. When the success rates of these methods are examined, machine translation yields a significant increase for most of them.
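The translate-then-classify idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `translate_tr_to_en` is a hypothetical stand-in for a machine translation system (here only a lookup table), the four sentences and their labels are invented toy data, and the classifier is a small hand-written multinomial Naive Bayes, which is only one of the six methods named above.

```python
import math
from collections import Counter, defaultdict

# Invented toy data (not from the paper): Turkish sentence -> English rendering.
# This table plays the role of a machine translation system in the sketch.
TOY_TRANSLATIONS = {
    "bu ürün harika": "this product is great",
    "çok kötü bir hizmet": "very bad service",
    "harika bir hizmet": "great service",
    "bu ürün çok kötü": "this product is very bad",
}


def translate_tr_to_en(text):
    """Hypothetical translator: a real pipeline would call an MT system here."""
    return TOY_TRANSLATIONS[text]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


turkish_texts = list(TOY_TRANSLATIONS)
labels = ["pos", "neg", "pos", "neg"]

# Baseline: train directly on the original Turkish texts.
model_tr = NaiveBayes().fit(turkish_texts, labels)

# Translate-then-classify: translate first, then train on the English texts.
english_texts = [translate_tr_to_en(t) for t in turkish_texts]
model_en = NaiveBayes().fit(english_texts, labels)

print(model_en.predict("great product"))
print(model_en.predict("bad service"))
```

The toy data is far too small to show any accuracy difference between the two models. The sketch only makes the order of the steps concrete: the same classifier is trained once on the original texts and once on their translations, so the two results can be compared.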