Insult crime analysis over Turkish data on Twitter


Thesis Type: Postgraduate

Institution Of The Thesis: Gazi Üniversitesi, Bilişim Enstitüsü, Turkey

Approval Date: 2017

Student: NEVZAT ERÇOLAK

Consultant: HÜSEYİN ÇAKIR

Abstract:

Social media is a platform on which people share their ideas and daily lives, and it carries content on sports, politics, entertainment and similar areas that can be regarded as news. Alongside this kind of sharing, social media is also home to threats, blackmail, insults and many other forms of criminal content. In this thesis, posts on Twitter, one of the most widely used social media platforms, were used to study the categorization of the crime of insult as a text classification problem. Posts on social media platforms can be grouped with text classification, in which text is tagged with predefined categories. Within the limits of the law, this research specifically analyzed the crime of insult among the crimes that people are exposed to on the internet and on social media platforms. By using machine learning as well as text classification methods in the classification phase, an example classification model was created for the analysis of the crime of insult. The study used two categories, "insult" and "not insult", and the tweets used in the study were obtained by searching for predefined Turkish keywords. Different pre-processing and classification techniques were applied, and the effect of these techniques on text classification was investigated. Across the attributes and classification methods tested, the best result was obtained with the Support Vector Machine method, which achieved 95.4% accuracy. Finally, a prototype application able to process the content of Turkish tweets was developed in order to apply the text classification and language processing methods automatically. Using this prototype application, new data sets were created with the aim of improving classification accuracy.
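As a rough illustration of the kind of pipeline the abstract describes (pre-processing of Turkish tweets followed by a two-class Support Vector Machine), the sketch below uses only the Python standard library. It is not the thesis's implementation: the cleaning rules, the bag-of-words features, the hyperparameters, the toy tweets and every function name are assumptions made for illustration, and the SVM is a minimal subgradient-descent trainer for the hinge loss rather than a library classifier.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including ç, ğ, ı, ö, ş, ü


def turkish_lower(text):
    # str.lower() maps "I" to "i", but Turkish needs I -> ı and İ -> i.
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(tweet):
    # Assumed cleaning steps: drop URLs and @mentions, lowercase, keep letter runs.
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet))
    return WORD_RE.findall(turkish_lower(text))


def featurize(tweet):
    # Bag-of-words term counts (an assumed feature set).
    counts = {}
    for tok in tokenize(tweet):
        counts[tok] = counts.get(tok, 0) + 1
    return counts


def score(feats, weights, bias):
    return sum(weights.get(t, 0.0) * c for t, c in feats.items()) + bias


def train_linear_svm(tweets, labels, epochs=20, lr=0.1, lam=0.01):
    # Linear SVM: subgradient descent on the L2-regularised hinge loss.
    # Labels are +1 ("insult") and -1 ("not insult").
    weights, bias = {}, 0.0
    samples = [featurize(t) for t in tweets]
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            margin = y * score(feats, weights, bias)
            for t in weights:  # shrink step from the L2 penalty
                weights[t] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge loss is active: move towards the sample
                for t, c in feats.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * c
                bias += lr * y
    return weights, bias


def classify(tweet, weights, bias):
    return 1 if score(featurize(tweet), weights, bias) > 0 else -1


# Hypothetical toy data, invented for this sketch (not taken from the thesis).
TOY_TWEETS = [
    "sen tam bir aptalsın",
    "salak mısın sen",
    "bugün hava çok güzel",
    "maç çok iyiydi",
]
TOY_LABELS = [1, 1, -1, -1]

weights, bias = train_linear_svm(TOY_TWEETS, TOY_LABELS)
```

Calling `classify("@veli SEN APTALSIN", weights, bias)` shows why the Turkish-aware lowercasing matters: with plain `str.lower()` the dotless "I" would become "i" and the token would no longer match the training vocabulary. A real study would use a far larger labelled corpus and, most likely, an established SVM library.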