Data analysis through social media according to the classified crime


Savas S., TOPALOĞLU N.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.27, no.1, pp.407-420, 2019 (Journal Indexed in SCI)

  • Volume: 27 Issue: 1
  • Publication Date: 2019
  • DOI: 10.3906/elk-1712-17
  • Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Pages: pp.407-420

Abstract

The amount and variety of data generated on social media sites have grown with their widespread use, and the rate at which data are produced has grown as well. Because these data include personal information, processing them to extract meaningful information is important. This process can be called intelligence, and the resulting information may serve commercial, academic, or security purposes. In this study, an example application is developed for intelligence on Twitter. Crimes in Turkey are classified according to Turkish Statistical Institute criminal data, and keywords are defined from this classification. A total of 150,000 Turkish-language tweets are collected from Twitter between specified dates and processed with Zemberek, a Turkish natural language processing library. The results show that 56% of the people were talking about terrorist attacks and bombing attacks on the study dates. The words "bomb," "terror," "attack," "organization," and "explode" account for 24%, 12%, 8%, 6%, and 6%, respectively. Moreover, associations between words and situations are found. Correlations are useful for creating new subclusters; for example, "terror" and "rape" have a correlation of 0.90 in this study. Expanding the keyword groups would reach a larger population and give a clearer picture of the real situation.
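
The two quantities reported in the abstract, keyword percentages and word correlations, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a keyword's percentage is its share of all keyword occurrences and that correlation is a Pearson coefficient over per-tweet presence indicators; the abstract confirms neither definition. The keyword list, the function names, and the whitespace tokenizer are hypothetical, and the tokenizer stands in for Zemberek processing, which is not reproduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical keyword list for illustration only; the study derives its
# keywords from Turkish Statistical Institute crime categories.
KEYWORDS = ["bomb", "terror", "attack"]


def keyword_shares(tweets, keywords):
    """Return each keyword's share of all keyword occurrences, in percent.

    Assumption: a share is measured against the total number of keyword
    hits, not against the number of tweets or users.
    """
    counts = Counter()
    for text in tweets:
        tokens = text.lower().split()  # naive tokenizer, not Zemberek
        for kw in keywords:
            counts[kw] += tokens.count(kw)
    total = sum(counts.values())
    return {kw: (100.0 * counts[kw] / total if total else 0.0) for kw in keywords}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant sequence; report 0 by convention
    return cov / sqrt(var_x * var_y)


def presence_correlation(tweets, word_a, word_b):
    """Correlate the per-tweet presence (0 or 1) of two words."""
    token_sets = [set(t.lower().split()) for t in tweets]
    xs = [1 if word_a in s else 0 for s in token_sets]
    ys = [1 if word_b in s else 0 for s in token_sets]
    return pearson(xs, ys)


if __name__ == "__main__":
    sample = ["bomb attack", "bomb terror", "attack", "nothing here"]
    print(keyword_shares(sample, KEYWORDS))
    print(presence_correlation(sample, "bomb", "terror"))
```

On real data, a high presence correlation between two keywords from different crime categories is the kind of signal the abstract describes as a basis for new subclusters.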