A Named Entity Recognition Dataset for Turkish


Kucuk D., Kucuk D., ARICI N.

24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Türkiye, 16-19 May 2016, pp. 329-332

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • DOI Number: 10.1109/siu.2016.7495744
  • City of Publication: Zonguldak
  • Country of Publication: Türkiye
  • Page Numbers: pp. 329-332

Abstract

Named entity recognition is an important topic in natural language processing. Named entity recognition studies on Turkish texts are quite limited compared to studies on other languages, and the lack of common datasets makes it harder to compare different approaches. This study presents a dataset of Turkish news articles annotated with named entities. The annotations cover the basic named entity types: person, location, and organization names. Additionally, to serve as a reference for future studies, a rule-based named entity recognition system is evaluated on the final form of this dataset and the corresponding evaluation results are presented. We expect this study to contribute to the advancement of named entity recognition research on Turkish texts.