Corporate E-mail Classification System


Yildiz A., DEMİRCİ M.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 559-564

  • Volume number:
  • DOI: 10.1109/ubmk.2017.8093462
  • City of publication: Antalya
  • Country of publication: Türkiye
  • Pages: pp. 559-564

Abstract

In this study, a system was developed to interpret and classify e-mails according to their content. The aim of the work is to build an intelligent inbox that supports information security awareness. The designed system includes a client that connects to the mail server and retrieves e-mails. A two-stage analysis is performed on the received e-mails; for both stages, the original source of each e-mail is parsed. In the first stage, details that standard inboxes do not show to the user, such as the transmission history of the e-mail, the servers it passed through, the delay times, and the message ID, are visualized. With this information, it is possible to detect delivery delays in received e-mails, identify the server that caused a delay, and find the responsible party. In the second stage, the body of the e-mail is extracted, stripped of its formatting information, and its content is analyzed. In this study, which is customized for Turkish, the words in the content are reduced to their roots with the Zemberek software and recorded together with the class information obtained from the user. A data set is then created from the recorded words. The generated data set was tested with many classification algorithms in the WEKA application, and the Naive Bayes Multinomial algorithm, which gave the most successful result, was implemented in the system. The overall classification performance of the system for multiple classes was calculated as 90%. The study contributes to the literature as a desktop application that works in real time in Turkish without sharing corporate data with external networks, and through a new feature selection method.
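
The two analysis stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which relies on Zemberek for stemming and on WEKA's Naive Bayes Multinomial classifier. The first function estimates per-hop delivery delays from the Received headers of a raw message, assuming every header ends with a parseable date that carries an explicit UTC offset. The second part is a small from-scratch multinomial Naive Bayes with Laplace smoothing that operates on word roots assumed to be already extracted. The function and class names, the host names, the sample roots, and the class labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_delays(raw_message):
    """Return the delay in seconds between consecutive Received hops, oldest first."""
    msg = message_from_string(raw_message)
    stamps = []
    # Each relay prepends its own Received header, so reverse for chronological order.
    for header in reversed(msg.get_all("Received", [])):
        # The timestamp follows the last ';' of a Received header.
        date_part = header.rpartition(";")[2].strip()
        try:
            stamps.append(parsedate_to_datetime(date_part))
        except (TypeError, ValueError):
            continue  # skip headers without a parseable date
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


class MultinomialNB:
    """Multinomial Naive Bayes over lists of word roots, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (assumed value)
        self.class_docs = Counter()              # number of training e-mails per class
        self.word_counts = defaultdict(Counter)  # root frequencies per class
        self.vocab = set()

    def fit(self, documents, labels):
        for roots, label in zip(documents, labels):
            self.class_docs[label] += 1
            self.word_counts[label].update(roots)
            self.vocab.update(roots)
        return self

    def predict(self, roots):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for root in roots:
                if root in self.vocab:  # roots never seen in training are ignored
                    score += math.log((counts[root] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical raw message with three relay hops.
RAW_MESSAGE = (
    "Received: from mx2.example.org by inbox.example.org; Mon, 2 Oct 2017 10:00:45 +0300\n"
    "Received: from mx1.example.org by mx2.example.org; Mon, 2 Oct 2017 10:00:05 +0300\n"
    "Received: from client.example.org by mx1.example.org; Mon, 2 Oct 2017 10:00:00 +0300\n"
    "Message-ID: <1@example.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

# Hypothetical training data: word roots per e-mail and user-assigned classes.
TRAINING_ROOTS = [
    ["toplantı", "saat", "salon"],
    ["fatura", "ödeme", "tutar"],
    ["toplantı", "gündem"],
    ["ödeme", "banka"],
]
TRAINING_LABELS = ["meeting", "finance", "meeting", "finance"]

if __name__ == "__main__":
    print(hop_delays(RAW_MESSAGE))
    model = MultinomialNB().fit(TRAINING_ROOTS, TRAINING_LABELS)
    print(model.predict(["fatura", "ödeme"]))
```

In the sample message the second hop accounts for most of the delay, which is the kind of observation the first stage is meant to surface. Working in log space in the classifier avoids numeric underflow when an e-mail contains many words.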