Corporate E-mail Classification System

Yildiz A., DEMİRCİ M.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 559-564 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI: 10.1109/ubmk.2017.8093462
City of Publication: Antalya
Country of Publication: Türkiye
Pages: pp. 559-564
Affiliated with Gazi University: Yes

Abstract

In this study, a system was developed to interpret and classify e-mails according to their content. The aim of the work is to develop an intelligent inbox that supports information security awareness. The designed system includes a client that connects to the mail server and retrieves e-mails. A two-stage analysis is performed on the received e-mails; for both stages, the original source of each e-mail is parsed. In the first stage, details that the user does not see in standard inboxes, such as the transmission history of the e-mail, the servers it passed through, the delay time, and the message ID, are visualized. With this information, it is possible to detect delivery delays in received e-mails, identify the server responsible for a delay, and find the responsible party. In the second stage, the body of the e-mail is extracted, stripped of its formatting information, and analyzed for content. In this study, which is customized for Turkish, the words in the content are reduced to their roots with the Zemberek software and recorded together with the class information obtained from the user. A data set is then created from the recorded words. The generated data set was tested with many classification algorithms in the WEKA application, and the Naive Bayes Multinomial algorithm, which gave the most successful result, was implemented in the system. The overall multi-class classification performance of the system was calculated as 90%. The study contributes to the literature as a desktop application that works in real time in Turkish without sharing corporate data with external networks, and by introducing a new feature selection method.
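
The first analysis stage described in the abstract (transmission history, per-server delay, message ID) could be approximated along the following lines. This is only a minimal sketch using Python's standard `email` package, not the authors' implementation; the sample message, the host names, and the function name `hop_delays` are hypothetical, and the code assumes every `Received` header is well formed and ends with `; <date>`.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw message source; hosts and IDs are illustrative only.
RAW = """\
Received: from mx.example.org (mx.example.org [192.0.2.2])
\tby inbox.example.org with ESMTP id def456;
\tMon, 02 Oct 2017 09:00:42 +0300
Received: from mail.example.com (mail.example.com [192.0.2.1])
\tby mx.example.org with ESMTP id abc123;
\tMon, 02 Oct 2017 09:00:05 +0300
Message-ID: <hypothetical-id@example.com>
Subject: Test

Body text.
"""


def hop_delays(raw):
    """Return the message ID and the delay in seconds added at each hop."""
    msg = message_from_string(raw)
    hops = []
    # Each server prepends its own Received header, so the newest hop is
    # listed first; reverse the list to follow the message in transit order.
    for value in reversed(msg.get_all("Received", [])):
        # The timestamp follows the last semicolon of the header value.
        info, _, stamp = value.rpartition(";")
        hops.append((" ".join(info.split()), parsedate_to_datetime(stamp.strip())))
    delays = []
    # The delay of a hop is the difference between its timestamp and the
    # timestamp recorded by the previous hop.
    for (_, earlier), (info, later) in zip(hops, hops[1:]):
        delays.append((info, (later - earlier).total_seconds()))
    return msg["Message-ID"], delays


if __name__ == "__main__":
    message_id, delays = hop_delays(RAW)
    print(message_id)
    for info, seconds in delays:
        print(f"{seconds:6.1f} s  {info}")
```

A hop with an unusually large delay points to the server that held the message, which is the kind of determination the abstract mentions. Real-world headers are often malformed or carry skewed clocks, so a production version would need error handling that this sketch omits.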