2020 5th International Conference on Computer Science and Engineering (UBMK), Diyarbakır, Türkiye, 9-11 September 2020, pp. 290-294
Author identification occupies an important place in Natural Language Processing (NLP). Every written document carries traces of its author. In this study, we aim to identify authors from the traces they leave in their texts. A raw dataset was created from 2024 randomly selected texts by 25 columnists, drawn from different Turkish-language newspapers. From this raw dataset, a dataset of character and lexical features was prepared using natural language processing methods. Feature selection was performed on the prepared dataset by combining Chicken Swarm Optimization with ensemble learning algorithms. The results were evaluated before and after feature selection. The highest success rate, 93.99%, was achieved when AdaBoost with the J48 algorithm was applied after feature selection.
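As a rough illustration of what character and lexical features of a text can look like, the following minimal Python sketch computes a handful of them for a single document. The function name `stylometric_features` and the specific features chosen here are assumptions made for illustration only; the abstract does not list the features actually used in the study, and this sketch does not cover the feature selection or classification steps.

```python
import re
import string


def stylometric_features(text: str) -> dict:
    """Compute a few illustrative character and lexical features for one text.

    Hypothetical feature set: the study's real features are not given
    in the abstract.
    """
    # \w+ is Unicode-aware in Python 3, so Turkish letters (ç, ğ, ı, ö, ş, ü) are kept.
    words = re.findall(r"\w+", text)
    # Lowercase only for counting distinct word types.
    types = {w.lower() for w in words}
    n_chars = len(text)
    n_words = len(words)
    return {
        # Character-level features
        "char_count": n_chars,
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        # Lexical features
        "word_count": n_words,
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(types) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    sample = "Merhaba dünya. Merhaba!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value}")
```

Applying such a function to every text would give one feature vector per document, which is the kind of tabular input a feature selection method and a classifier could then work on.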