ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, vol.1, no.1, pp.1-15, 2023 (SCI-Expanded)
Keyword extraction is a fundamental problem in natural language
processing applications. Many graph-based models in the literature solve
this problem by constructing a graph of word co-occurrences from the input
text. These models use graph-based features, such
as Betweenness Centrality, Closeness Centrality, Eigenvector Centrality,
Degree, PageRank, Clustering Coefficient, Eccentricity, Structural Hole
and Coreness. In this paper, we propose a novel graph-based token
classification model based on commonly used graph-based features. We
used extra trees, lasso, genetic algorithm, and wrapper methods to select
the most informative subset of these features. The token classification
module of the model uses the Random Forest ensemble classification
algorithm. Performance was evaluated on the commonly used datasets
Inspec, SemEval-2017, and 500N-KPCrowd. The proposed model was
also evaluated on the newly collected TRDizinEn and DergiParkEn
datasets. SemEval-2017, 500N-KPCrowd, DergiParkEn, and TRDizinEn
achieved the highest
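
The graph construction and per-token features described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`cooccurrence_graph`, `clustering_coefficient`, `pagerank`, `token_features`), the sliding-window size, and the damping factor are assumptions chosen for illustration, and only three of the listed features (Degree, Clustering Coefficient, PageRank) are computed. In a full pipeline, feature vectors like these would be passed to a feature selection step and then to a classifier such as a Random Forest.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking tokens that share a sliding window.

    `window` is the window length in tokens (assumed value);
    window=2 links only adjacent tokens.
    """
    adj = defaultdict(set)
    for i, tok in enumerate(tokens):
        adj[tok]  # touch the key so every token appears as a node
        for other in tokens[i + 1 : i + window]:
            if other != tok:
                adj[tok].add(other)
                adj[other].add(tok)
    return dict(adj)


def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def pagerank(adj, damping=0.85, iterations=50):
    """PageRank by power iteration on the undirected graph."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new_rank = {}
        for v in adj:
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def token_features(tokens, window=2):
    """Map each distinct token to a small dictionary of graph features."""
    adj = cooccurrence_graph(tokens, window)
    ranks = pagerank(adj)
    return {
        tok: {
            "degree": len(adj[tok]),
            "clustering": clustering_coefficient(adj, tok),
            "pagerank": ranks[tok],
        }
        for tok in adj
    }


if __name__ == "__main__":
    text = "graph based keyword extraction graph model"
    for token, feats in token_features(text.split()).items():
        print(token, feats)
```

Each distinct token becomes one row of features, so a token classifier would label rows as keyword or non-keyword rather than labeling individual token occurrences.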