CLASSIFICATION OF COURSE CONTENTS BY USING SELF-ORGANIZING MAPS


Alpdogan Y., BİLGE H. Ş.

JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, vol.24, no.2, pp.303-310, 2009 (SCI-Expanded)

Abstract

The number of electronic documents is growing rapidly; automatic document classification systems are therefore becoming more important for the future of information management. This study aims to classify technical documents automatically according to their contents. Course contents of computer engineering departments, which contain many technical terms, are used as the technical documents. A technical document classification system is proposed that is based on the Self-Organizing Map (SOM) algorithm, an effective unsupervised artificial neural network method. Before classification, several preprocessing steps are applied. First, stopwords are removed from the documents. Word stemming is then applied to improve classification performance. Words that occur in only one document are removed because they are less important. In contrast to other applications, the most frequently used words are not removed, because they are found to be important and meaningful in this data set. Next, term frequency and inverse document frequency data are used to calculate normalized weighted vectors. Using these vectors for each course, document classification is performed with the self-organizing map method. For comparison, the results are shown alongside the output of the k-means algorithm. The classification clearly visualizes the relations between the course contents of a department. Furthermore, courses with different names and codes from different universities are successfully grouped together in the final map.
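
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the toy course descriptions, the smoothed IDF formula, the 2x2 grid, and all function names and training parameters are assumptions made for illustration, and stemming is omitted for brevity. The sketch removes stopwords, drops terms that occur in only one document, builds L2-normalized TF-IDF vectors, and trains a small self-organizing map on them.

```python
import math
import random
from collections import Counter

# Hypothetical toy stopword list; the list used in the study is not given.
STOPWORDS = {"the", "of", "and", "to", "in", "a"}

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of strings."""
    tokenized = [[w for w in d.lower().split() if w not in STOPWORDS] for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Drop terms found in only one document; frequent terms are kept.
    vocab = sorted(w for w, n in df.items() if n > 1)
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed smoothed IDF, so terms present in every document keep a weight.
        vec = [tf[w] * math.log(1.0 + n_docs / df[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocab, vectors

def best_matching_unit(weights, v):
    """Grid coordinate of the node whose weight vector is closest to v."""
    return min(weights, key=lambda node: sum((a - b) ** 2
                                             for a, b in zip(weights[node], v)))

def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighborhood radius
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

# Made-up course descriptions standing in for real course contents.
docs = [
    "data structures and algorithms",
    "algorithms and data structures analysis",
    "operating systems and process scheduling",
    "operating systems process memory management",
]
vocab, vectors = tfidf_vectors(docs)
som = train_som(vectors)
cells = [best_matching_unit(som, v) for v in vectors]
print(vocab)
print(cells)
```

Each entry of `cells` is the map cell assigned to the corresponding course. Courses with similar vocabulary should fall in the same or neighboring cells, which is the property the study uses to visualize relations between course contents.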