Specification Based Automatic Product Categorization From Unstructured Data

Huseynli A., Yıldız O., Akcayol M. A.

26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Türkiye, 2-5 May 2018 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI: 10.1109/siu.2018.8404356
City of Publication: İzmir
Country of Publication: Türkiye
Affiliated with Gazi Üniversitesi: Yes

Abstract

Product categorization is crucial for feature and price comparison applications, as well as for e-commerce sites that track products. The process is often performed manually and requires significant effort. In particular, the lack of a standard structure in XML data supplied by the industry makes categorization even more difficult. Because products often carry discrete information, such as technical specifications, it may be possible to automate this process with text mining methods. In this study, an original data set built from unstructured data was categorized using different methods. First, the data set was pre-processed and attributes were extracted from it for clustering. Then experimental results were obtained for three different clustering methods. The experiments showed that the k-means and k-medoid methods achieved a 98% success rate on the binary feature matrix.
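
The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: attribute extraction, a binary feature matrix, and k-medoid clustering. Everything specific here is an assumption made for illustration and is not taken from the paper: the regex tokenizer stands in for attribute extraction, Hamming distance is the dissimilarity measure, the initial medoids come from a farthest-first pass, and the product strings and function names are hypothetical.

```python
import re


def extract_attributes(text):
    """Lowercase the text and return its set of alphanumeric tokens (assumed extractor)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def binary_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is 1 if document i contains term j."""
    token_sets = [extract_attributes(d) for d in docs]
    vocab = sorted(set().union(*token_sets))
    rows = [[1 if term in tokens else 0 for term in vocab] for tokens in token_sets]
    return vocab, rows


def hamming(a, b):
    """Number of positions at which two binary rows differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(rows, k, max_iter=20):
    """Alternating k-medoids with farthest-first initialization starting at row 0."""
    medoids = [0]
    while len(medoids) < k:
        # Pick the row that is farthest from its nearest current medoid.
        nxt = max(
            range(len(rows)),
            key=lambda i: min(hamming(rows[i], rows[m]) for m in medoids),
        )
        medoids.append(nxt)

    labels = []
    for _ in range(max_iter):
        # Assignment step: each row joins the cluster of its closest medoid.
        labels = [
            min(range(k), key=lambda c: hamming(row, rows[medoids[c]]))
            for row in rows
        ]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda i: sum(hamming(rows[i], rows[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical product descriptions, invented for this example.
products = [
    "Laptop 15.6 inch 8GB RAM 256GB SSD Intel i5",
    "Notebook 14 inch 16GB RAM 512GB SSD Intel i7",
    "Laptop 13 inch 8GB RAM 512GB SSD",
    "Washing machine 9 kg 1200 rpm A+++",
    "Washing machine 7 kg 1000 rpm A++",
    "Washing machine 8 kg 1400 rpm",
]

vocab, rows = binary_matrix(products)
labels, medoids = k_medoids(rows, k=2)
print(labels)
```

On this toy input the three computer descriptions and the three washing machine descriptions share enough specification tokens to fall into separate clusters. A real data set would need the pre-processing and attribute extraction steps that the abstract mentions but does not detail.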