Specification Based Automatic Product Categorization From Unstructured Data

Huseynli A., YILDIZ O., AKCAYOL M. A.

26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Turkey, 2-5 May 2018

  • Publication Type: Conference Paper / Full Text
  • Volume:
  • Doi Number: 10.1109/siu.2018.8404356
  • City: İzmir
  • Country: Turkey


Product categorization is crucial both for feature and price comparison applications and for e-commerce sites that track products. The process is often performed manually and requires significant effort. In particular, the XML data supplied by the industry has no standard structure, which makes categorization even more difficult. Because products often carry discrete information, such as technical specifications, it may be possible to automate the process with text mining methods. In this study, an original data set built from unstructured data was categorized using different methods. First, the data set was pre-processed and attributes were extracted from it for the clustering operation. Experimental results were then obtained for three different clustering methods. The experiments showed that the k-means and k-medoid methods, applied to the binary feature matrix, achieved a 98% success rate.
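
The pipeline outlined in the abstract (pre-processing, attribute extraction into a binary feature matrix, then clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy product strings, the tokenizer, the Hamming distance, the farthest-first initialization, and the simple alternating k-medoids loop are all assumptions chosen for illustration, and the helper names (`tokenize`, `binary_matrix`, `k_medoids`) are hypothetical.

```python
import re

# Hypothetical toy records; the paper's actual data set is not reproduced here.
PRODUCTS = [
    "Laptop 15.6 inch screen, 8GB RAM, SSD, Intel processor",
    "Notebook 14 inch screen, 16GB RAM, SSD, AMD processor",
    "Ultrabook 13 inch screen, 8GB RAM, SSD, Intel processor",
    "Refrigerator 400 litre, no-frost, energy class A",
    "Fridge 350 litre, no-frost, energy class A",
    "Freezer 200 litre, no-frost, energy class A",
]


def tokenize(text):
    """Crude pre-processing (an assumption): lowercase, keep alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def binary_matrix(texts):
    """One row per product, one column per attribute term (1 = term present)."""
    token_sets = [tokenize(t) for t in texts]
    vocab = sorted(set().union(*token_sets))
    return [[1 if term in tokens else 0 for term in vocab] for tokens in token_sets]


def hamming(a, b):
    """Number of positions where two binary rows differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(rows, k, max_iter=100):
    """Simple alternating k-medoids with deterministic farthest-first seeding."""
    n = len(rows)
    dist = [[hamming(rows[i], rows[j]) for j in range(n)] for i in range(n)]

    # Seed with row 0, then repeatedly add the row farthest from current medoids.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each row joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


labels, medoids = k_medoids(binary_matrix(PRODUCTS), k=2)
print(labels)
```

On this toy input the three computer records fall into one cluster and the three cooling appliances into the other, because rows in the same group share most of their attribute terms. Real product feeds would need far more careful pre-processing than this tokenizer provides.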