A novel hybridization approach to improve the critical distance clustering algorithm: Balancing speed and quality


Hamed Kuwil F., ATİLA Ü.

Expert Systems with Applications, vol.247, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 247
  • Publication Date: 2024
  • DOI: 10.1016/j.eswa.2024.123298
  • Journal Name: Expert Systems with Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Algorithm hybridization, Algorithm specialization, Clustering analysis, Connectivity and coherence, Critical distance
  • Gazi University Affiliated: Yes

Abstract

Clustering is a prominent research area in which hundreds of algorithms have been developed over the years. A fundamental challenge in clustering research, however, is the trade-off between algorithm speed and clustering quality: existing algorithms tend to offer either fast execution with compromised clustering quality or superior clustering results at slower speed. In this study, we propose CDC-2, an improved version of the Critical Distance Clustering (CDC) algorithm, to address this challenge. Inspired by the concepts of hybridization in biology and the division of labor in economic systems, we also present a new hybridization strategy. Our approach integrates the connectivity and coherence aspects of the K-means and CDC-2 algorithms, respectively, allowing us to combine speed and quality in a single algorithm. The resulting hybrid, called the CDC++ algorithm, combines elements of K-means and CDC-2 in order to leverage their strengths while mitigating their weaknesses. Moreover, the structure and mechanism of the CDC++ algorithm led to the introduction of a new concept called “object autoencoder.” Unlike traditional feature reduction methods, this concept focuses on object reduction, representing a significant advancement in clustering techniques. To validate our approach, we conducted experiments on thirteen synthetic and five real datasets. Comparative analysis with four well-known algorithms demonstrates that the proposed improvements and hybridization enable efficient processing of large-scale, high-dimensional datasets without compromising clustering quality.
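The abstract does not spell out how CDC-2 or CDC++ work internally, so the following is only a rough sketch of one way to read the "object reduction" idea: a fast K-means pass compresses the objects into a small set of representatives, and a second, distance-based step then groups those representatives instead of the raw objects. The threshold-linking step is a simple stand-in chosen for illustration and is not the authors' CDC-2; the names kmeans, link_by_threshold, hybrid_cluster, and the threshold parameter are all hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [nearest(p, centroids) for p in points]  # final assignment
    return centroids, labels


def link_by_threshold(centroids, threshold):
    """Group centroids into connected components: two centroids are linked
    when their distance is <= threshold (stand-in step, NOT CDC-2)."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= threshold:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(centroids))]
    ids = {r: n for n, r in enumerate(dict.fromkeys(roots))}  # relabel 0..m-1
    return [ids[r] for r in roots]


def hybrid_cluster(points, k, threshold, seed=0):
    """Reduce objects to k representatives, group the representatives,
    then map each object to the group of its representative."""
    centroids, labels = kmeans(points, k, seed=seed)
    groups = link_by_threshold(centroids, threshold)
    return [groups[lab] for lab in labels]


if __name__ == "__main__":
    # Two small, well-separated grids of points as toy data.
    blob_a = [(x * 0.5, y * 0.5) for x in range(5) for y in range(5)]
    blob_b = [(x + 100.0, y + 100.0) for x, y in blob_a]
    result = hybrid_cluster(blob_a + blob_b, k=6, threshold=10.0)
    print(len(set(result)), "groups found")
```

In this sketch the pairwise comparison runs over k representatives rather than over all n objects, which is where a speed benefit would come from when k is much smaller than n. How the actual CDC++ algorithm selects representatives and determines the critical distance would have to be taken from the paper itself.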