CLUSTER ANALYSIS OF RAW MILK PRODUCTION IN TÜRKİYE: EVIDENCE FROM RAW MILK SUPPORT


Korkmaz G., Ebegil M.

19th International İstanbul Scientific Research Congress, İstanbul, Türkiye, 28-30 April 2026, pp. 1901-1902 (Abstract)

Abstract

This study examines the results of the 2023 Raw Milk Support Program implemented by the Ministry of Agriculture and Forestry of the Republic of Türkiye using cluster analysis and compares the results obtained with different clustering methods. The data consist of province-level variables on milk production quantities by species (cow, sheep, goat, and buffalo) and on the number of producers, obtained from the Ministry's Milk Registration System (BSKS). To ensure comparability and methodological robustness, the dataset was standardized prior to analysis, and potential outliers were identified and assessed. Three methods were employed: the Ward method, a hierarchical clustering technique, and the K-means and Partitioning Around Medoids (PAM) algorithms, which are non-hierarchical clustering methods. To determine the optimal number of clusters, several cluster validity indices were used: the average silhouette width, the gap statistic, and the Calinski-Harabasz, Davies-Bouldin, and Dunn indices. Although these indices suggest different numbers of clusters, weighing the index values together with the interpretability of the solutions led to the selection of 6 clusters for the Ward and K-means methods and 5 clusters for the PAM algorithm. The findings reveal that raw milk production in Türkiye exhibits a non-homogeneous, multi-layered, and regionally differentiated structure. The Ward method produces compact and well-separated clusters that clearly distinguish high-production provinces such as İzmir and Konya, along with several provinces in the Aegean and Marmara regions. In contrast, numerous provinces are grouped into clusters characterized by low production levels. The K-means results are strongly consistent with those of the Ward method, which can be attributed to the fact that both methods aim to minimize within-cluster variance, although the two solutions differ for provinces such as İstanbul and Kocaeli.
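The standardization step and the average silhouette width mentioned above can be illustrated with a minimal sketch. It assumes z-score standardization and Euclidean distance, neither of which the abstract specifies, and the function names `standardize` and `average_silhouette` as well as the toy rows are hypothetical; none of this is the study's code or data.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column (assumed scaling: mean 0, population sd 1)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def average_silhouette(points, labels):
    """Average silhouette width of a hard clustering (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(math.dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return mean(widths)


# Hypothetical toy rows (two variables per "province"), not BSKS data.
toy_rows = [[1.0, 10.0], [2.0, 11.0], [1.5, 10.5],
            [100.0, 200.0], [101.0, 202.0], [99.0, 199.0]]
toy_z = standardize(toy_rows)
separated = average_silhouette(toy_z, [0, 0, 0, 1, 1, 1])
mixed = average_silhouette(toy_z, [0, 1, 0, 1, 0, 1])
```

In practice the index would be evaluated for each candidate number of clusters and compared alongside the other validity indices, since a single index rarely settles the choice on its own.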
The PAM algorithm, by selecting representative observations (medoids) from within the dataset for each cluster, yields a clustering structure that is more robust to outliers. The Adjusted Rand Index (ARI) was used to compare the partitions produced by the three clustering algorithms. The highest similarity (ARI = 0.74) was observed between the K-means and PAM algorithms, indicating a strong concordance between these partitioning-based methods. The Ward method also showed a high level of agreement with K-means (ARI = 0.73), supporting the robustness and consistency of the overall clustering structure. In conclusion, the study provides empirical evidence that raw milk production in Türkiye exhibits a multi-layered structure and that support policies should be differentiated to account for regional variations.
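For readers unfamiliar with the ARI, the following minimal sketch computes it from two label vectors using the standard pair-counting formula. The function name `adjusted_rand_index` and the toy labels are hypothetical and serve only as an illustration; they are not the study's cluster assignments.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two hard partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs placed together in both partitions (contingency table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: cluster names differ, but the grouping is identical.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Hypothetical labels: the second partition splits one cluster in two.
partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Because the index is corrected for chance, a value of 1 indicates identical partitions regardless of how clusters are labeled, while values near 0 indicate agreement no better than random assignment.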

Keywords: Raw Milk Production, Ward’s Method, K-means, PAM, Cluster Validity Indices.