Genetic diversity and population structure analysis of cotton (Gossypium hirsutum L.) genotypes using DArTseq technology


Creative Commons License

Haliloğlu K., Akgöl B., Hançer T., Alipour H., Türkoğlu A.

BMC GENOMICS, vol.27, no.295, pp.1-18, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 27 Issue: 295
  • Publication Date: 2026
  • DOI Number: 10.1186/s12864-026-12643-9
  • Journal Name: BMC GENOMICS
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), BIOSIS, EMBASE, MEDLINE, Directory of Open Access Journals
  • Pages: pp.1-18
  • Open Archive Collection: AVESİS Open Access Collection
  • Affiliated with Gazi University: Yes

Abstract

Cotton is a globally cultivated crop that is exposed to various biotic and abiotic stress conditions. The most effective strategy to mitigate these stresses is the development of tolerant or resistant varieties through plant breeding programs. Appropriate breeding strategies, such as conventional breeding, are crucial for generating beneficial genetic variation and identifying desirable traits. Integrating conventional breeding with molecular breeding is a key approach to addressing the challenges of sustainable cotton production. We aimed to assess the genetic diversity and population structure of cotton genotypes, and thereby to evaluate their value for breeding purposes. A total of 913 genotypes, including advanced breeding lines of Gossypium hirsutum L., were genotyped using diversity array technology sequencing (DArTseq) for high-throughput single-nucleotide polymorphism (SNP) genotyping. Out of 5,986 SNPs, 1,431 high-quality SNPs were selected for genome diversity analysis. The genotypes were classified into subgroups based on kinship relationships, UPGMA clustering, discriminant analysis of principal components (DAPC), and principal coordinate analysis. Population structure analysis using the model-based ΔK method suggested an optimal K = 2, representing the highest level of population differentiation. However, the multivariate DAPC method revealed a finer substructure of K = 3, providing a more detailed view of the genetic relationships within the breeding lines. The highest fixation index (FST) and lowest gene flow (Nm) were observed between subpopulation I and subpopulation II. The mean values of minor allele frequency (MAF), genetic diversity (GD), polymorphism information content (PIC), and observed heterozygosity (Ho), together with the highest FST across the whole genome, were estimated at 0.2328, 0.3208, 0.2607, 0.0210, and 0.0830, respectively.
The findings of this study will provide valuable insights for breeders in selecting parents for crossing in cultivar development programs.
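
The per-marker statistics named in the abstract (MAF, GD, PIC, Ho) and the link between FST and Nm can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes biallelic SNPs coded as 0, 1, or 2 copies of the alternate allele, uses the textbook definitions of expected heterozygosity and PIC, and assumes Wright's island-model approximation Nm = (1 - FST) / (4 FST). The function names `snp_stats` and `gene_flow` are hypothetical and serve only this illustration.

```python
def snp_stats(genotypes):
    """Summary statistics for one biallelic SNP.

    genotypes: list of calls coded 0, 1, 2 (copies of the alternate
    allele); None marks a missing call. This coding is an assumption
    made for the sketch.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    if n == 0:
        raise ValueError("no non-missing genotype calls")

    p = sum(calls) / (2 * n)      # alternate allele frequency
    q = 1.0 - p                   # reference allele frequency

    maf = min(p, q)               # minor allele frequency
    gd = 1.0 - (p**2 + q**2)      # genetic diversity (expected heterozygosity, 2pq)
    pic = gd - 2 * p**2 * q**2    # PIC for a biallelic marker
    ho = sum(1 for g in calls if g == 1) / n  # observed heterozygosity

    return {"MAF": maf, "GD": gd, "PIC": pic, "Ho": ho}


def gene_flow(fst):
    """Nm from FST under the island-model approximation (an assumption)."""
    if not 0 < fst <= 1:
        raise ValueError("FST must be in (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


if __name__ == "__main__":
    # Toy genotype calls for a single SNP, not data from the study.
    print(snp_stats([0, 0, 1, 2, None]))
    print(gene_flow(0.2))
```

Averaging the per-SNP values across all retained markers would give genome-wide means of the kind reported above. A low Ho relative to GD is what one would expect for largely homozygous breeding lines.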