Diagnosis of polycystic ovary syndrome through different machine learning and feature selection techniques

Danaei Mehr H., Polat H.

HEALTH AND TECHNOLOGY, vol.12, no.1, pp.137-150, 2022 (ESCI)

  • Publication Type: Article
  • Volume: 12 Issue: 1
  • Publication Date: 2022
  • DOI Number: 10.1007/s12553-021-00613-y
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, EMBASE, INSPEC
  • Page Numbers: pp.137-150
  • Keywords: Polycystic ovary syndrome diagnosis, Feature selection, Machine learning, Random forest, Ensemble learning, DISEASE, CLASSIFICATION
  • Gazi University Affiliated: Yes


Polycystic ovary syndrome (PCOS) has been identified as one of the serious health problems among women; it affects fertility and can lead to critical health conditions. Hence, early diagnosis of PCOS can make the treatment process more effective. Recently, machine learning methods have achieved promising results in medical diagnosis. Furthermore, feature selection techniques, which identify the most significant subset of features, can reduce computational time and improve the performance of classifiers. Conventional machine learning algorithms classify a dataset with a single model, whereas ensemble machine learning algorithms combine two or more models, which can yield more accurate results. Therefore, considering the advantages of ensemble classifiers and feature selection methods, in this study traditional and ensemble classifiers were applied to the Kaggle PCOS dataset to diagnose polycystic ovary syndrome. The performance of various classifiers (i.e., Ensemble Random Forest, Extra Tree, Adaptive Boosting (AdaBoost) and Multi-Layer Perceptron (MLP)) was investigated using the dataset with all features and with reduced subsets of features generated by filter, embedded and wrapper feature selection methods. The experimental results demonstrated that feature selection improved the performance of all classifiers. Moreover, the Ensemble Random Forest classifier, using the reduced subset of features produced by the embedded feature selection method, outperformed the other classifiers in this study and those reported in the literature, with an accuracy of 98.89% and a sensitivity of 100%.
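The filter-style feature selection and the two metrics named in the abstract can be illustrated with a minimal pure-Python sketch. It assumes absolute Pearson correlation with the binary PCOS label as the filter score (the abstract does not state the exact filter criterion), and the function names `filter_select` and `accuracy_and_sensitivity`, the feature names and the toy data are all hypothetical, not taken from the paper or the Kaggle dataset. Embedded methods (for example, tree-based feature importances) and wrapper methods (which search feature subsets by repeatedly retraining a classifier) would replace the scoring step shown here.

```python
"""Sketch: filter feature selection plus accuracy/sensitivity.

Assumption: |Pearson r| with the class label is the filter score; the
paper's actual filter criterion is not given in the abstract.
"""
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def filter_select(rows, labels, names, k):
    """Score each feature by |r| with the label; return the top-k names."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    return ranked[:k]


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy = (TP + TN) / N; sensitivity (recall) = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


if __name__ == "__main__":
    # Hypothetical toy data: 4 patients, 3 numeric features, label 1 = PCOS.
    names = ["feat_a", "feat_b", "feat_c"]
    rows = [[1.0, 5.0, 0.2], [2.0, 3.0, 0.1], [8.0, 4.0, 0.9], [9.0, 6.0, 0.7]]
    labels = [0, 0, 1, 1]
    print(filter_select(rows, labels, names, k=2))  # ['feat_a', 'feat_c']

    # Hypothetical predictions: one false positive, no false negatives.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 1, 1]
    print(accuracy_and_sensitivity(y_true, y_pred))  # (0.8, 1.0)
```

Sensitivity is the fraction of actual PCOS cases the classifier flags, so a sensitivity of 100% means no positive case was missed (FN = 0), while accuracy also credits correctly rejected negative cases.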