PyCaret for Predicting Type 2 Diabetes: A Phenotype- and Gender-Based Approach with the “Nurses’ Health Study” and the “Health Professionals’ Follow-Up Study” Datasets

Gul, Sebnem; AYTURAN, KUBİLAY; HARDALAÇ, FIRAT

doi:10.3390/jpm14080804

PyCaret for Predicting Type 2 Diabetes: A Phenotype- and Gender-Based Approach with the “Nurses’ Health Study” and the “Health Professionals’ Follow-Up Study” Datasets

Atıf İçin Kopyala

Gul S., AYTURAN K., HARDALAÇ F.

Journal of Personalized Medicine, cilt.14, sa.8, 2024 (SCI-Expanded)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 14 Sayı: 8
Basım Tarihi: 2024
Doi Numarası: 10.3390/jpm14080804
Dergi Adı: Journal of Personalized Medicine
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Food Science & Technology Abstracts, Directory of Open Access Journals
Anahtar Kelimeler: feature importance plot, machine learning, prediction, PyCaret, SHAP value, type 2 diabetes mellitus
Gazi Üniversitesi Adresli: Evet

Özet

Predicting type 2 diabetes mellitus (T2DM) by using phenotypic data with machine learning (ML) techniques has received significant attention in recent years. PyCaret, a low-code automated ML tool that enables the simultaneous application of 16 different algorithms, was used to predict T2DM by using phenotypic variables from the “Nurses’ Health Study” and “Health Professionals’ Follow-up Study” datasets. Ridge Classifier, Linear Discriminant Analysis, and Logistic Regression (LR) were the best-performing models for the male-only data subset. For the female-only data subset, LR, Gradient Boosting Classifier, and CatBoost Classifier were the strongest models. The AUC, accuracy, and precision were approximately 0.77, 0.70, and 0.70 for males and 0.79, 0.70, and 0.71 for females, respectively. The feature importance plot showed that family history of diabetes (famdb), never having smoked, and high blood pressure (hbp) were the most influential features in females, while famdb, hbp, and currently being a smoker were the major variables in males. In conclusion, PyCaret was used successfully for the prediction of T2DM by simplifying complex ML tasks. Gender differences are important to consider for T2DM prediction. Despite this comprehensive ML tool, phenotypic variables alone may not be sufficient for early T2DM prediction; genotypic variables could also be used in combination for future studies.