Feature Selection and SMOTE Based Recommendation for Parkinson's Imbalanced Dataset Prediction Problem


Nakkas B. N.

30th Signal Processing and Communications Applications Conference, SIU 2022, Safranbolu, Turkey, 15 - 18 May 2022

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu55565.2022.9864672
  • City: Safranbolu
  • Country: Turkey
  • Keywords: classification, data mining, Parkinson's disease
  • Gazi University Affiliated: Yes

Abstract

© 2022 IEEE. Parkinson's disease is primarily a movement disorder. Early diagnosis and appropriate treatment are extremely important for the patient, and many symptoms are used to diagnose the disease. In this study, biomedical voice data were used to detect Parkinson's disease, and these data were classified using data mining algorithms. Before classification, the class imbalance in the dataset was corrected with the Synthetic Minority Oversampling Technique (SMOTE), and only 7 features were selected using Recursive Feature Elimination with Cross-Validation (RFECV). Classification was performed with Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN) algorithms. To validate the classification models, two techniques were applied separately: 10-fold cross-validation on the dataset, and splitting the dataset into training and test sets. Accuracy, precision, sensitivity, and F1-score metrics, together with the confusion matrix, were used to evaluate model performance. The highest accuracy rates were 96.1% with KNN and RF and 100% with ANN.
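The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation, which the abstract does not describe. The function names `smote` and `balance`, the default `k=5`, the fixed seed, and the assumption of exactly two classes are all hypothetical choices made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from a list of minority-class feature vectors.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a two-class problem (an assumption of this sketch).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


if __name__ == "__main__":
    # Toy data: 3 minority rows (label 0) and 6 majority rows (label 1).
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
         [5.0, 5.0], [6.0, 5.0], [5.0, 6.0],
         [6.0, 6.0], [5.5, 5.5], [6.5, 5.0]]
    y = [0, 0, 0, 1, 1, 1, 1, 1, 1]
    X_bal, y_bal = balance(X, y, minority_label=0, k=2)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Because every synthetic row is a convex combination of two existing minority rows, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows exactly. In practice a maintained library implementation would normally be preferred over a hand-written one.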