PREDICTIVE PERFORMANCES OF IMPLICITLY AND EXPLICITLY ROBUST CLASSIFIERS ON HIGH DIMENSIONAL DATA


GÜNDÜZ TEKİN N., Fokoue E.

COMMUNICATIONS FACULTY OF SCIENCES UNIVERSITY OF ANKARA-SERIES A1 MATHEMATICS AND STATISTICS, vol.66, no.2, pp.14-36, 2017 (Journal Indexed in ESCI)

  • Publication Type: Article / Full Article
  • Volume Number: 66 Issue: 2
  • Publication Date: 2017
  • DOI Number: 10.1501/commua1_0000000797
  • Journal Name: COMMUNICATIONS FACULTY OF SCIENCES UNIVERSITY OF ANKARA-SERIES A1 MATHEMATICS AND STATISTICS
  • Pages: pp.14-36

Abstract

The goal of this paper is to demonstrate via extensive simulation that implicit robustness can substantially outperform explicit robustness in the pattern recognition of contaminated high dimension low sample size data. Specifically, our work demonstrates, via extensive computational simulations and applications to real-life data, that random subspace ensemble learning machines, although not explicitly designed as robustness-inducing supervised learning paradigms, outperform structurally robustness-seeking classifiers on high dimension low sample size datasets. Random forest (RF), arguably the most commonly used random subspace ensemble learning method, is compared to various robust extensions/adaptations of the discriminant analysis classifier, and our work reveals that RF, although not inherently designed to be robust to outliers, substantially outperforms the existing techniques specifically designed to achieve robustness. In particular, by exploring different scenarios of the sample size n and the input space dimensionality p, along with the corresponding capacity kappa = n/p with kappa < 1, we demonstrate through extensive simulations that, regardless of the contamination rate epsilon, RF predictively outperforms the explicitly robustness-inducing classification techniques when the intrinsic dimensionality of the data is large.
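
The quantities named in the abstract (sample size n, dimensionality p, capacity kappa = n/p, contamination rate epsilon) can be made concrete with a small sketch. The code below is illustrative only and is not the authors' simulation design: the row-wise mean-shift contamination model, the `shift` parameter, the helper names `capacity` and `contaminated_sample`, and the scenario grid are all assumptions introduced here.

```python
import random


def capacity(n, p):
    """Return kappa = n / p, the ratio of sample size to input dimension."""
    return n / p


def contaminated_sample(n, p, epsilon, shift=10.0, seed=0):
    """Draw n rows of p features; each row is an outlier with probability epsilon.

    Assumed model (not taken from the paper): clean rows are i.i.d. N(0, 1)
    per feature, outlier rows are i.i.d. N(shift, 1) per feature.
    """
    rng = random.Random(seed)
    rows, is_outlier = [], []
    for _ in range(n):
        outlier = rng.random() < epsilon
        mean = shift if outlier else 0.0
        rows.append([rng.gauss(mean, 1.0) for _ in range(p)])
        is_outlier.append(outlier)
    return rows, is_outlier


# Hypothetical grid of scenarios with kappa < 1 (values chosen for illustration).
scenarios = [
    (n, p, eps)
    for (n, p) in [(20, 100), (50, 500)]
    for eps in [0.0, 0.05, 0.15]
]

for n, p, eps in scenarios:
    X, flags = contaminated_sample(n, p, eps)
    print(f"n={n} p={p} kappa={capacity(n, p):.2f} "
          f"epsilon={eps} outlier_rows={sum(flags)}")
```

Each generated dataset could then be labeled and passed to a random forest and to a robust discriminant classifier for comparison; that step is omitted here because the abstract does not specify which implementations or settings were used.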