Outlier detection with Mahalanobis square distance: incorporating small sample correction factor


EKİZ M., EKİZ O. U.

JOURNAL OF APPLIED STATISTICS, vol.44, no.13, pp.2444-2457, 2017 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 44 Issue: 13
  • Publication Date: 2017
  • DOI Number: 10.1080/02664763.2016.1255313
  • Journal Name: JOURNAL OF APPLIED STATISTICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.2444-2457
  • Keywords: Outlier, MCD estimators, S-estimators, bi-weight, t-biweight, Mahalanobis square distance, 62F35, 62H10, MULTIVARIATE LOCATION, S-ESTIMATORS, COVARIANCE, ASYMPTOTICS, MATRICES, POINTS
  • Gazi University Affiliated: Yes

Abstract

Mahalanobis square distances (MSDs) based on robust estimators improve outlier detection performance in multivariate data. However, the unbiasedness of robust estimators is not guaranteed when the sample size is small, and this reduces their performance in outlier detection. In this study, we propose a framework that uses MSDs incorporating a small sample correction factor (c) and show its impact on performance when the sample size is small. We demonstrate this using two prototypes: the minimum covariance determinant (MCD) estimator and S-estimators with bi-weight and t-biweight functions. Simulation results show that, when c is incorporated into the model, the distribution of MSDs for non-extreme observations is more likely to fit a chi-square distribution with p degrees of freedom, and the MSDs of extreme observations are more likely to fit an F distribution. Without c, however, the distributions deviate significantly from the chi-square and F distributions observed when c is incorporated. These results are even more prominent for S-estimators. We present seven distinct methods for comparison, combining robust estimators with various cut-off values, and test their outlier detection performance on simulated data. We also present an application of some of these methods to real data.
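
The general idea of flagging outliers by comparing MSDs against a chi-square cut-off can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the 0.975 quantile level, and the toy data are assumptions, and the abstract does not state how c enters the distance, so the sketch assumes that c simply rescales the scatter estimate. The location and scatter passed in are placeholders; in practice they would come from a robust estimator such as MCD or an S-estimator.

```python
import numpy as np
from scipy.stats import chi2


def squared_mahalanobis(X, location, scatter, c=1.0):
    """Squared Mahalanobis distance of each row of X from `location`.

    ASSUMPTION: the correction factor `c` rescales the scatter estimate
    (c * scatter). The abstract does not specify how c is applied.
    """
    X = np.asarray(X, dtype=float)
    diff = X - np.asarray(location, dtype=float)
    # Solve (c * scatter) z = diff^T rather than forming an explicit inverse.
    solved = np.linalg.solve(c * np.asarray(scatter, dtype=float), diff.T).T
    # Row-wise dot product gives diff_i^T (c * scatter)^{-1} diff_i.
    return np.einsum("ij,ij->i", diff, solved)


def flag_outliers(msd, p, level=0.975):
    """Flag observations whose MSD exceeds the chi-square(p) quantile.

    ASSUMPTION: the 0.975 level is a common convention, not a value
    taken from the paper.
    """
    cutoff = chi2.ppf(level, df=p)
    return msd > cutoff, cutoff


# Toy data (hypothetical): 30 points near the origin plus one planted outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)), [[8.0, 8.0]]])

# Placeholder estimates: the true location and scatter of the clean points.
# A robust estimator (MCD, S-estimator) would supply these in practice.
location = np.zeros(2)
scatter = np.eye(2)

msd = squared_mahalanobis(X, location, scatter, c=1.0)
flags, cutoff = flag_outliers(msd, p=X.shape[1])
```

Under the assumption made here, a value of c greater than 1 shrinks every distance by the same factor, so fewer observations cross a fixed cut-off; the paper's actual factor and cut-off choices should be taken from the article itself.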