Outlier detection with Mahalanobis square distance: incorporating small sample correction factor


EKİZ M., EKİZ O. U.

JOURNAL OF APPLIED STATISTICS, vol.44, no.13, pp.2444-2457, 2017 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 44 Issue: 13
  • Publication Date: 2017
  • Doi Number: 10.1080/02664763.2016.1255313
  • Journal Name: JOURNAL OF APPLIED STATISTICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.2444-2457
  • Keywords: Outlier, MCD estimators, S-estimators, bi-weight, t-biweight, Mahalanobis square distance, 62F35, 62H10, MULTIVARIATE LOCATION, S-ESTIMATORS, COVARIANCE, ASYMPTOTICS, MATRICES, POINTS
  • Gazi University Affiliated: Yes

Abstract

Mahalanobis square distances (MSDs) based on robust estimators improve outlier detection performance in multivariate data. However, the unbiasedness of robust estimators is not guaranteed when the sample size is small, and this reduces their performance in outlier detection. In this study, we propose a framework that uses MSDs incorporating a small sample correction factor (c) and show its impact on performance when the sample size is small. This is achieved by using two prototypes: the minimum covariance determinant (MCD) estimator and S-estimators with bi-weight and t-biweight functions. The results from simulations show that, when c is incorporated into the model, the distribution of MSDs for non-extreme observations is more likely to fit a chi-square distribution with p degrees of freedom, and the MSDs of the extreme observations fit an F distribution. Without c, however, the distributions deviate significantly from the chi-square and F distributions observed for the case with c incorporated. These results are even more prominent for S-estimators. We present seven distinct comparison methods with robust estimators and various cut-off values and test their outlier detection performance with simulated data. We also present an application of some of these methods to real data.
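
The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes p = 2 variables, takes the location and scatter estimates as given inputs (in practice they would come from a robust estimator such as MCD or an S-estimator), and assumes that the correction factor c enters as a multiplier on the scatter matrix, which the abstract does not state. The function names and the numeric values, including c = 3.0, are hypothetical. Only the chi-square cut-off is shown; the F-based cut-off for extreme observations is omitted.

```python
import math


def squared_distances(points, center, cov, c=1.0):
    """Mahalanobis square distances for 2-D points.

    cov is (s_xx, s_xy, s_yy). The factor c rescales the scatter matrix;
    this placement of c is an assumption, not taken from the paper.
    """
    mx, my = center
    sxx, sxy, syy = (c * v for v in cov)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the distribution is exponential with mean 2,
    so the quantile has the closed form -2 * ln(1 - q).
    """
    return -2.0 * math.log(1.0 - q)


def flag_outliers(points, center, cov, c=1.0, q=0.975):
    """Flag points whose squared distance exceeds the chi-square cut-off."""
    cutoff = chi2_quantile_2df(q)
    return [d > cutoff for d in squared_distances(points, center, cov, c)]


# Hypothetical estimates: centre at the origin, identity scatter matrix.
points = [(1.0, 1.0), (3.0, 3.0)]
flags_uncorrected = flag_outliers(points, (0.0, 0.0), (1.0, 0.0, 1.0))
flags_corrected = flag_outliers(points, (0.0, 0.0), (1.0, 0.0, 1.0), c=3.0)
```

In this toy setting the second point is flagged without the factor but not with c = 3.0, which shows only that the choice of c can change which observations cross the cut-off; it says nothing about the magnitude of the correction factors studied in the paper.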