Investigation of robustness, in terms of Type I error, of Minimum Covariance Determinant based outlier identification in multivariate data


Thesis Type: Postgraduate

Institution Of The Thesis: Gazi Üniversitesi, Fen Bilimleri Enstitüsü, Turkey

Approval Date: 2013

Student: EREN GÜMÜŞ

Supervisor: OSMAN UFUK EKİZ

Abstract:

Identifying outliers in multivariate data analysis is rather difficult, especially as the size of the data increases. Furthermore, outliers affect the classical estimators of the location and shape parameters, and therefore also the Mahalanobis distances that are used for outlier detection. For this reason, it is more appropriate to compute Mahalanobis distances with estimators that are robust against outliers. The Minimum Covariance Determinant (MCD) is one of the robust estimators of multivariate location and shape proposed in the literature for this purpose. The aim of this study is to show by simulation that, for robust Mahalanobis distances based on MCD estimators, a cut-off from the F distribution is more appropriate for identifying outliers than the commonly used chi-square distribution. The study first reviews the concepts of outliers and of statistics that are robust against them, and then describes the MCD method. Because the MCD is computationally demanding, the Fast-MCD algorithm is described, and the simulation study is carried out with this algorithm. In the simulation study, the rates of incorrect identification (Type I error) obtained with the chi-square and F distribution cut-offs are compared, and the results are supported visually. In conclusion, for identifying outliers in multivariate data through MCD-based Mahalanobis distances, the F distribution cut-off is considered more appropriate than the commonly used chi-square distribution cut-off.
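
The kind of simulation described in the abstract can be illustrated with a minimal sketch. It is not the thesis code, and it makes several assumptions that the abstract does not state: a simplified MCD search (random starts refined by concentration steps, without the nested subsampling or reweighting of the full Fast-MCD algorithm), a standard chi-square-based consistency factor for the raw MCD scatter, clean standard normal data, and hypothetical function names and parameter values. Only the chi-square cut-off is shown, because the parameters of the F cut-off are not given in the abstract.

```python
import numpy as np
from scipy.stats import chi2


def c_step_mcd(X, h, n_starts=20, max_iter=50, rng=None):
    """Simplified MCD search (hypothetical helper, not full Fast-MCD):
    random (p+1)-point starts, each refined by concentration steps."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_det, best_loc, best_cov = np.inf, None, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(max_iter):
            loc = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False)
            diff = X - loc
            # Squared Mahalanobis distances of all points to the current fit
            d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
            # Concentration step: keep the h points closest to the current fit
            new_idx = np.sort(np.argsort(d2)[:h])
            if np.array_equal(new_idx, np.sort(idx)):
                break
            idx = new_idx
        loc = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_loc, best_cov = det, loc, cov
    return best_loc, best_cov


def chi2_false_alarm_rate(n=50, p=3, level=0.975, n_rep=200, seed=0):
    """Share of clean normal observations flagged by the chi-square cut-off."""
    rng = np.random.default_rng(seed)
    h = (n + p + 1) // 2
    alpha = h / n
    # Assumed consistency factor for the raw MCD scatter at the normal model
    factor = alpha / chi2.cdf(chi2.ppf(alpha, p), p + 2)
    cutoff = chi2.ppf(level, p)
    flagged = 0
    for _ in range(n_rep):
        X = rng.standard_normal((n, p))  # no outliers: every flag is a Type I error
        loc, cov = c_step_mcd(X, h, rng=rng)
        diff = X - loc
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(factor * cov), diff)
        flagged += int(np.sum(d2 > cutoff))
    return flagged / (n_rep * n)


if __name__ == "__main__":
    rate = chi2_false_alarm_rate()
    print(f"nominal rate: {1 - 0.975:.3f}, empirical rate: {rate:.3f}")
```

Comparing the printed empirical rate with the nominal rate shows how far the chi-square cut-off departs from its intended Type I error for a given sample size and dimension. An F-based cut-off could be evaluated the same way by replacing `cutoff`.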