Optimal bandwidth estimators of kernel density functionals for contaminated data

Gündüz N., Aydın C.

JOURNAL OF APPLIED STATISTICS, vol.48, pp.2239-2258, 2021 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 48
  • Publication Date: 2021
  • Doi Number: 10.1080/02664763.2021.1944999
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Aerospace Database, Business Source Elite, Business Source Premier, CAB Abstracts, Veterinary Science Database, zbMATH
  • Page Numbers: pp.2239-2258
  • Keywords: Bandwidth, density functionals, density estimation, kernel smoothing, contaminated data
  • Gazi University Affiliated: Yes


In this study, we provide a simulation-based exploration and characterization of the two most important kernel density functionals in kernel density estimation, for probability density functions that belong to the location-scale family. Kernel estimates of density functionals are known to depend on the choice of a preliminary bandwidth. Normal-scale estimators are commonly used to obtain preliminary bandwidth estimates, under the assumption that the data come from a normal distribution. Here, we present an alternative approach, called Cauchy-scale estimators, in which the data are assumed to come from a Cauchy distribution. We also present results on the sampling distributions of bandwidth estimators based on the normal- and Cauchy-scale approaches. As a case study, we provide a comprehensive characterization of different contamination levels through a simulation study based on random samples from normal distributions with various parameters and contamination levels. In our simulations, the proposed preliminary bandwidth selection shows lower variance for both mixture and contaminated data. In an application to a real data set, the functional bandwidth gives results similar to those of the simulations.
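The difference between the two reference rules can be illustrated with a minimal Python sketch. It assumes the usual plug-in setup, where a density functional is psi_r = ∫ f^(r)(x) f(x) dx for even r, the kernel is Gaussian, and the preliminary bandwidth g for estimating psi_4 is the AMSE-optimal choice that depends on psi_6. The normal-scale rule plugs in the closed form of psi_6 for an N(mu, sigma^2) density; the Cauchy-scale rule here plugs in psi_r = (-1)^(r/2) r! / (pi (2s)^(r+1)) for a Cauchy(mu, s) density, which follows from Parseval's identity and the Cauchy characteristic function exp(-s|t|). The scale estimators (min(sd, IQR/1.349) for the normal reference, IQR/2 for the Cauchy reference) and all function names are hypothetical choices for illustration; this is not presented as the authors' exact estimator.

```python
import math
import numpy as np

# Hypothetical helpers: a sketch of normal- vs Cauchy-scale reference rules,
# not the authors' implementation.

def psi_normal(r, sigma):
    """psi_r = E[f^(r)(X)] for an N(mu, sigma^2) reference density, r even."""
    return (-1) ** (r // 2) * math.factorial(r) / (
        (2 * sigma) ** (r + 1) * math.factorial(r // 2) * math.sqrt(math.pi))

def psi_cauchy(r, s):
    """psi_r for a Cauchy(mu, s) reference density, r even (via Parseval)."""
    return (-1) ** (r // 2) * math.factorial(r) / (math.pi * (2 * s) ** (r + 1))

def normal_scale(x):
    # Assumed robust sigma estimate: min(sample sd, IQR / 1.349).
    q75, q25 = np.percentile(x, [75, 25])
    return min(np.std(x, ddof=1), (q75 - q25) / 1.349)

def cauchy_scale(x):
    # Cauchy quartiles sit at mu -/+ s, so s is estimated as IQR / 2 (assumption).
    q75, q25 = np.percentile(x, [75, 25])
    return (q75 - q25) / 2.0

def prelim_bandwidth_psi4(n, psi6):
    """AMSE-optimal g for estimating psi_4 with a Gaussian kernel (mu_2 = 1)."""
    k4_at_0 = 3.0 / math.sqrt(2 * math.pi)  # fourth derivative of phi at 0
    return (2 * k4_at_0 / (-psi6 * n)) ** (1.0 / 7.0)

def psi4_hat(x, g):
    """Kernel estimate n^-2 * sum_i sum_j phi_g^(4)(X_i - X_j)."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - x[None, :]) / g
    phi = np.exp(-0.5 * u ** 2) / math.sqrt(2 * math.pi)
    d4 = (u ** 4 - 6 * u ** 2 + 3) * phi  # fourth derivative of standard normal pdf
    return d4.sum() / (len(x) ** 2 * g ** 5)

# Usage: 5% contamination of N(0, 1) by N(0, 10^2) (illustrative values only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 950), rng.normal(0, 10, 50)])
g_ns = prelim_bandwidth_psi4(len(x), psi_normal(6, normal_scale(x)))
g_cs = prelim_bandwidth_psi4(len(x), psi_cauchy(6, cauchy_scale(x)))
print("normal-scale g:", g_ns, "psi4:", psi4_hat(x, g_ns))
print("Cauchy-scale g:", g_cs, "psi4:", psi4_hat(x, g_cs))
```

Only the reference density behind psi_6 changes between the two rules; the kernel estimator of psi_4 is the same. Repeating the usage step over many simulated samples and comparing the spread of g_ns and g_cs is one way to explore the sampling distributions of the two preliminary bandwidths.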