Some insights into depth estimators for location and scatter in the multivariate setting
Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, CIEM and CONICET, Córdoba, Argentina [jorge.adrover@unc.edu.ar]
Facultad de Ciencias Exactas, Físico-Químicas y Naturales, Universidad Nacional de Río Cuarto, Río Cuarto, Argentina [mruiz@exa.unrc.edu.ar]
Keywords: statistical depth – multivariate location and scatter – maximum bias – breakdown point
Abstract
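Tukey [1975] introduced the concept of depth for $p$-dimensional observations in order to obtain a multivariate analog of the median. More precisely, if $X \sim P$ and $\theta \in \mathbb{R}^p$, the depth of $\theta$ is defined to be
$$D(\theta, P) = \inf_{\|u\| = 1} P\left(u^\top X \le u^\top \theta\right),$$
and the Tukey median is taken to be $\arg\max_{\theta} D(\theta, P)$. Chen et al. [2018] and Paindaveine and Van Bever [2018] extended the concept of depth to multivariate scatter. Given an initial robust multivariate location estimator $T$, Paindaveine and Van Bever [2018] defined the depth of a symmetric positive definite matrix $\Sigma$ as
$$D_T(\Sigma, P) = \inf_{\|u\| = 1} \min\left\{ P\left(|u^\top (X - T)| \le \sqrt{u^\top \Sigma u}\right),\; P\left(|u^\top (X - T)| \ge \sqrt{u^\top \Sigma u}\right) \right\} \qquad (1)$$
and the deepest estimator as $\arg\max_{\Sigma} D_T(\Sigma, P)$. Chen et al. [2018] and Paindaveine and Van Bever [2018] agree on the definition when the location is known, since the estimator $T$ is then not needed in (1). If $p = 1$, a joint location-scale depth can be derived from either,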
or
yielding the median as deepest location and the median absolute deviation around the median as the deepest scale estimator in case of the depth .
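To make the two depth notions concrete, the following is a minimal sketch of their empirical versions, assuming the usual halfspace formulations: the location depth of $\theta$ is the infimum over unit directions $u$ of the mass of $\{x : u^\top x \le u^\top \theta\}$, and the scatter depth of $\Sigma$ around a location $T$ is the infimum over $u$ of the smaller of the masses of $\{|u^\top(x - T)| \le \sqrt{u^\top \Sigma u}\}$ and $\{|u^\top(x - T)| \ge \sqrt{u^\top \Sigma u}\}$. The infimum is approximated by a minimum over random directions, so the returned values are upper bounds on the exact empirical depths. The function names, the helper `_unit_directions`, and the parameters `n_dirs` and `seed` are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np


def _unit_directions(p, n_dirs, seed):
    """Draw n_dirs random unit vectors in R^p (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, p))
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def location_depth(theta, X, n_dirs=500, seed=0):
    """Approximate empirical halfspace depth of theta w.r.t. the rows of X."""
    X = np.asarray(X, dtype=float)
    U = _unit_directions(X.shape[1], n_dirs, seed)
    proj = (X - np.asarray(theta, dtype=float)) @ U.T  # shape (n, n_dirs)
    below = (proj <= 0).mean(axis=0)  # mass of {x : u'x <= u'theta}
    above = (proj >= 0).mean(axis=0)  # the same halfspace for direction -u
    return float(np.minimum(below, above).min())


def scatter_depth(Sigma, X, T, n_dirs=500, seed=0):
    """Approximate empirical scatter depth of Sigma around the location T."""
    X = np.asarray(X, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    U = _unit_directions(X.shape[1], n_dirs, seed)
    dev = np.abs((X - np.asarray(T, dtype=float)) @ U.T)    # |u'(x - T)|
    scale = np.sqrt(np.einsum("ij,jk,ik->i", U, Sigma, U))  # sqrt(u' Sigma u)
    inside = (dev <= scale).mean(axis=0)
    outside = (dev >= scale).mean(axis=0)
    return float(np.minimum(inside, outside).min())


if __name__ == "__main__":
    # Univariate toy sample: the directions reduce to +1 and -1.
    X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
    print(location_depth([3.0], X), location_depth([1.0], X))
    print(scatter_depth([[2.25]], X, T=[3.0]))
```

In the univariate toy sample the location depth is largest at the sample median, in line with the remark above; in higher dimensions the random-direction approximation is only a rough stand-in for an exact depth computation.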
Chen et al. [2018] introduced a unified way to study the statistical convergence rate and robustness jointly. Given let and . Let be the $\epsilon$-contamination neighborhood with . Take as the set of symmetric positive definite matrices whose largest eigenvalue is less than a constant.
Chen et al. [2018] derived that, for , and sufficiently small, there exists a constant (depending on but independent of , , ), such that
(2) |
The constant in (2) is actually affected by the asymptotic maximum bias of the Tukey median. Chen and Tyler [2002] dealt with the asymptotic maximum bias of the Tukey median, , over the contamination neighborhood with .
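As a purely univariate illustration of what a maximum bias looks like (it is not the multivariate result of Chen and Tyler [2002]), recall that for $p = 1$ the Tukey median is the ordinary median. Under $\epsilon$-contamination of the standard normal, the classical worst case places the contaminating mass arbitrarily far to one side, so the asymptotic median $t$ solves $(1 - \epsilon)\Phi(t) = 1/2$. The sketch below evaluates this value with the Python standard library; the function name `median_max_bias` is an assumption made for the example.

```python
from statistics import NormalDist


def median_max_bias(eps):
    """Maximum asymptotic bias of the univariate median under
    eps-contamination of N(0, 1): the solution t of (1 - eps) * Phi(t) = 1/2.
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 0.5)")
    return NormalDist().inv_cdf(1.0 / (2.0 * (1.0 - eps)))


if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.1, 0.2, 0.4):
        print(eps, median_max_bias(eps))
```

The value grows without bound as $\epsilon$ approaches $1/2$, the breakdown point of the univariate median, which is the kind of behavior a bound that makes the maximum bias explicit is meant to expose.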
However, the bound (2) can be derived in a more illuminating manner by explicitly incorporating the maximum bias, as the maximum bias governs the behavior of the estimator when the sample size is sufficiently large. Without enlarging the bound in (2), we obtain a more informative inequality,
Chen et al. [2018] also derived an analogous bound for the deepest dispersion estimator in the case of known location. To this end they considered the $\epsilon$-contamination neighborhood with central model . If such that and stands for the matrix norm given by the largest singular value, Chen et al. [2018] showed that,
(3) |
Similarly to the analysis given for the Tukey median, we can obtain a more accurate inequality,
with , without enlarging the original bound (3) and highlighting a possible maximum bias for the largest eigenvalue.
For , , which suggests that the asymptotic breakdown point is In effect, we prove that the asymptotic breakdown point .
Even though the formulation of Chen et al. [2018] seems promising as a basis for robust confidence regions, simulations show that the bounds are too large to yield regions of reasonable size.
References
- Chen et al. [2018] M. Chen, C. Gao, and Z. Ren. Robust covariance and scatter matrix estimation under Huber’s contamination model. The Annals of Statistics, 46(5):1932–1960, 2018.
- Chen and Tyler [2002] Z. Chen and D. E. Tyler. The influence function and maximum bias of Tukey’s median. The Annals of Statistics, 30(6):1737–1759, 2002.
- Paindaveine and Van Bever [2018] D. Paindaveine and G. Van Bever. Halfspace depths for scatter, concentration and shape matrices. The Annals of Statistics, 46(6B):3276–3307, 2018.
- Tukey [1975] J. W. Tukey. Mathematics and the picturing of data. In R. James, editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523–531, Vancouver, 1975.