Some insights into depth estimators for location and scatter in the multivariate setting

Jorge Adrover${}^{1}$ and Marcelo Ruiz${}^{2}$

${}^{1}$ Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, CIEM and CONICET, Córdoba, Argentina [jorge.adrover@unc.edu.ar]

${}^{2}$ Facultad de Ciencias Exactas, Físico-Químicas y Naturales, Universidad Nacional de Río Cuarto, Río Cuarto, Argentina [mruiz@exa.unrc.edu.ar]

Keywords: statistical depth – multivariate location and scatter – maximum bias – breakdown point

Abstract

Tukey [1975] introduced the concept of depth for $p$-dimensional observations in order to obtain a multivariate analog of the median. More precisely, if $\mathbf{x}\sim P$ and $\boldsymbol{\theta}\in\mathbb{R}^{p}$, the depth of $\boldsymbol{\theta}$ is defined to be

\mathcal{D}_{T}\left(\boldsymbol{\theta},P\right)=\inf_{\left\|\boldsymbol{\lambda}\right\|=1,\,\boldsymbol{\lambda}\in\mathbb{R}^{p}}P\left(\boldsymbol{\lambda}^{t}\mathbf{x}\leq\boldsymbol{\lambda}^{t}\boldsymbol{\theta}\right)

and the Tukey median is taken to be $\hat{\boldsymbol{\theta}}_{T}=\arg\max_{\boldsymbol{\theta}\in\mathbb{R}^{p}}\mathcal{D}_{T}\left(\boldsymbol{\theta},P\right)$. Chen et al. [2018] and Paindaveine and Van Bever [2018] extended the concept of depth to multivariate scatter. Given an initial robust multivariate location estimator $\mathbf{\hat{v}}\in\mathbb{R}^{p}$, Paindaveine and Van Bever [2018] defined the depth of a symmetric positive semidefinite matrix $\Gamma\in\mathbb{R}^{p\times p}$ as

D_{C}^{L}\left(\Gamma,P\right)=\inf_{\mathbf{u}\in\mathcal{S}^{p-1}}\min\left\{P\left(\left|\mathbf{u}^{t}\left(X-\mathbf{\hat{v}}\right)\right|^{2}\leq\mathbf{u}^{t}\Gamma\mathbf{u}\right),\,P\left(\left|\mathbf{u}^{t}\left(X-\mathbf{\hat{v}}\right)\right|^{2}\geq\mathbf{u}^{t}\Gamma\mathbf{u}\right)\right\}

(1)

and the deepest estimator as $\hat{\Gamma}^{L}=\arg\max_{\Gamma\succeq\mathbf{0}}D_{C}^{L}\left(\Gamma,P\right)$. The definitions of Chen et al. [2018] and Paindaveine and Van Bever [2018] coincide when the location is known, since $\mathbf{\hat{v}}$ is then not needed in (1). If $p=1$, a joint location-scale depth can be derived from either of

D_{LS}^{1}\left(\mu,\sigma,P\right)=\min\left\{\begin{array}{c}\inf_{\lambda\in\mathbb{R}}P\left(\left[\left|y-\mu\right|\leq\left|y-\lambda\right|\right]\right),\\ \inf_{\gamma>0}P\left(\left[\left|\left|\frac{y-\mu}{\sigma}\right|-1\right|\leq\left|\left|\frac{y-\mu}{\gamma}\right|-1\right|\right]\right)\end{array}\right\}

D_{LS}^{2}\left(\mu,\sigma,P\right)=\inf_{\gamma,\lambda}P\left(\left[\left|y-\mu\right|\leq\left|y-\lambda\right|\right]\cap\left[\left|\left|\frac{y-\mu}{\sigma}\right|-1\right|\leq\left|\left|\frac{y-\mu}{\gamma}\right|-1\right|\right]\right),

yielding the median as the deepest location estimator and the median absolute deviation around the median as the deepest scale estimator under the depth $D_{LS}^{1}$.
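The univariate case can be checked numerically. The sketch below is only illustrative: it replaces $P$ by the empirical distribution of a small made-up sample and evaluates the $p=1$ versions of $\mathcal{D}_{T}$ and of (1), with the sample median playing the role of $\mathbf{\hat{v}}$. The function names `location_depth` and `scale_depth` and the data are assumptions introduced here, not part of the original work.

```python
import statistics


def location_depth(theta, sample):
    """Empirical halfspace depth for p = 1: the smaller of the two
    closed tail fractions at theta."""
    n = len(sample)
    below = sum(1 for y in sample if y <= theta) / n
    above = sum(1 for y in sample if y >= theta) / n
    return min(below, above)


def scale_depth(sigma, sample, center):
    """Empirical p = 1 version of (1) with Gamma = sigma**2 and the
    location fixed at `center`."""
    n = len(sample)
    inside = sum(1 for y in sample if abs(y - center) <= sigma) / n
    outside = sum(1 for y in sample if abs(y - center) >= sigma) / n
    return min(inside, outside)


# Hypothetical sample with one gross outlier (8.0).
sample = [2.1, 3.4, 1.9, 8.0, 2.7, 3.0, 2.5]
med = statistics.median(sample)
mad = statistics.median([abs(y - med) for y in sample])

# Depth of the median and of the MAD, to be compared with other candidates.
depth_at_median = location_depth(med, sample)
depth_at_mad = scale_depth(mad, sample, med)
```

On this sample no other data point is deeper than the median, and no other absolute deviation is deeper than the MAD, which agrees with the statement above for $D_{LS}^{1}$.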

Chen et al. [2018] introduced a unified way to study the statistical convergence rate and robustness jointly. Given $\delta\in(0,1/2)$, let $\alpha=1-2\delta$ and $\varepsilon\in[0,1/2)$. Let $\mathcal{P}_{\varepsilon}\left(F_{0}\right)$ be the $\varepsilon$-contamination neighborhood of the central model $F_{0}=N(\boldsymbol{\theta},\Sigma)$. Take $\mathcal{F}\left(M\right)$ as the set of symmetric positive definite matrices $\Sigma$ whose largest eigenvalue $\lambda_{1}(\Sigma)$ is less than a constant $M>0$.

Chen et al. [2018] showed that, for $\varepsilon\in\left[0,\varepsilon^{\prime}\right]$ with $\varepsilon^{\prime}<1/3$ and $\left(p+\log\left(1/\delta\right)\right)/n$ sufficiently small, there exists a constant $C>0$ (depending on $\varepsilon^{\prime}$ but independent of $p$, $n$ and $\varepsilon$) such that

\inf_{\boldsymbol{\theta},\Sigma\in\mathcal{F}\left(M\right),P\in\mathcal{P}_{\varepsilon}\left(F_{0}\right)}P\left(\left\|\hat{\boldsymbol{\theta}}_{T}-\boldsymbol{\theta}\right\|^{2}\leq C\left(\max\left\{\frac{p}{n},\varepsilon^{2}\right\}+\frac{\log\left(1/\delta\right)}{n}\right)\right)\geq\alpha

(2)

The constant $C$ in (2) is actually affected by the asymptotic maximum bias of the Tukey median. Chen and Tyler [2002] dealt with the asymptotic maximum bias of the Tukey median, $B_{T}\left(\varepsilon,\Sigma\right)=\sqrt{\lambda_{1}(\Sigma)}\,\Phi^{-1}\left(\frac{1+\varepsilon}{2\left(1-\varepsilon\right)}\right)$, over the $\varepsilon$-contamination neighborhood with $\varepsilon\in[0,1/3)$.
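To get a feel for the size of $B_{T}$, the formula can be evaluated directly. The following is a minimal sketch that uses only the Python standard library; the function name `tukey_median_max_bias` and the argument `lambda_max` (standing for $\lambda_{1}(\Sigma)$) are names chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def tukey_median_max_bias(eps, lambda_max=1.0):
    """Evaluate B_T(eps, Sigma) = sqrt(lambda_1(Sigma)) * Phi^{-1}((1 + eps) / (2 (1 - eps))).

    `lambda_max` is the largest eigenvalue of Sigma; the formula is only
    used for eps in [0, 1/3), where the quantile level stays below 1.
    """
    if not 0.0 <= eps < 1.0 / 3.0:
        raise ValueError("eps must lie in [0, 1/3)")
    quantile_level = (1.0 + eps) / (2.0 * (1.0 - eps))
    return sqrt(lambda_max) * NormalDist().inv_cdf(quantile_level)


# Bias under the identity scatter for a few contamination fractions.
bias_table = {eps: tukey_median_max_bias(eps) for eps in (0.0, 0.05, 0.1, 0.2, 0.3)}
```

The bias is zero at $\varepsilon=0$, increases with $\varepsilon$, and grows without bound as $\varepsilon$ approaches $1/3$, since the quantile level tends to 1. For small $\varepsilon$ a first-order expansion of $\Phi^{-1}$ around $1/2$ gives $B_{T}(\varepsilon,I)\approx\sqrt{2\pi}\,\varepsilon$, so $B_{T}^{2}$ is of the same order as $\varepsilon^{2}$ there.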

However, the bound (2) can be derived in a more illuminating manner by explicitly incorporating the maximum bias, as the maximum bias governs the behavior of the estimator when the sample size is sufficiently large. Without enlarging the bound in (2), we obtain a more informative inequality,

\inf_{\boldsymbol{\theta},\Sigma\in\mathcal{F}\left(M\right),P\in\mathcal{P}_{\varepsilon}\left(F_{0}\right)}P\left(\left\|\hat{\boldsymbol{\theta}}_{T}-\boldsymbol{\theta}\right\|^{2}\leq\tilde{C}\left(\max\left\{\frac{p}{n},B_{T}^{2}\left(\varepsilon,I\right)\right\}+\frac{\log\left(1/\delta\right)}{n}\right)\right)\geq\alpha.

Chen et al. [2018] also obtained an analogous bound for the deepest dispersion estimator in the case of known location. To that end, they considered the $\varepsilon$-contamination neighborhood with central model $N_{p}\left(\mathbf{0},\Sigma\right)$. If $\beta>0$ is such that $\Phi\left(\sqrt{\beta}\right)=3/4$, $\hat{\Sigma}=\hat{\Gamma}/\sqrt{\beta}$, and $\left\|A\right\|_{op}$ denotes the operator norm of the matrix $A$, that is, its largest singular value, Chen et al. [2018] showed that

\inf_{\Sigma\in\mathcal{F}\left(M\right),P\in\mathcal{P}_{\varepsilon}\left(F_{0}\right)}P\left(\left\|\hat{\Sigma}-\Sigma\right\|_{op}^{2}\leq C\left(\max\left\{\frac{p}{n},\varepsilon^{2}\right\}+\frac{\log\left(1/\delta\right)}{n}\right)\right)\geq\alpha.

(3)

Similarly to the analysis given for the Tukey median, we can obtain a more accurate inequality,

\inf_{\Sigma\in\mathcal{F}\left(M\right),P\in\mathcal{P}_{\varepsilon}\left(F_{0}\right)}P\left(\left\|\hat{\Sigma}-\Sigma\right\|_{op}^{2}\leq C^{*}\left(\max\left\{\frac{p}{n},B_{E}^{2}(\varepsilon)\right\}+\frac{\log\left(1/\delta\right)}{n}\right)\right)\geq\alpha,

with $B_{E}\left(\varepsilon\right)=\left[\frac{1}{\sqrt{\beta}}\Phi^{-1}\left(\frac{3-\varepsilon}{4\left(1-\varepsilon\right)}\right)-1\right]$, without enlarging the original bound (3) and highlighting a possible maximum bias for the largest eigenvalue.

For $\varepsilon=1/3$, $B_{E}\left(1/3\right)=\infty$, which suggests that the asymptotic breakdown point is $1/3$. Indeed, we prove that the asymptotic breakdown point is $\varepsilon^{\ast}\left(\hat{\Gamma}\right)=1/3$.
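The divergence of $B_{E}$ at $\varepsilon=1/3$ can be seen numerically. The sketch below simply evaluates the expression for $B_{E}(\varepsilon)$ given above, with $\sqrt{\beta}=\Phi^{-1}(3/4)$; the names `SQRT_BETA` and `scatter_max_bias` are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# sqrt(beta) is defined through Phi(sqrt(beta)) = 3/4.
SQRT_BETA = STD_NORMAL.inv_cdf(0.75)


def scatter_max_bias(eps):
    """Evaluate B_E(eps) = Phi^{-1}((3 - eps) / (4 (1 - eps))) / sqrt(beta) - 1
    for eps in [0, 1/3), where the quantile level stays below 1."""
    if not 0.0 <= eps < 1.0 / 3.0:
        raise ValueError("eps must lie in [0, 1/3)")
    quantile_level = (3.0 - eps) / (4.0 * (1.0 - eps))
    return STD_NORMAL.inv_cdf(quantile_level) / SQRT_BETA - 1.0


# Values on a grid that approaches the breakdown point 1/3 from below.
grid = (0.0, 0.1, 0.2, 0.3, 0.333, 0.3333333)
bias_values = [scatter_max_bias(eps) for eps in grid]
```

At $\varepsilon=0$ the quantile level is $3/4$ and $B_{E}(0)=0$. The values increase along the grid because the quantile level tends to 1 as $\varepsilon\uparrow 1/3$, which is the numerical counterpart of $B_{E}(1/3)=\infty$.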

Although the formulation given by Chen et al. [2018] seems promising for constructing robust confidence regions, simulations indicate that the bounds are too large to yield regions of reasonable size.

References

Chen et al. [2018] M. Chen, C. Gao, and Z. Ren. Robust covariance and scatter matrix estimation under Huber’s contamination model. The Annals of Statistics, 46(5):1932–1960, 2018.
Chen and Tyler [2002] Z. Chen and D. E. Tyler. The influence function and maximum bias of Tukey’s median. The Annals of Statistics, 30(6):1737–1759, 2002.
Paindaveine and Van Bever [2018] D. Paindaveine and G. Van Bever. Halfspace depths for scatter, concentration and shape matrices. The Annals of Statistics, 46(6B):3276–3307, 2018.
Tukey [1975] J. W. Tukey. Mathematics and the picturing of data. In R. James, editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523–531, Vancouver, 1975.