Spatial depth in metric spaces

J. Virta${}^{1}$

${}^{1}$ Department of Mathematics and Statistics, University of Turku, Finland [joni.virta@utu.fi]

1 Background

The term statistical depth refers to a function $D$ that assigns to each point $\mu\in\mathbb{R}^{p}$ a measure of centrality $D(\mu;P)$ with respect to a given probability distribution $P$ on $\mathbb{R}^{p}$. The larger $D(\mu;P)$ is, the more centrally located $\mu$ is with respect to the probability mass of $P$. Conversely, points with small depth can be seen as outlying with respect to $P$. This intuitive interpretation, combined with their nonparametric (and often robust) nature, has led to statistical depths being used in a wide range of applications.

Numerous statistical depths have been proposed in the literature; see, e.g., Mosler and Mozharovskyi [2022] for a review. In this work, our focus is on one particular depth, the spatial depth ($L_{1}$-depth) [Vardi and Zhang, 2000], defined as

$$D_{S}(\mu;P):=1-\left\|\mathrm{E}\left\{\mathrm{sgn}(X-\mu)\right\}\right\|, \tag{1}$$

where $X\sim P$, $\mathrm{sgn}(x):=\mathrm{I}(x\neq 0)x/\|x\|$ and $\|\cdot\|$ denotes the Euclidean norm. The spatial depth is robust and has a natural geometric interpretation: the depth of a point $\mu$ is determined by the length of the expected value of the unit vector pointing from $\mu$ toward a point generated randomly from $P$.
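A sample version of (1) can be sketched by replacing the expectation with an average over observed points. The snippet below is a minimal illustration under that plug-in assumption; the function names `spatial_sign` and `sample_spatial_depth` are hypothetical and are not taken from the paper or from any library.

```python
import math


def spatial_sign(x, mu):
    """Unit vector pointing from mu toward x; the zero vector when x == mu."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        return [0.0] * len(diff)
    return [c / norm for c in diff]


def sample_spatial_depth(mu, sample):
    """Plug-in estimate of D_S(mu; P) = 1 - ||E sgn(X - mu)|| (assumed estimator)."""
    n = len(sample)
    mean_sign = [0.0] * len(mu)
    for x in sample:
        s = spatial_sign(x, mu)
        for k in range(len(mu)):
            mean_sign[k] += s[k] / n
    return 1.0 - math.sqrt(sum(c * c for c in mean_sign))


# Toy data: the four corners of a square centred at the origin.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]

# At the centre the unit vectors cancel by symmetry, so the depth is 1 up to rounding.
center_depth = sample_spatial_depth((0.0, 0.0), square)

# Far from the data all unit vectors point the same way, so the depth is close to 0.
far_depth = sample_spatial_depth((100.0, 0.0), square)
```

The two evaluations mirror the interpretation above: the central point receives the maximal depth, while a remote point sees all of the sample in nearly the same direction and receives a depth near zero.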

2 Metric spatial depth

The purpose of this work is to propose an extension of the classical spatial depth (1) to samples of data residing in an arbitrary metric space $(\mathcal{X},d)$. Such methodology is becoming increasingly important as modern applications produce data that are inherently non-Euclidean (functions, compositions, trees, graphs, rotations, positive-definite matrices, etc.), known also as object data. Hence, devising methodology (known as object data analysis, or metric statistics) that works in arbitrary metric spaces allows for capturing all of these data types at once; see Dubey et al. [2024] for a review.

Consider a simplified scenario where a distribution $P$ taking values in the metric space $\mathcal{X}$ has no atoms, i.e., $P(X=\mu)=0$ for all $\mu\in\mathcal{X}$. In such a case, our proposed metric spatial depth of the point $\mu\in\mathcal{X}$ with respect to $P$ takes the form

$$D(\mu;P)=1-\frac{1}{2}\mathrm{E}\left\{\frac{d^{2}(X_{1},\mu)+d^{2}(X_{2},\mu)-d^{2}(X_{1},X_{2})}{d(X_{1},\mu)d(X_{2},\mu)}\right\}, \tag{2}$$

where $X_{1},X_{2}$ are drawn independently from $P$ . Simple computation reveals that, when $(\mathcal{X},d)$ is a Euclidean space, (2) and (1) are connected via a one-to-one mapping, making the metric spatial depth a non-Euclidean generalization of the spatial depth.
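The computation can be sketched as follows, assuming $d(x,y)=\|x-y\|$ with $\langle\cdot,\cdot\rangle$ the associated inner product (notation introduced here only for this illustration) and an atomless $P$, so that $\mathrm{sgn}(X-\mu)$ is almost surely a unit vector. Expanding the squared norm in the numerator of (2) and using the independence of $X_{1}$ and $X_{2}$ gives

$$
\begin{aligned}
d^{2}(X_{1},\mu)+d^{2}(X_{2},\mu)-d^{2}(X_{1},X_{2})
&=\|X_{1}-\mu\|^{2}+\|X_{2}-\mu\|^{2}-\|(X_{1}-\mu)-(X_{2}-\mu)\|^{2}\\
&=2\langle X_{1}-\mu,\,X_{2}-\mu\rangle,\\
D(\mu;P)
&=1-\mathrm{E}\left\{\langle\mathrm{sgn}(X_{1}-\mu),\,\mathrm{sgn}(X_{2}-\mu)\rangle\right\}\\
&=1-\left\|\mathrm{E}\left\{\mathrm{sgn}(X-\mu)\right\}\right\|^{2}\\
&=1-\left\{1-D_{S}(\mu;P)\right\}^{2}.
\end{aligned}
$$

Under these assumptions the mapping is $t\mapsto 1-(1-t)^{2}$, which is strictly increasing on $[0,1]$, so the two depths order points identically in the Euclidean case.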

Our primary contributions regarding the metric spatial depth (2) are:

• We study the theoretical properties, including robustness, continuity, invariance, convergence and interpretation, of $D(\mu;P)$. In particular, we show that $D(\mu;P)$ takes values in $[0,2]$ and that the endpoints have elegant geometric characterizations. For example, the value $D(\mu;P)=2$ is reached if and only if, for i.i.d. $X_{1},X_{2}\sim P$, the point $\mu$ almost surely lies between $X_{1}$ and $X_{2}$ in terms of the triangle inequality of the space.

• We explicitly compute the metric spatial depth in several metric spaces, shedding further light on its intuitive meaning.

• We apply the metric spatial depth to various practical scenarios, including outlier detection and non-convex depth region estimation, also comparing it to several other metric depths.
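The range $[0,2]$ and the betweenness characterization of the upper endpoint can be illustrated numerically. The sketch below averages the integrand of (2) over distinct sample pairs. This is an assumed plug-in for the atomless formula, not necessarily the estimator used in the paper, and the names `metric_spatial_depth` and `abs_dist` are hypothetical. The two-point sample on the real line is a toy device for exhibiting the extreme values of the integrand; it is not itself an atomless distribution.

```python
import itertools


def metric_spatial_depth(mu, sample, dist):
    """Average of the integrand in (2) over distinct sample pairs (assumed plug-in)."""
    total, n_pairs = 0.0, 0
    for x1, x2 in itertools.combinations(sample, 2):
        d1, d2, d12 = dist(x1, mu), dist(x2, mu), dist(x1, x2)
        if d1 == 0.0 or d2 == 0.0:
            # Formula (2) assumes no atoms, so a sample point equal to mu
            # is outside the scope of this sketch.
            raise ValueError("sample point coincides with mu")
        total += (d1 ** 2 + d2 ** 2 - d12 ** 2) / (d1 * d2)
        n_pairs += 1
    if n_pairs == 0:
        raise ValueError("need at least two sample points")
    return 1.0 - 0.5 * total / n_pairs


def abs_dist(a, b):
    """Usual distance on the real line; any metric can be passed instead."""
    return abs(a - b)


two_points = [-1.0, 1.0]

# mu = 0 lies between -1 and 1: d(x1, x2) = d(x1, mu) + d(x2, mu), giving depth 2.
between_depth = metric_spatial_depth(0.0, two_points, abs_dist)

# mu = 5 lies beyond both points: d(x1, x2) = |d(x1, mu) - d(x2, mu)|, giving depth 0.
outside_depth = metric_spatial_depth(5.0, two_points, abs_dist)
```

Because the function only needs a callable `dist`, the same sketch applies to any object data for which pairwise distances can be evaluated.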

A preprint of the work is available at https://arxiv.org/abs/2306.09740.

References

Dubey et al. [2024] P. Dubey, Y. Chen, and H.-G. Müller. Metric statistics: Exploration and inference for random objects with distance profiles. Annals of Statistics, 52(2):757–792, 2024.
Mosler and Mozharovskyi [2022] K. Mosler and P. Mozharovskyi. Choosing among notions of multivariate depth statistics. Statistical Science, 37(3):348–368, 2022.
Vardi and Zhang [2000] Y. Vardi and C.-H. Zhang. The multivariate $L_{1}$-median and associated data depth. Proceedings of the National Academy of Sciences, 97(4):1423–1426, 2000.