Spatial depth in metric spaces

J. Virta

Department of Mathematics and Statistics, University of Turku, Finland [joni.virta@utu.fi]

1 Background

The term statistical depth refers to a function $D$ that assigns to each point $\mu \in \mathbb{R}^p$ a measure of centrality $D(\mu; P)$ with respect to a given probability distribution $P$ on $\mathbb{R}^p$. The larger $D(\mu; P)$ is, the more centrally located $\mu$ is with respect to the probability mass of $P$. Conversely, points with small depth can be seen as outlying with respect to $P$. This intuitiveness, combined with their nonparametric (and often robust) nature, has led to statistical depths finding uses in countless applications.

Numerous statistical depths have been proposed in the literature, see, e.g., Mosler and Mozharovskyi [2022] for a review. In this work, our focus is on one particular depth, the spatial depth (L1-depth) [Vardi and Zhang, 2000], defined as

$$D_S(\mu; P) := 1 - \left\| E\{\mathrm{sgn}(X - \mu)\} \right\|, \qquad (1)$$

where $X \sim P$, $\mathrm{sgn}(x) := I(x \neq 0)\, x / \|x\|$ and $\|\cdot\|$ denotes the Euclidean norm. The spatial depth is both robust and has a very natural geometric interpretation: the depth of a point $\mu$ is determined by the length of the expected value of a unit vector drawn from $\mu$ toward a point generated randomly from $P$.
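As an illustration of this definition, the following minimal sketch computes an empirical version of the spatial depth by replacing the expectation with a sample average: one minus the Euclidean norm of the average unit vector pointing from the query point toward the observations. The function name `spatial_depth` and the choice of NumPy are assumptions made here for illustration, not part of the original work.

```python
import numpy as np

def spatial_depth(mu, X):
    """Empirical spatial depth of the point mu w.r.t. the rows of X.

    Illustrative sketch: the expectation in the definition is replaced
    by the sample mean over the rows of X.
    """
    mu = np.asarray(mu, dtype=float)
    X = np.asarray(X, dtype=float)
    diffs = X - mu                          # vectors from mu to each observation
    norms = np.linalg.norm(diffs, axis=1)
    units = np.zeros_like(diffs)            # sgn(0) = 0 for observations equal to mu
    nonzero = norms > 0
    units[nonzero] = diffs[nonzero] / norms[nonzero, None]
    # One minus the length of the average unit vector.
    return 1.0 - np.linalg.norm(units.mean(axis=0))

if __name__ == "__main__":
    # Four points placed symmetrically around the origin.
    X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
    print(spatial_depth([0.0, 0.0], X))    # unit vectors cancel, so the depth is 1
    print(spatial_depth([10.0, 0.0], X))   # far-away point, depth close to 0
```

At the center of a symmetric configuration the unit vectors cancel and the depth attains its maximum, whereas for a distant point they nearly align and the depth approaches zero.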

2 Metric spatial depth

The purpose of this work is to propose an extension of the classical spatial depth (1) to samples of data residing in an arbitrary metric space (𝒳,d). Such methodology is becoming increasingly important as modern applications produce data that are inherently non-Euclidean (functions, compositions, trees, graphs, rotations, positive-definite matrices, etc.), known also as object data. Hence, devising methodology (known as object data analysis, or metric statistics) that works in arbitrary metric spaces allows for capturing all of these data types at once, see Dubey et al. [2024] for a review.

Consider a simplified scenario where a distribution $P$ taking values in the metric space $\mathcal{X}$ has no atoms, i.e., $P(X = \mu) = 0$ for all $\mu \in \mathcal{X}$. In such a case, our proposed metric spatial depth of the point $\mu \in \mathcal{X}$ with respect to $P$ takes the form

$$D(\mu; P) = 1 - \frac{1}{2} E\left\{ \frac{d^2(X_1, \mu) + d^2(X_2, \mu) - d^2(X_1, X_2)}{d(X_1, \mu)\, d(X_2, \mu)} \right\}, \qquad (2)$$

where $X_1, X_2$ are drawn independently from $P$. Simple computation reveals that, when $(\mathcal{X}, d)$ is a Euclidean space, (2) and (1) are connected via a one-to-one mapping, making the metric spatial depth a non-Euclidean generalization of the spatial depth.
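Because the metric spatial depth involves only pairwise distances, it can be evaluated for any data type once a distance function is available. The sketch below is one possible plug-in estimate, under assumptions made here for illustration: the expectation is replaced by an average over all ordered pairs of observations (including pairs with equal indices), and terms in which an observation coincides with the query point are set to zero, mirroring the convention sgn(0) = 0. The function name `metric_spatial_depth` and the argument `dist` are hypothetical.

```python
import numpy as np

def metric_spatial_depth(mu, sample, dist):
    """Plug-in estimate of the metric spatial depth of mu.

    sample : sequence of observations from the metric space
    dist   : callable dist(a, b) returning the distance between a and b
    Illustrative sketch, not necessarily the estimator used by the author.
    """
    n = len(sample)
    # Distances from every observation to the query point.
    d_mu = np.array([dist(x, mu) for x in sample], dtype=float)
    # Pairwise distances between the observations.
    D = np.array([[dist(sample[i], sample[j]) for j in range(n)]
                  for i in range(n)], dtype=float)
    num = d_mu[:, None] ** 2 + d_mu[None, :] ** 2 - D ** 2
    den = d_mu[:, None] * d_mu[None, :]
    h = np.zeros_like(num)
    ok = den > 0                  # assumption: zero-distance terms contribute 0
    h[ok] = num[ok] / den[ok]
    return 1.0 - 0.5 * h.mean()

if __name__ == "__main__":
    def euclid(a, b):
        return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

    rng = np.random.default_rng(0)
    X = list(rng.normal(size=(50, 2)))
    print(metric_spatial_depth(np.zeros(2), X, euclid))        # central point
    print(metric_spatial_depth(np.array([5.0, 5.0]), X, euclid))  # outlying point
```

With the Euclidean distance, each summand equals twice the cosine of the angle between the two unit vectors pointing from the query point to the observations, so this estimate coincides with one minus the squared norm of the average unit vector. This gives a quick numerical way to check the one-to-one relationship with the classical spatial depth.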

Our primary contributions regarding the metric spatial depth (2) are:

  • We study the theoretical properties, including robustness, continuity, invariance, convergence and interpretation, of $D(\mu; P)$. In particular, we show that $D(\mu; P)$ takes values in $[0, 2]$ and that the endpoints have elegant geometric characterizations. E.g., the value $D(\mu; P) = 2$ is reached if and only if, for i.i.d. $X_1, X_2 \sim P$, the point $\mu$ almost surely lies between $X_1$ and $X_2$ in terms of the triangle inequality of the space.

  • We explicitly compute the metric spatial depth in several metric spaces, shedding further light on its intuitive meaning.

  • We apply the metric spatial depth to various practical scenarios, including outlier detection and non-convex depth region estimation, also comparing it to several other metric depths.
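The range of the depth mentioned in the first item can be traced back to the triangle inequality. The following is a short sketch of the argument, not the author's proof; the shorthand $a = d(X_1, \mu)$, $b = d(X_2, \mu)$, $c = d(X_1, X_2)$ is introduced here for illustration, and $a, b > 0$ is assumed (which holds almost surely when $P$ has no atoms).

```latex
\begin{align*}
|a - b| \le c \le a + b
  &\;\Longrightarrow\; -2ab \le a^2 + b^2 - c^2 \le 2ab \\
  &\;\Longrightarrow\; -2 \le \frac{a^2 + b^2 - c^2}{ab} \le 2 \\
  &\;\Longrightarrow\; 0 \le 1 - \frac{1}{2}\,
     E\left\{ \frac{a^2 + b^2 - c^2}{ab} \right\} \le 2 .
\end{align*}
```

The upper endpoint requires the ratio to equal $-2$ almost surely, that is, $c = a + b$, which is exactly the statement that $\mu$ lies between $X_1$ and $X_2$ in the sense of the triangle inequality.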

A preprint of the work is available at https://arxiv.org/abs/2306.09740.

References

  • Dubey et al. [2024] P. Dubey, Y. Chen, and H.-G. Müller. Metric statistics: Exploration and inference for random objects with distance profiles. Annals of Statistics, 52(2):757–792, 2024.
  • Mosler and Mozharovskyi [2022] K. Mosler and P. Mozharovskyi. Choosing among notions of multivariate depth statistics. Statistical Science, 37(3):348–368, 2022.
  • Vardi and Zhang [2000] Y. Vardi and C.-H. Zhang. The multivariate L1-median and associated data depth. Proceedings of the National Academy of Sciences, 97(4):1423–1426, 2000.