Metric Oja Depth, New Statistical Tool for Estimating the Most Central Objects

V. Zamanifarizhandi¹, J. Virta²
Abstract

Despite the widespread emergence of object data like images, well-developed statistical descriptive tools for data in metric spaces are still few. To address this shortcoming, in this study we propose a robust location estimator called the Metric Oja depth applicable to any object data.

  • ¹ Department of Mathematics and Statistics, University of Turku, Finland [vizama@utu.fi]

  • ² Department of Mathematics and Statistics, University of Turku, Finland [joni.virta@utu.fi]

Keywords: Object Data – Metric Oja Depth – Robust Analysis

1 Background

Data are increasingly taking on complex formats, such as images, graphs, and matrices, all of which reside in non-Euclidean spaces and are collectively known as object data. In this work, we focus on one of the most fundamental exploratory statistical tasks, robust location estimation, in the context of object data. As our methodological tool, we use depth functions, a lesser-known yet powerful and data-driven tool in exploratory data analysis. The foundation for depth functions was laid in 1975 [Tukey, 1975]. We let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and take $(\mathcal{X}, d)$ to be a complete and separable metric space where our data (hereafter called “objects”) reside. Further, we let $P$ be a probability measure defined on the Borel sets of $\mathcal{X}$. Given an object $x \in \mathcal{X}$ and a distribution $P$ taking values in $\mathcal{X}$, the depth $D(x; P)$ describes how central the object $x$ is with respect to $P$.

1.1 Our contribution

We propose the Metric Oja depth, an extension of the simplicial volume depth developed for Euclidean spaces by Oja [1983]. We then develop two competing strategies for optimizing metric depth functions, i.e., for finding the deepest objects with respect to them. Finally, we compare the performance of the Metric Oja depth with three other metric depth functions (half-space, lens, and spatial) in diverse data scenarios. A preprint of the work and the simulation code are available on arXiv and GitHub.

2 Metric Oja Depth

For any three objects $x_1, x_2, x_3 \in \mathcal{X}$, we use the notation $L(x_1, x_2, x_3)$ to denote the event that $d(x_1, x_3) = d(x_1, x_2) + d(x_2, x_3)$, i.e., that $x_2$ is located in between $x_1$ and $x_3$. We further introduce the union event

\[ U(x_1, x_2, x_3) := L(x_1, x_2, x_3) \cup L(x_2, x_3, x_1) \cup L(x_3, x_1, x_2). \]

Thus, $U(x_1, x_2, x_3)$ means that at least one of $x_1, x_2, x_3$ is located in between the remaining two objects.
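The in-betweenness events can be checked numerically for any metric supplied as a function. The following is a minimal sketch, not code from the paper: the names `between`, `union_event`, and `abs_metric`, as well as the tolerance `tol` used to absorb floating-point error, are assumptions made for illustration.

```python
import math


def between(d, x1, x2, x3, tol=1e-12):
    """Event L(x1, x2, x3): d(x1, x3) = d(x1, x2) + d(x2, x3), up to `tol`."""
    return math.isclose(d(x1, x3), d(x1, x2) + d(x2, x3),
                        rel_tol=0.0, abs_tol=tol)


def union_event(d, x1, x2, x3, tol=1e-12):
    """Event U(x1, x2, x3): at least one object lies between the other two."""
    return (between(d, x1, x2, x3, tol)
            or between(d, x2, x3, x1, tol)
            or between(d, x3, x1, x2, tol))


def abs_metric(a, b):
    """Usual metric on the real line (illustrative choice)."""
    return abs(a - b)


# On the real line one of three points always lies between the other two.
on_line = union_event(abs_metric, 0.0, 1.0, 3.0)

# A non-degenerate triangle in the Euclidean plane has no in-between point.
in_plane = union_event(math.dist, (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Here `on_line` evaluates to `True` and `in_plane` to `False`, matching the interpretation of $U$ as collinearity in the Euclidean case.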

To put the previous ideas of in-betweenness to use, we next define two matrices which are intimately connected to them. Let first $x_0, x_1, x_2, x_3 \in \mathcal{X}$ be arbitrary objects. We denote by $B_3(x_0, x_1, x_2, x_3)$ the $3 \times 3$ matrix whose $(k, \ell)$-element equals

\[ \frac{1}{2} \{ d^2(x_0, x_k) + d^2(x_0, x_\ell) - d^2(x_k, x_\ell) \}. \]

Moreover, we let $B_2(x_0, x_1, x_2)$ denote the $2 \times 2$ top left principal sub-matrix of $B_3(x_0, x_1, x_2, x_3)$. Our next result shows that the determinants of these two matrices contain interesting information on the relations between the four objects. The notation $| \cdot |$ in the result denotes the determinant.
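As an illustration of the definition, the matrices can be assembled directly from pairwise distances. This is a sketch under assumptions: the helper name `b_matrix` is hypothetical, and the Euclidean metric `math.dist` is used only as an example of $d$.

```python
import math


def b_matrix(d, x0, xs):
    """Matrix with (k, l) entry 0.5 * (d(x0,xk)^2 + d(x0,xl)^2 - d(xk,xl)^2)."""
    def sq(a, b):
        return d(a, b) ** 2
    return [[0.5 * (sq(x0, xk) + sq(x0, xl) - sq(xk, xl)) for xl in xs]
            for xk in xs]


# Example in Euclidean R^3: x0 at the origin, x1..x3 the standard basis vectors.
x0 = (0.0, 0.0, 0.0)
x1, x2, x3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

B3 = b_matrix(math.dist, x0, [x1, x2, x3])
# B2 is the 2x2 top-left principal sub-matrix of B3.
B2 = [row[:2] for row in B3[:2]]
```

In this Euclidean example the entries are the inner products of the vectors $x_k - x_0$ and $x_\ell - x_0$, so `B3` is the identity matrix up to floating-point rounding.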

Theorem 1
  • (i) We have

    \[ |B_2(x_0, x_1, x_2)| \geq 0, \]

    where equality is reached if and only if the event $U(x_0, x_1, x_2)$ holds.

  • (ii) We have

    \[ |B_3(x_0, x_1, x_2, x_3)| \geq -4 d^2(x_0, x_1) d^2(x_0, x_2) d^2(x_0, x_3). \tag{1} \]

    If equality is reached in (1), then at least one of the events $L(x_1, x_0, x_2)$, $L(x_2, x_0, x_3)$, $L(x_3, x_0, x_1)$ holds.
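Part (i) can be made plausible with a short computation. The following sketch is one possible argument and not necessarily the authors' proof; the abbreviations $a$, $b$, $c$ are introduced here only for readability.

```latex
% Abbreviations (assumed for this sketch):
%   a = d(x_0, x_1),  b = d(x_0, x_2),  c = d(x_1, x_2).
\begin{align*}
|B_2(x_0, x_1, x_2)|
  &= a^2 b^2 - \tfrac{1}{4} (a^2 + b^2 - c^2)^2 \\
  &= \tfrac{1}{4} \{ (2ab)^2 - (a^2 + b^2 - c^2)^2 \} \\
  &= \tfrac{1}{4} \{ c^2 - (a - b)^2 \} \{ (a + b)^2 - c^2 \} \\
  &= \tfrac{1}{4} (a + b + c)(a + b - c)(a - b + c)(-a + b + c).
\end{align*}
% Each factor is non-negative by the triangle inequality, so the
% determinant is non-negative. It vanishes exactly when some factor is
% zero: a + b = c is L(x_2, x_0, x_1), b = a + c is L(x_0, x_1, x_2),
% and a = b + c is L(x_1, x_2, x_0); if a + b + c = 0 the three objects
% coincide and all three events hold.
```

This is Heron's formula in disguise: the determinant equals four times the squared area of a Euclidean triangle with side lengths $a$, $b$, $c$.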

We now build measures of depth using the two matrices. Let $X_1, X_2, X_3 \sim P$ be independent random objects and consider, for a fixed object $x \in \mathcal{X}$, the quantity

\[ G_3(x) = \mathrm{E} \left[ \{ |B_3(x, X_1, X_2, X_3)| + 4 d^2(x, X_1) d^2(x, X_2) d^2(x, X_3) \}^{1/2} \right]. \]

By Theorem 1, the square root is well-defined. The theorem also implies that $G_3(x)$ measures the outlyingness of the object $x$ with respect to the distribution $P$. That is, if $G_3(x)$ takes a small value, then the point $x$ must typically be located in between pairs of objects randomly drawn from $P$. To convert $G_3(x)$ into a measure of depth instead, we define

\[ D_{O3}(x) := \frac{1}{1 + G_3(x)}, \tag{2} \]

where the subscript $O$ refers to “Oja”, since the concept essentially reduces to the classical Oja depth when the metric space is a Euclidean one. A depth based on the matrix $B_2$ is built analogously.
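To make the definition concrete, the sketch below estimates $G_3$ and the resulting depth from a finite sample. It rests on an assumption not stated in the text above: the expectation is replaced by the average over all unordered triples of sample objects. The function names (`b_matrix`, `det3`, `g3_hat`, `oja_depth`) and the toy data are hypothetical.

```python
import itertools
import math


def b_matrix(d, x0, xs):
    """Matrix with (k, l) entry 0.5 * (d(x0,xk)^2 + d(x0,xl)^2 - d(xk,xl)^2)."""
    def sq(a, b):
        return d(a, b) ** 2
    return [[0.5 * (sq(x0, xk) + sq(x0, xl) - sq(xk, xl)) for xl in xs]
            for xk in xs]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (p, q, r), (u, v, w) = m
    return a * (q * w - r * v) - b * (p * w - r * u) + c * (p * v - q * u)


def g3_hat(d, x, sample):
    """Assumed sample version of G3: mean of the integrand over all triples."""
    total, count = 0.0, 0
    for x1, x2, x3 in itertools.combinations(sample, 3):
        det = det3(b_matrix(d, x, [x1, x2, x3]))
        prod = (d(x, x1) * d(x, x2) * d(x, x3)) ** 2
        # Theorem 1 (ii) gives det + 4 * prod >= 0; clip rounding error only.
        total += math.sqrt(max(det + 4.0 * prod, 0.0))
        count += 1
    return total / count


def oja_depth(d, x, sample):
    """Depth as in (2): 1 / (1 + G3), with G3 replaced by its estimate."""
    return 1.0 / (1.0 + g3_hat(d, x, sample))


# Toy sample in the Euclidean plane (illustrative data only).
sample = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0),
          (0.0, -1.0), (1.0, 1.0), (-1.0, -1.0)]
depth_centre = oja_depth(math.dist, (0.0, 0.0), sample)
depth_outlier = oja_depth(math.dist, (10.0, 10.0), sample)
```

In this toy example the central point receives a larger depth than the distant point. Enumerating all triples costs on the order of $n^3$ determinant evaluations per candidate object, so for larger samples a random subsample of triples may be a practical alternative.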

References

  • Oja [1983] H. Oja. Descriptive Statistics for Multivariate Distributions. Statistics & Probability Letters, 1(6):327–332, 1983.
  • Tukey [1975] J. W. Tukey. Mathematics and the Picturing of Data. In R. James, editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523–531. Canadian Mathematical Congress, 1975.