Metric Oja Depth, New Statistical Tool for Estimating the Most Central Objects
Abstract
Despite the widespread emergence of object data such as images, well-developed descriptive statistical tools for data in metric spaces remain scarce. To address this shortcoming, we propose a robust location estimator, the Metric Oja depth, which is applicable to any object data.
-
Department of Mathematics and Statistics, University of Turku, Finland [vizama@utu.fi]
-
Department of Mathematics and Statistics, University of Turku, Finland [joni.virta@utu.fi]
Keywords: Object Data – Metric Oja Depth – Robust Analysis
1 Background
Data increasingly come in complex formats, such as images, graphs, and matrices, which reside in non-Euclidean spaces and are collectively known as object data. In this work, we focus on one of the most fundamental exploratory statistical tasks, robust location estimation, in the context of object data. As our methodological tool, we use depth functions, a lesser-known yet powerful and data-driven tool in exploratory data analysis. The foundation for depth functions was laid in 1975 [Tukey, 1975]. We let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and take $(\mathcal{X}, d)$ to be a complete and separable metric space where our data (hereafter called "objects") reside. Further, we let $P$ be a probability measure defined on the Borel sets of $\mathcal{X}$. Given an object $x \in \mathcal{X}$ and a distribution $P$ taking values in $\mathcal{X}$, a depth function $D(x; P)$ describes how central the object $x$ is with respect to $P$.
1.1 Our contribution
We propose the Metric Oja depth, an extension of the simplicial volume depth developed for Euclidean spaces by [Oja, 1983]. We then develop two competing strategies for optimizing metric depth functions, i.e., for finding the deepest objects with respect to them. Finally, we compare the performance of the Metric Oja depth with three other metric depth functions (half-space, lens, and spatial) in diverse data scenarios. A preprint of the work and the simulation code are available on arXiv and GitHub.
2 Metric Oja Depth
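For any three objects $x_1, x_2, x_3 \in \mathcal{X}$, we use the notation $L[x_1, x_2, x_3]$ to denote the event that $d(x_1, x_3) = d(x_1, x_2) + d(x_2, x_3)$, i.e., that $x_2$ is located in between $x_1$ and $x_3$. We also need the union event
\[ L_0[x_1, x_2, x_3] := L[x_1, x_2, x_3] \cup L[x_2, x_1, x_3] \cup L[x_1, x_3, x_2]. \]
Thus, $L_0[x_1, x_2, x_3]$ means that at least one of $x_1, x_2, x_3$ is located in between the remaining two objects.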
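The in-betweenness events can be checked numerically for any metric supplied as a function. The following is a minimal sketch, not the authors' code: the names `is_between` and `some_point_between` and the tolerance `tol` are hypothetical, and the equality in the triangle inequality is tested only up to floating-point error.

```python
import math

def is_between(d, a, b, c, tol=1e-9):
    """True if b lies between a and c, i.e. d(a, c) = d(a, b) + d(b, c) up to tol."""
    return abs(d(a, b) + d(b, c) - d(a, c)) <= tol

def some_point_between(d, x1, x2, x3, tol=1e-9):
    """Union event: at least one of the three objects lies between the other two."""
    return (is_between(d, x1, x2, x3, tol)      # x2 in the middle
            or is_between(d, x2, x1, x3, tol)   # x1 in the middle
            or is_between(d, x1, x3, x2, tol))  # x3 in the middle

def abs_dist(a, b):
    """Metric on the real line (illustration only)."""
    return abs(a - b)

def euclid(a, b):
    """Euclidean metric on coordinate tuples (illustration only)."""
    return math.dist(a, b)

on_a_line = some_point_between(abs_dist, 0.0, 2.0, 5.0)
proper_triangle = some_point_between(euclid, (0, 0), (1, 0), (0, 1))
```

On the real line three points are always ordered, so `on_a_line` is true, whereas the vertices of a non-degenerate triangle in the plane give a false result.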
To put the previous ideas of in-betweenness to use, we next define two matrices that are intimately connected to them. Let $x_1, x_2, x_3, x_4 \in \mathcal{X}$ be arbitrary objects. We denote by $B_3 := B_3(x_1, x_2, x_3, x_4)$ the $3 \times 3$ matrix whose $(i, j)$-element equals
\[ d^2(x_i, x_4) + d^2(x_j, x_4) - d^2(x_i, x_j). \]
Moreover, we let $B_2 := B_2(x_1, x_2, x_4)$ denote the top left $2 \times 2$ principal sub-matrix of $B_3$. Our next result shows that the determinants of these two matrices contain interesting information on the relations between the four objects. The notation $| \cdot |$ in the result denotes the determinant.
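To make the construction concrete, the sketch below builds both matrices from a user-supplied metric. It assumes that the $(i, j)$-element has the form $d^2(x_i, x_4) + d^2(x_j, x_4) - d^2(x_i, x_j)$ with no further normalization; the function names `b3_matrix` and `b2_matrix` are hypothetical and the Euclidean metric is used only for illustration.

```python
import math
import numpy as np

def b3_matrix(d, x1, x2, x3, x4):
    """3x3 matrix with (i, j)-element d(xi, x4)^2 + d(xj, x4)^2 - d(xi, xj)^2 (assumed form)."""
    pts = (x1, x2, x3)
    to_ref = [d(p, x4) for p in pts]  # distances to the reference object x4
    B = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            B[i, j] = to_ref[i] ** 2 + to_ref[j] ** 2 - d(pts[i], pts[j]) ** 2
    return B

def b2_matrix(d, x1, x2, x3, x4):
    """Top left 2x2 principal sub-matrix; it depends only on x1, x2 and x4."""
    return b3_matrix(d, x1, x2, x3, x4)[:2, :2]

def euclid(a, b):
    """Euclidean metric on coordinate tuples (illustration only)."""
    return math.dist(a, b)

# Illustration: unit vectors in R^3 with the origin as reference object.
B3 = b3_matrix(euclid, (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0))
B2 = b2_matrix(euclid, (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0))
```

Under this assumed form and in a Euclidean space, the polarization identity shows that the matrix equals twice the Gram matrix of the vectors $x_i - x_4$, so its determinant is proportional to the squared volume of the simplex spanned by the four points. In the illustration `B3` is twice the identity matrix.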
Theorem 1
- (i) We have $|B_2| \geq 0$, where an equality is reached if and only if the event $L_0[x_1, x_2, x_4]$ holds.
- (ii)
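Part (i) can be verified by a direct computation. The derivation below is a sketch that assumes the $(i, j)$-element of the matrix is $d^2(x_i, x_4) + d^2(x_j, x_4) - d^2(x_i, x_j)$; the shorthand $a$, $b$, $c$ is introduced here only for illustration.

```latex
% Shorthand (assumption for this sketch):
%   a = d(x_1, x_4),  b = d(x_2, x_4),  c = d(x_1, x_2).
\[
B_2 =
\begin{pmatrix}
2a^2 & a^2 + b^2 - c^2 \\
a^2 + b^2 - c^2 & 2b^2
\end{pmatrix},
\]
\begin{align*}
|B_2| &= 4a^2 b^2 - (a^2 + b^2 - c^2)^2 \\
      &= \bigl(c^2 - (a - b)^2\bigr)\bigl((a + b)^2 - c^2\bigr) \\
      &= (a + b + c)(a + b - c)(a - b + c)(-a + b + c).
\end{align*}
```

By the triangle inequality every factor is non-negative, so the determinant is non-negative, and it vanishes exactly when one of the three triangle inequalities holds with equality, that is, when one of the three objects lies in between the other two. In the Euclidean plane the last expression is Heron's formula for $16$ times the squared area of the triangle with side lengths $a$, $b$, $c$.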
We now build measures of depth using the two matrices. Let $X_1, X_2 \sim P$ be independent random objects and consider, for a fixed object $x \in \mathcal{X}$, the quantity
\[ \mathrm{E}\left\{ |B_2(X_1, X_2, x)|^{1/2} \right\}. \]
By Theorem 1, the square root is well-defined, and the quantity measures the outlyingness of the object $x$ with respect to the distribution $P$. That is, if it takes a small value, then the point $x$ must typically be located in between pairs of objects randomly drawn from $P$. To convert it into a measure of depth instead, we define
\[ D_{O2}(x; P) := \frac{1}{1 + \mathrm{E}\left\{ |B_2(X_1, X_2, x)|^{1/2} \right\}}, \tag{2} \]
where the subscript O refers to "Oja", since the concept essentially reduces to the classical Oja depth when the metric space is a Euclidean one. A depth based on the matrix $B_3$ is built analogously.
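A sample version of this depth can be sketched by replacing the expectation with an average over all pairs of observed objects. The code below is an illustration under stated assumptions rather than the authors' implementation: it assumes the depth has the form $1 / (1 + \text{expected square root of the determinant})$, that the $2 \times 2$ determinant equals $4a^2b^2 - (a^2 + b^2 - c^2)^2$ in terms of the three pairwise distances, and that the deepest object is searched among the sample points only. All function names are hypothetical.

```python
from itertools import combinations
import math

def det_b2(d, x1, x2, x):
    """Determinant of the 2x2 matrix for the objects x1, x2 and reference x (assumed form)."""
    a, b, c = d(x1, x), d(x2, x), d(x1, x2)
    return 4 * a**2 * b**2 - (a**2 + b**2 - c**2) ** 2

def metric_oja_depth_2(d, x, sample):
    """Sample analogue: 1 / (1 + mean of sqrt(det) over all pairs of sample objects)."""
    # max(..., 0.0) guards against tiny negative determinants caused by rounding.
    vals = [math.sqrt(max(det_b2(d, xi, xj, x), 0.0))
            for xi, xj in combinations(sample, 2)]
    return 1.0 / (1.0 + sum(vals) / len(vals))

def deepest_in_sample(d, sample):
    """Sample object with the largest depth (exhaustive search over the sample)."""
    return max(sample, key=lambda x: metric_oja_depth_2(d, x, sample))

def euclid(a, b):
    """Euclidean metric on coordinate tuples (illustration only)."""
    return math.dist(a, b)

# Illustration: unit square, its centre, and one far-away outlier.
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
center = deepest_in_sample(euclid, sample)
depth_center = metric_oja_depth_2(euclid, center, sample)
```

In this toy example the centre of the square is the deepest sample object even though the outlier pulls the sample mean away from the square. The exhaustive search evaluates the depth at each of the $n$ sample objects over all pairs, so its cost grows cubically in $n$; it is meant only to show the definition at work.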
References
- Oja [1983] H. Oja. Descriptive Statistics for Multivariate Distributions. Statistics & Probability Letters, 1(6):327–332, 1983.
- Tukey [1975] J. W. Tukey. Mathematics and the Picturing of Data. In R. James, editor, Proceedings of the International Congress of Mathematicians, volume 2, pages 523–531. Canadian Mathematical Congress, 1975.