Quantile Integrated Depth and Applications

S. Lopez-Pintado1, M. Luo2, S. Nagy3, and T. Ogden4

  • 1 Department of Public Health and Health Sciences, Northeastern University, Boston, USA [s.lopez-pintado@northeastern.edu]

  • 2 Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, San Diego, USA [maluo@ucsd.edu]

  • 3 Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic [nagy@karlin.mff.cuni.cz]

  • 4 Department of Biostatistics, Columbia University, New York, USA [todd.ogden@columbia.edu]

1 Motivation and Summary

Functional data analysis involves data for which the basic unit of observation is a function or image. Robust exploratory tools and inferential methods are much needed, since few assumptions can be made about the generating process. Data depth, a well-known nonparametric tool for analyzing functional data, provides a rigorous method for ranking a sample of curves from the center outwards, allowing for robust inference and outlier detection. Several notions of depth for functional data have been introduced in the last few decades (e.g., Fraiman and Muniz, 2001; Lopez-Pintado and Romo, 2009; Narisetty and Nair, 2016, among others). Here, we develop a new family of depths, termed quantile integrated depth (QID), that are based on integrating up to the $K$-th quantile of the univariate depths. We show that this new family of depths satisfies all the desirable properties established in Gijbels and Nagy (2017), including a type of invariance, maximality at the center, and monotonicity with respect to the deepest point. QID generalizes the well-known integral and infimal depths and addresses some of their drawbacks. In particular, since functional data are commonly observed with noise, we explore the effect of noise on different notions of depth. We introduce a visualization tool called the Spearman agreement depth (SAD) plot. The SAD plot compares the depth values of corresponding functional observations between two versions of a dataset: the original data and the same data with additive noise. Compared to alternatives, the proposed QID is shown to be robust and to perform well with noisy functional data. We also illustrate the advantages of using $\mathrm{QID}_K$ as a function of $K$ to identify potential hard-to-detect shape outliers.

2 Quantile Integrated Depth

Let $\mathcal{X}$ be a space of functions $x\colon S\to\mathbb{R}$, where $S\subset\mathbb{R}^d$ is a set of positive finite Lebesgue measure and $d\ge 1$. An example is the Banach space $\mathcal{X}=\mathcal{C}(S)$ of continuous functions $S\to\mathbb{R}$ equipped with the supremum norm. Let $P$ be a Borel probability measure on the space of functions $\mathcal{X}$. For $X\sim P$ we write $P_{X(s)}\equiv P_s$ for the marginal distribution of the random variable $X(s)\in\mathbb{R}$, with $s\in S$.

Suppose that a univariate depth $D$ is given. For a function $x\in\mathcal{X}$ and a probability measure $P$, to each $s\in S$ we attach the depth $D(x(s);P_s)$ of the functional value $x(s)$ with respect to the corresponding marginal distribution $P_s$. We obtain a mapping

\[ Y_{x,P}\colon S\to[0,1]\colon s\mapsto D(x(s);P_s). \]

Consider now $S$, with its Borel subsets and the Lebesgue measure $\lambda$ on $S$, as a probability space. Without loss of generality, we may suppose that $\lambda(S)=1$; otherwise we consider a properly normalized Lebesgue measure instead of $\lambda$. The map $Y_{x,P}$ induces a pushforward Borel probability measure on $[0,1]$. We denote that measure by $D_{x,P}$, and its distribution function by

\[ F_{x,P}\colon[0,1]\to[0,1]\colon t\mapsto\lambda(Y_{x,P}\le t)=D_{x,P}([0,t]). \]

This measures the proportion of the domain $S$ on which the pointwise depth of $x$ is at most $t$. We also write

\[ F_{x,P}^{-1}\colon[0,1]\to[0,1]\colon u\mapsto\inf\{t\in[0,1]\colon F_{x,P}(t)\ge u\} \]

for the corresponding quantile function. For a given $K\in(0,1)$, the quantile integrated depth of $x\in\mathcal{X}$ w.r.t. $P\in\mathcal{P}(\mathcal{X})$ is defined as

\[ \mathrm{QID}(x;P)=\int_0^K F_{x,P}^{-1}(u)\,du. \tag{1} \]
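One way to see how (1) sits between an integral-type and an infimal-type depth is sketched below. This is a reading of the definition, not a result taken from the source: it assumes $Y_{x,P}$ is measurable, and both the boundary value $K=1$ and the normalization by $1/K$ are illustrative choices that the definition above (with $K\in(0,1)$) does not itself make. The first line is the usual quantile-transform identity; the bounds use that $F_{x,P}^{-1}$ is nonnegative and nondecreasing; the limit uses that $F_{x,P}^{-1}(u)$ decreases to the essential infimum of $Y_{x,P}$ as $u\downarrow 0$.

```latex
% Notation as in (1). The case K = 1 and the factor 1/K are illustrative
% assumptions, not part of the definition in the text.
\begin{align*}
\int_0^1 F_{x,P}^{-1}(u)\,du
  &= \int_S D\bigl(x(s);P_s\bigr)\,d\lambda(s)
  && \text{(integrated pointwise depth)},\\
K\,\operatorname*{ess\,inf}_{s\in S} Y_{x,P}(s)
  &\le \int_0^K F_{x,P}^{-1}(u)\,du
  \le \int_S D\bigl(x(s);P_s\bigr)\,d\lambda(s),\\
\lim_{K\to 0^+}\frac{1}{K}\int_0^K F_{x,P}^{-1}(u)\,du
  &= \operatorname*{ess\,inf}_{s\in S} Y_{x,P}(s)
  && \text{(infimal-type depth)}.
\end{align*}
```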

Intuitively, $\mathrm{QID}_K$ (the depth in (1), with its dependence on $K$ made explicit) integrates the smallest pointwise depths of the function $x$, namely those up to the $K$-th quantile of the pointwise depth distribution. The distinctive feature of the quantile integrated depth (QID) is the attention it gives to the shape of the lower (left) tail of the pointwise depth distribution $D_{x,P}$, which makes QID suitable for identifying potential functional shape outliers. It can also be shown that QID satisfies desirable theoretical properties and is robust in the presence of noise.
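The definition in (1) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation, under several assumptions that are not in the text: curves are observed on a common equispaced grid of $m$ points (so each grid point carries $\lambda$-mass $1/m$), $P$ is replaced by the empirical distribution of a sample of curves, and the univariate depth $D$ is taken to be the halfspace depth $\min\{F_n(v),\,1-F_n(v^-)\}$. The function names `halfspace_depth_1d` and `qid` are hypothetical.

```python
import numpy as np


def halfspace_depth_1d(value, sample):
    """Univariate halfspace depth min{F_n(v), 1 - F_n(v-)} of `value`
    with respect to the empirical distribution of `sample`.
    (Assumed choice of D; the text only requires some univariate depth.)"""
    sample = np.sort(np.asarray(sample, dtype=float))
    n = sample.size
    n_le = np.searchsorted(sample, value, side="right")  # #{sample <= value}
    n_lt = np.searchsorted(sample, value, side="left")   # #{sample < value}
    return min(n_le / n, 1.0 - n_lt / n)


def qid(x, curves, K):
    """Empirical version of (1) on an equispaced grid.

    x      : array of shape (m,), the curve whose depth is computed
    curves : array of shape (n, m), the sample defining the marginals P_s
    K      : upper integration limit for the quantile function
    """
    x = np.asarray(x, dtype=float)
    curves = np.asarray(curves, dtype=float)
    m = x.size
    # Pointwise depths Y_{x,P}(s_j), one per grid point.
    y = np.array([halfspace_depth_1d(x[j], curves[:, j]) for j in range(m)])
    # The empirical quantile function is a step function equal to the
    # j-th smallest pointwise depth on ((j-1)/m, j/m].
    y_sorted = np.sort(y)
    edges = np.arange(m + 1) / m
    # Length of the overlap of each step interval with [0, K].
    widths = np.clip(np.minimum(edges[1:], K) - edges[:-1], 0.0, None)
    return float(np.sum(y_sorted * widths))


# Toy usage: five constant curves at levels 0..4 on a grid of four points.
curves = np.tile(np.arange(5.0)[:, None], (1, 4))
x = np.array([2.0, 2.0, 0.0, 4.0])  # central on half the grid, extreme elsewhere
for K in (0.5, 0.75, 1.0):
    print(K, qid(x, curves, K))
```

In this toy example the curve `x` is central on half of the grid and extreme on the other half, so small values of `K` see only the low pointwise depths (about 0.1 at `K = 0.5`), while `K = 1.0` averages over the whole domain. Tracking `qid` as a function of `K` is the kind of profile the text suggests for spotting shape outliers.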

References

  • Fraiman and Muniz [2001] R. Fraiman and G. Muniz. Trimmed means for functional data. Test, 10:419–440, 2001.
  • Gijbels and Nagy [2017] I. Gijbels and S. Nagy. On a general definition of depth for functional data. Statist. Sci., 32(2):630–639, 2017.
  • Lopez-Pintado and Romo [2009] S. Lopez-Pintado and J. Romo. On the notion of depth for functional data. J. Amer. Statist. Assoc., 104:718–734, 2009.
  • Narisetty and Nair [2016] N. N. Narisetty and V. N. Nair. Extremal depth for functional data and applications. J. Amer. Statist. Assoc., 111:1705–1714, 2016.