Distribution-free multivariate rank-based tests
Abstract
Rank tests in the multivariate setting have recently become possible thanks to the definition of multivariate ranks and signs based on optimal transport. In our contribution we present several tests whose test statistics have asymptotic distributions derived using the generalization of the Hájek projection to multivariate ranks and signs.
Department of Probability and Statistics, Faculty of Mathematics and Physics, Charles University, Czechia [daniel.hlubinka@matfyz.cuni.cz]
Faculté Solvay Brussels School - E.M., Université libre de Bruxelles, Belgium.
Keywords: Multivariate rank tests – multiple-output regression – MANOVA – multivariate one-sample location tests – Hájek projection.
1 Introduction
Linear models, including two-sample tests and ANOVA, are among the most common and useful statistical tools. Their nonparametric rank-based versions, however, were restricted to one-dimensional outcomes or were derived under quite restrictive conditions such as pseudo-Gaussianity or similar shape constraints. Recent developments in optimal measure transportation (OMT), and in the theory of OMT-based multivariate ranks and signs, have made it possible to propose fully distribution-free multivariate rank tests [Hallin et al., 2021]. Their asymptotic normality follows from the Hájek representation, which was extended to the multivariate setting. This extension allowed the construction of a whole range of rank tests for multiple-output linear models [Hallin et al., 2023], including multivariate versions of one-sample location tests [Hlubinka and Hudecová, 2024].
2 Multivariate center-outward ranks and signs
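For an absolutely continuous random vector $\mathbf{X}$ with distribution $P$ on $\mathbb{R}^d$, the center-outward distribution function $\mathbf{F}_\pm$ is defined as the gradient of a convex function pushing $P$ forward to $U_d$, a suitable reference distribution on the unit ball $\mathbb{S}_d$ of $\mathbb{R}^d$. In what follows we use $U_d$ for the distribution of a random vector $R\mathbf{S}$, where $R$ and $\mathbf{S}$ are independent, $R$ follows the uniform distribution on $[0,1)$, and $\mathbf{S}$ is uniformly distributed over the unit sphere $\mathcal{S}_{d-1}$.

Let $\mathbf{X}_1,\dots,\mathbf{X}_n$ be a random sample from $P$. The sample center-outward distribution function $\mathbf{F}_\pm^{(n)}$ is defined as the one-to-one mapping of the sample to a regular grid $\mathfrak{G}_n$ of $n$ points in the unit ball such that the sum $\sum_{i=1}^{n}\|\mathbf{X}_i - \mathbf{F}_\pm^{(n)}(\mathbf{X}_i)\|^2$ is minimal. The grid consists of the $n_R n_S$ points $\frac{r}{n_R+1}\mathbf{s}_j$, $r=1,\dots,n_R$, $j=1,\dots,n_S$, where $\mathbf{s}_1,\dots,\mathbf{s}_{n_S}$ are unit vectors, together with $n_0$ copies of the origin, so that $n = n_R n_S + n_0$. The grid is chosen to be as regular as possible, so that the uniform distribution on $\mathfrak{G}_n$ converges weakly to $U_d$.

The center-outward ranks are then defined as the $\{0,1,\dots,n_R\}$-valued random variables $R_{\pm,i}^{(n)} = (n_R+1)\,\|\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)\|$, and the center-outward signs as the unit vectors $\mathbf{S}_{\pm,i}^{(n)} = \mathbf{F}_\pm^{(n)}(\mathbf{X}_i)/\|\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)\|$ for $\mathbf{F}_\pm^{(n)}(\mathbf{X}_i) \neq \mathbf{0}$, $i=1,\dots,n$.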
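The construction above can be illustrated, for $d=2$, by a minimal plain-Python sketch. It assumes a grid whose $n_S$ directions are equally spaced on the unit circle, which is one simple balanced and regular choice and not necessarily the grid used by the authors. The optimal matching is computed with a textbook $O(n^3)$ Hungarian algorithm. The names `grid_2d`, `hungarian` and `center_outward_ranks_signs` are hypothetical; they are not taken from any published software.

```python
import math


def grid_2d(n_R, n_S, n_0):
    """Hypothetical regular grid in the unit disc: n_0 copies of the origin, then the
    points r/(n_R+1) * s_j, r = 1..n_R, with n_S equally spaced unit vectors s_j."""
    pts = [(0.0, 0.0)] * n_0
    for r in range(1, n_R + 1):
        rad = r / (n_R + 1)
        for j in range(n_S):
            angle = 2.0 * math.pi * j / n_S
            pts.append((rad * math.cos(angle), rad * math.sin(angle)))
    return pts


def hungarian(cost):
    """Minimum-cost perfect matching for a square cost matrix (Hungarian method).
    Returns assign, where assign[i] is the column matched to row i."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials
    v = [0.0] * (n + 1)    # column potentials
    p = [0] * (n + 1)      # p[j]: row (1-based) currently matched to column j
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def center_outward_ranks_signs(sample, n_R, n_S, n_0=0):
    """Empirical center-outward ranks and signs of a bivariate sample (sketch)."""
    n = len(sample)
    assert n == n_R * n_S + n_0, "grid size must equal the sample size"
    grid = grid_2d(n_R, n_S, n_0)
    # Squared Euclidean transport cost between observation i and grid point k.
    cost = [[(x - gx) ** 2 + (y - gy) ** 2 for (gx, gy) in grid] for (x, y) in sample]
    assign = hungarian(cost)
    ranks, signs = [], []
    for i in range(n):
        gx, gy = grid[assign[i]]
        norm = math.hypot(gx, gy)
        ranks.append(round((n_R + 1) * norm))  # rank 0 only for the origin
        signs.append((gx / norm, gy / norm) if norm > 0 else (0.0, 0.0))
    return ranks, signs
```

Take, for instance, $n_R=2$, $n_S=4$, $n_0=1$ and $n=9$ observations. The ranks then always consist of one 0, four 1s and four 2s, and only their allocation to the observations depends on the data. In practice, the assignment step could instead be delegated to `scipy.optimize.linear_sum_assignment`. The hand-written version only keeps the sketch dependency-free.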
3 Test statistics
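We give examples of test statistics for two-sample and one-sample location tests. Consider two independent random samples of $d$-variate absolutely continuous random vectors, $\mathbf{X}_1,\dots,\mathbf{X}_{n_1}$ and $\mathbf{X}_{n_1+1},\dots,\mathbf{X}_{n}$, $n = n_1 + n_2$, with the same probability distribution except for the location parameters $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$. Let $J$ be a score function (van der Waerden, Wilcoxon, …), let $R_{\pm,i}^{(n)}$ and $\mathbf{S}_{\pm,i}^{(n)}$, $i=1,\dots,n$, be the multivariate center-outward ranks and signs of the pooled random sample $\mathbf{X}_1,\dots,\mathbf{X}_n$, and denote $\mathbf{J}_i = J\bigl(R_{\pm,i}^{(n)}/(n_R+1)\bigr)\,\mathbf{S}_{\pm,i}^{(n)}$.

Let the grid be chosen such that it is balanced, i.e., the unit vectors $\mathbf{s}_1,\dots,\mathbf{s}_{n_S}$ defining its directions satisfy $\sum_{j=1}^{n_S}\mathbf{s}_j = \mathbf{0}$. Assume further that $\mathbf{S}_{\pm,i}^{(n)} = \mathbf{0}$ if $\mathbf{F}_\pm^{(n)}(\mathbf{X}_i) = \mathbf{0}$. Then consider the two-sample test statistic for the null hypothesis $H_0\colon \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$ (no location shift),

$$T_n = \frac{d\, n_1 n_2}{n\,\sigma_{J,n}^2}\,\bigl\|\bar{\mathbf{J}}_1 - \bar{\mathbf{J}}_2\bigr\|^2, \qquad \bar{\mathbf{J}}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1}\mathbf{J}_i, \quad \bar{\mathbf{J}}_2 = \frac{1}{n_2}\sum_{i=n_1+1}^{n}\mathbf{J}_i, \quad \sigma_{J,n}^2 = \frac{1}{n_R}\sum_{r=1}^{n_R} J^2\Bigl(\frac{r}{n_R+1}\Bigr),$$

where $n_R$ is the number of distinct nonzero values of $R_{\pm,i}^{(n)}$, $i=1,\dots,n$. Under $H_0$, $T_n$ tends in distribution to the $\chi^2_d$ distribution. The null hypothesis is rejected at asymptotic level $\alpha$ if $T_n$ exceeds the $(1-\alpha)$-quantile of the $\chi^2_d$ distribution.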
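The two-sample statistic $T_n$ can be computed with the following minimal sketch, which takes the center-outward ranks and signs of the pooled sample as input. The factor $d/\sigma^2_{J,n}$ comes from the following heuristic. Under $H_0$, the $\mathbf{J}_i$ are a random permutation of fixed grid scores, and for a regular balanced grid the average outer product of these scores is approximately $\sigma^2_{J,n}\mathbf{I}_d/d$. The score functions are illustrative assumptions. In particular, the van der Waerden score is assumed here to be $J(u)=\sqrt{F^{-1}_{\chi^2_d}(u)}$, which for $d=2$ reduces to $\sqrt{-2\log(1-u)}$. The $p$-value uses the closed form $P(\chi^2_2>t)=e^{-t/2}$ and is therefore valid only for $d=2$. The helper names are hypothetical.

```python
import math

# Illustrative score functions J(u), u in (0, 1). The van der Waerden score is
# ASSUMED to be the square root of the chi^2_d quantile; this closed form is for d = 2.
SCORES = {
    "sign": lambda u: 1.0,
    "wilcoxon": lambda u: u,
    "vdw_d2": lambda u: math.sqrt(-2.0 * math.log(1.0 - u)),
}


def two_sample_statistic(ranks, signs, n1, n_R, J):
    """T_n = d n1 n2 / (n sigma^2) * ||Jbar_1 - Jbar_2||^2 for the pooled sample,
    whose first n1 entries form sample 1 (hypothetical helper)."""
    n = len(ranks)
    n2 = n - n1
    d = len(signs[0])
    # Score vectors J_i = J(R_i / (n_R + 1)) S_i; observations mapped to the
    # origin (rank 0) contribute the zero vector.
    vecs = [[J(r / (n_R + 1)) * s for s in sgn] if r > 0 else [0.0] * d
            for r, sgn in zip(ranks, signs)]
    mean1 = [sum(v[k] for v in vecs[:n1]) / n1 for k in range(d)]
    mean2 = [sum(v[k] for v in vecs[n1:]) / n2 for k in range(d)]
    # sigma^2_{J,n}: average squared score over the n_R radial levels of the grid.
    sigma2 = sum(J(r / (n_R + 1)) ** 2 for r in range(1, n_R + 1)) / n_R
    diff2 = sum((a - b) ** 2 for a, b in zip(mean1, mean2))
    return d * n1 * n2 / (n * sigma2) * diff2


def chi2_2_pvalue(t):
    """Asymptotic p-value, valid only for d = 2: P(chi^2_2 > t) = exp(-t / 2)."""
    return math.exp(-t / 2.0)
```

For general $d$, the $p$-value could be obtained from `scipy.stats.chi2.sf(t, d)` instead of the $d=2$ closed form.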
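Consider a random sample $\mathbf{X}_1,\dots,\mathbf{X}_n$ from an absolutely continuous distribution centrally symmetric around $\boldsymbol{\mu}$. Let $R_{\pm,i}^{(n)}$ and $\mathbf{S}_{\pm,i}^{(n)}$ be the center-outward ranks and signs of $\mathbf{X}_i$, $i=1,\dots,n$, computed from the symmetrized sample $\pm\mathbf{X}_1,\dots,\pm\mathbf{X}_n$. The one-sample ranks and signs location test statistic of the null hypothesis $H_0\colon \boldsymbol{\mu} = \mathbf{0}$ may be chosen as

$$T_n^{S} = \frac{d}{n}\Bigl\|\sum_{i=1}^{n}\mathbf{S}_{\pm,i}^{(n)}\Bigr\|^2$$

or

$$T_n^{W} = \frac{3d}{n}\Bigl\|\sum_{i=1}^{n}\frac{R_{\pm,i}^{(n)}}{n_R+1}\,\mathbf{S}_{\pm,i}^{(n)}\Bigr\|^2.$$

In both cases the asymptotic distribution of the test statistic under the null hypothesis is the $\chi^2_d$ distribution. Note that $T_n^{S}$ is a multivariate extension of the univariate sign test, while $T_n^{W}$ is a multivariate extension of the univariate one-sample Wilcoxon (signed-rank) test. Further tests may be derived using other score functions.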
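The one-sample statistics $T_n^{S}$ and $T_n^{W}$ can be evaluated with the minimal sketch below. It assumes that the center-outward ranks and signs of $\mathbf{X}_1,\dots,\mathbf{X}_n$ have already been computed from the symmetrized sample. The helper name `one_sample_statistics` is hypothetical. The constants $d$ and $3d$ standardize the sums: under $H_0$ each sign is equally likely to point either way along its grid direction, and $(R/(n_R+1))^2$ averages approximately $1/3$ for uniformly spread ranks.

```python
def one_sample_statistics(ranks, signs, n_R):
    """Sign-type T^S and Wilcoxon-type T^W statistics from the center-outward ranks
    and signs of X_1..X_n, assumed computed beforehand from the symmetrized sample."""
    n = len(signs)
    d = len(signs[0])
    # Sum of the signs S_i (sign-test analogue).
    sum_s = [sum(s[k] for s in signs) for k in range(d)]
    # Sum of rank-weighted signs R_i / (n_R + 1) * S_i (Wilcoxon analogue).
    sum_w = [sum(r / (n_R + 1) * s[k] for r, s in zip(ranks, signs)) for k in range(d)]
    t_sign = d / n * sum(x * x for x in sum_s)
    t_wilcoxon = 3 * d / n * sum(x * x for x in sum_w)
    return t_sign, t_wilcoxon
```

As a sanity check, consider $d=1$. Then the grid for the $2n$ symmetrized points has $n_R=n$ levels, the ranks are the ranks of $|X_i|$ and the signs are $\operatorname{sign}(X_i)$. The sketch then reproduces (squared, standardized) versions of the classical univariate sign and signed-rank statistics.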
4 Results
Simulation results show that the power of the OMT ranks and signs two-sample test is comparable with that of the Wilcoxon elliptical rank test and Hotelling’s two-sample test for elliptically symmetric distributions, while it outperforms both tests when the underlying distribution is not elliptically symmetric. On the other hand, in higher dimensions the sample size needs to be larger, as more points are needed to obtain a reasonably regular grid in the $d$-dimensional unit ball, and in this case the computational cost of the optimal transport also grows rapidly.
References
- Hallin et al. [2021] Marc Hallin, Eustasio del Barrio, Juan Cuesta-Albertos, and Carlos Matrán. Distribution and quantile functions, ranks and signs in dimension $d$: A measure transportation approach. Annals of Statistics, 49(2):1139–1165, 2021.
- Hallin et al. [2023] Marc Hallin, Daniel Hlubinka, and Šárka Hudecová. Efficient fully distribution-free center-outward rank tests for multiple-output regression and MANOVA. Journal of the American Statistical Association, 118(543):1923–1939, 2023.
- Hlubinka and Hudecová [2024] Daniel Hlubinka and Šárka Hudecová. One-sample location tests based on center-outward signs and ranks. In Recent Advances in Econometrics and Statistics: Festschrift in Honour of Marc Hallin, pages 29–48. Springer, 2024.