Distribution-free multivariate rank-based tests

D. Hlubinka¹, Š. Hudecová¹ and M. Hallin²
Abstract

Rank tests in the multivariate setting have recently been made possible by the definition of multivariate ranks and signs based on optimal transport. In this contribution we present several tests whose test statistics have asymptotic distributions derived via a generalization of the Hájek projection to multivariate ranks and signs.

  • ¹ Department of Probability and Statistics, Faculty of Mathematics and Physics, Charles University, Czechia [daniel.hlubinka@matfyz.cuni.cz]

  • ² Faculté Solvay Brussels School - E.M., Université libre de Bruxelles, Belgium.

Keywords: Multivariate rank tests – multiple-output regression – MANOVA – multivariate one-sample location tests – Hájek projection.

1 Introduction

Linear models, including two-sample tests and ANOVA, are among the most common and useful statistical tools. Their nonparametric rank-based versions, however, were restricted to one-dimensional outcomes or derived under rather restrictive assumptions such as pseudo-Gaussianity or similar shape constraints. Recent developments in optimal measure transport (OMT) and in the theory of OMT-based multivariate ranks and signs have made it possible to propose fully distribution-free multivariate rank tests [Hallin et al., 2021]. Their asymptotic normality follows from a Hájek representation extended to the multivariate setting, which allowed the construction of a whole range of rank tests for multiple-output linear models [Hallin et al., 2023], including multivariate versions of one-sample location tests [Hlubinka and Hudecová, 2024].

2 Multivariate center-outward ranks and signs

For an absolutely continuous random vector $\mathbf{X}\in\mathbb{R}^d$ with distribution $\mathrm{P}_{\mathbf{X}}$, the center-outward distribution function $\mathbf{F}_\pm$ is defined as the gradient of a convex function pushing $\mathrm{P}_{\mathbf{X}}$ forward to $\mathrm{U}$, a suitable reference distribution on $\mathbb{R}^d$. In what follows we use for $\mathrm{U}$ the distribution of a random vector $R\mathbf{Z}$, where $R$ and $\mathbf{Z}$ are independent, $R$ follows the uniform distribution on $[0,1]$, and $\mathbf{Z}$ is uniformly distributed over the unit sphere in $\mathbb{R}^d$.
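As an illustration, a minimal sketch (Python with NumPy, chosen here only for illustration) draws from the reference distribution $\mathrm{U}$ by normalizing a standard Gaussian vector to obtain $\mathbf{Z}$ and multiplying it by an independent uniform radius $R$. The function name `sample_reference_U` is hypothetical. Note that $\mathrm{U}$ is not the uniform distribution on the ball: its radius $\|R\mathbf{Z}\|=R$ is uniform on $[0,1]$, which is why the center-outward ranks below are $[0,1]$-valued.

```python
import numpy as np

def sample_reference_U(n, d, rng=None):
    """Draw n points from U, the law of R*Z with R ~ Uniform[0, 1] independent
    of Z, which is uniform on the unit sphere of R^d (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    g = rng.standard_normal((n, d))
    z = g / np.linalg.norm(g, axis=1, keepdims=True)  # Z: uniform direction on the sphere
    r = rng.uniform(0.0, 1.0, size=(n, 1))            # R: uniform radius on [0, 1)
    return r * z                                      # points in the unit ball

# Usage: the radii should look Uniform[0, 1] (mean close to 1/2), whereas the
# uniform distribution on the ball would give mean radius d/(d+1).
u = sample_reference_U(20000, 3, rng=0)
radii = np.linalg.norm(u, axis=1)
print(radii.mean())
```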

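Let $\mathbf{X}_1,\dots,\mathbf{X}_n$ be a random sample from $\mathrm{P}_{\mathbf{X}}$. The sample center-outward distribution function $\mathbf{F}_\pm^{(n)}$ is defined as the one-to-one mapping of the sample onto a regular grid $\mathcal{G}_n$ of $n$ points in the unit ball that minimizes $\sum_{i=1}^n\|\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)-\mathbf{X}_i\|^2$. The grid $\mathcal{G}_n$ is chosen to be as regular as possible, in such a way that the uniform distribution on $\mathcal{G}_n$ converges weakly to $\mathrm{U}$ as $n\to\infty$.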
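Computing $\mathbf{F}_\pm^{(n)}$ is a linear assignment problem with squared Euclidean costs. The sketch below assumes $d=2$ and uses SciPy's `scipy.optimize.linear_sum_assignment` as the solver; the grid, similar in spirit to the one of Hallin et al. [2021], consists of $n_R$ concentric circles of radii $j/(n_R+1)$, $j=1,\dots,n_R$, each carrying $n_S$ equally spaced directions, plus $n_0$ copies of the origin, so that $n=n_Rn_S+n_0$. The helper names `grid_2d` and `empirical_center_outward` are hypothetical, and in higher dimensions the $n_S$ directions must be placed differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def grid_2d(n_R, n_S, n_0=0):
    """Hypothetical regular grid of n = n_R*n_S + n_0 points in the unit disk:
    n_S equally spaced directions on each circle of radius j/(n_R+1),
    j = 1..n_R, plus n_0 copies of the origin. It is balanced (sums to 0)."""
    angles = 2 * np.pi * np.arange(n_S) / n_S
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    radii = np.arange(1, n_R + 1) / (n_R + 1)
    pts = (radii[:, None, None] * dirs[None, :, :]).reshape(-1, 2)
    return np.vstack([pts, np.zeros((n_0, 2))])

def empirical_center_outward(X, grid):
    """One-to-one assignment of the sample X (n x d) to the grid (n x d)
    minimizing sum_i ||F(X_i) - X_i||^2; row i of the result is F(X_i)."""
    cost = ((X[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal permutation
    F = np.empty_like(grid)
    F[rows] = grid[cols]
    return F

rng = np.random.default_rng(1)
X = rng.standard_normal((12, 2))
G = grid_2d(n_R=3, n_S=4)          # n = 12, no origin points
F = empirical_center_outward(X, G)
```

Exact assignment solvers scale roughly cubically in $n$, which is one reason the computational cost of the optimal transport grows quickly with the sample size.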

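The center-outward ranks are then defined as the $[0,1]$-valued random variables $R_i=\|\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)\|$, and the center-outward signs as the unit vectors $\mathbf{S}_i=\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)/\|\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)\|$.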
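Given the values $\mathbf{F}_\pm^{(n)}(\mathbf{X}_i)$ as rows of an array, the ranks and signs follow directly from these definitions. In this sketch the function name is hypothetical, and a point mapped to the origin (possible when the grid contains the origin) receives the sign $\mathbf{0}$, a convention assumed here because the quotient is undefined there.

```python
import numpy as np

def ranks_and_signs(F, tol=1e-12):
    """From F(X_i) stored as rows of an (n, d) array, return the [0, 1]-valued
    ranks R_i = ||F(X_i)|| and the signs S_i = F(X_i)/||F(X_i)||.
    Rows mapped to the origin get S_i = 0 (convention assumed here)."""
    R = np.linalg.norm(F, axis=1)
    S = np.zeros_like(F, dtype=float)
    nonzero = R > tol
    S[nonzero] = F[nonzero] / R[nonzero][:, None]
    return R, S

R, S = ranks_and_signs(np.array([[0.5, 0.0], [0.0, 0.25], [0.0, 0.0]]))
```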

3 Test statistics

We give examples of test statistics for two-sample and one-sample location tests. Consider two independent random samples $\mathbf{X}_1,\dots,\mathbf{X}_{n_1}$ and $\mathbf{Y}_1,\dots,\mathbf{Y}_{n_2}$ of $d$-variate absolutely continuous random vectors with the same probability distribution except for the location parameter. Let $J$ be a score function (van der Waerden, Wilcoxon, …), let $n=n_1+n_2$, and let $(R_i,\mathbf{S}_i)$, $i\in\{1,\dots,n\}$, be the multivariate center-outward ranks and signs of the pooled sample $\mathbf{X}_1,\dots,\mathbf{X}_{n_1},\mathbf{Y}_1,\dots,\mathbf{Y}_{n_2}$.

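Let the grid $\mathcal{G}_n$ be chosen such that it is balanced, i.e., $\sum_{\mathbf{s}_i\in\mathcal{G}_n}\mathbf{s}_i=\mathbf{0}$. Assume further that $n\min\{n_1,n_2\}/\max\{n_1,n_2\}\to\infty$ as $n\to\infty$. Then, under the null hypothesis of no location shift, the two-sample test statistic

$$T_J^{(n)}=\frac{n}{d\,n_1n_2\int_0^1J^2(u)\,du}\,\Big\|\sum_{i=1}^{n_1}J(R_i)\,\mathbf{S}_i\Big\|^2$$

converges in distribution to the $\chi^2_d$ distribution, and the null hypothesis is rejected at asymptotic level $\alpha$ if $T_J^{(n)}$ exceeds the $(1-\alpha)$-quantile of the $\chi^2_d$ distribution. Here $n_R$, the number of distinct values of $\|\mathbf{s}_i\|$, $\mathbf{s}_i\in\mathcal{G}_n$, is also the number of distinct values the ranks $R_i$ can take; for the usual grid whose radii are $j/(n_R+1)$, $j=1,\dots,n_R$, the scores are thus $J\big(j/(n_R+1)\big)$.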
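The two-sample statistic and its asymptotic $\chi^2_d$ decision rule can be sketched as follows, taking the pooled ranks and signs as given (the first $n_1$ rows belonging to the $\mathbf{X}$-sample). The score $J$ must accept NumPy arrays; the example uses the Wilcoxon-type score $J(u)=u$, for which $\int_0^1J^2=1/3$, and a van der Waerden-type score could be passed as, e.g., `lambda u: np.sqrt(chi2.ppf(u, d))` (an assumption on its exact form). The integral is computed numerically with `scipy.integrate.quad`; the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi2

def two_sample_stat(R, S, n1, J):
    """T_J for H0: no location shift. R: pooled ranks (n,), S: pooled signs
    (n, d); rows 0..n1-1 come from the X-sample. J must be vectorized."""
    n, d = S.shape
    n2 = n - n1
    int_J2, _ = quad(lambda u: J(u) ** 2, 0.0, 1.0)  # int_0^1 J^2(u) du
    v = (J(R[:n1])[:, None] * S[:n1]).sum(axis=0)   # sum_{i<=n1} J(R_i) S_i
    return n / (d * n1 * n2 * int_J2) * float(v @ v)

def chi2_test(T, d, alpha=0.05):
    """Reject H0 when T exceeds the (1 - alpha)-quantile of chi^2_d;
    also return the asymptotic p-value."""
    return T > chi2.ppf(1 - alpha, d), chi2.sf(T, d)

# Toy input: n1 = n2 = 2, d = 2, all ranks equal to 1/2.
R = np.full(4, 0.5)
S = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
T = two_sample_stat(R, S, n1=2, J=lambda u: u)   # 4 / (2*2*2/3) * 1 = 1.5
reject, p_value = chi2_test(T, d=2)
```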

Consider a random sample $\mathbf{X}_1,\dots,\mathbf{X}_n$ from an absolutely continuous distribution that is centrally symmetric about $\mu\in\mathbb{R}^d$. The one-sample rank-and-sign test statistic for the null hypothesis $H_0:\mu=\mathbf{0}$ may be chosen as

$$T_S^{(n)}=\frac{d}{n}\Big\|\sum_{i=1}^n\mathbf{S}_i\Big\|^2$$

or

$$T_F^{(n)}=\frac{3d}{n}\Big\|\sum_{i=1}^nR_i\,\mathbf{S}_i\Big\|^2.$$

In both cases the asymptotic distribution of the test statistic under the null hypothesis is $\chi^2_d$. Note that $T_S^{(n)}$ is a multivariate extension of the univariate sign test, while $T_F^{(n)}$ is a multivariate extension of the univariate one-sample Wilcoxon (signed-rank) test. Further tests may be derived using other score functions.
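The normalizing constants can be read off heuristically: if $\mathbf{S}_i$ is uniform on the unit sphere, each coordinate has variance $1/d$; if moreover $R_i$ behaves like an independent uniform variable on $[0,1]$, each coordinate of $R_i\mathbf{S}_i$ has variance $\mathrm{E}[R_i^2]/d=1/(3d)$, hence the factors $d/n$ and $3d/n$. The sketch below evaluates both statistics and their asymptotic $\chi^2_d$ p-values from given ranks and signs; how these are computed for the one-sample problem is described in Hlubinka and Hudecová [2024] and is not reproduced here, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def one_sample_stats(R, S):
    """Sign statistic T_S = (d/n)||sum_i S_i||^2 and Wilcoxon-type statistic
    T_F = (3d/n)||sum_i R_i S_i||^2 from ranks R (n,) and signs S (n, d),
    together with their asymptotic chi^2_d p-values."""
    n, d = S.shape
    s = S.sum(axis=0)
    f = (R[:, None] * S).sum(axis=0)
    T_S = d / n * float(s @ s)
    T_F = 3 * d / n * float(f @ f)
    return T_S, T_F, chi2.sf(T_S, d), chi2.sf(T_F, d)

# Toy input: all signs point the same way, which is strong evidence against H0.
R = np.full(4, 0.5)
S = np.tile([1.0, 0.0], (4, 1))
T_S, T_F, p_S, p_F = one_sample_stats(R, S)   # T_S = 8, T_F = 6
```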

4 Results

Simulation results show that the power of the OMT rank-and-sign two-sample test is comparable to that of the Wilcoxon elliptical rank test and Hotelling's two-sample test for elliptically symmetric distributions, while it outperforms both tests when the underlying distribution is not elliptically symmetric. On the other hand, in higher dimensions the sample size $n$ needs to be larger, since more points are required to obtain a reasonably regular grid $\mathcal{G}_n$ in the $d$-dimensional unit ball, and the computational cost of the optimal transport then grows rapidly.

References

  • Hallin et al. [2021] Marc Hallin, Eustasio del Barrio, Juan Cuesta-Albertos, and Carlos Matrán. Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach. Annals of Statistics, 49(2):1139–1165, 2021.
  • Hallin et al. [2023] Marc Hallin, Daniel Hlubinka, and Šárka Hudecová. Efficient fully distribution-free center-outward rank tests for multiple-output regression and MANOVA. Journal of the American Statistical Association, 118(543):1923–1939, 2023.
  • Hlubinka and Hudecová [2024] Daniel Hlubinka and Šárka Hudecová. One-sample location tests based on center-outward signs and ranks. In Recent Advances in Econometrics and Statistics: Festschrift in Honour of Marc Hallin, pages 29–48. Springer, 2024.