Multivariate permutation tests based on discrete optimal transport

Š. Hudecová¹, D. Hlubinka¹ and Z. Hlávka¹

Abstract

This contribution presents a new approach to multivariate permutation tests that is based on discrete optimal measure transport.

¹ Department of Probability and Statistics, Faculty of Mathematics and Physics, Charles University, the Czech Republic [hudecova@karlin.mff.cuni.cz]

Keywords: multivariate permutation test, optimal measure transport

1 Univariate permutation tests

Permutation tests provide a robust and powerful tool for testing various statistical hypotheses. Their use for univariate test statistics is simple, and they yield valid inference under minimal assumptions about the underlying data distribution. Let $T_0$ be a univariate test statistic computed for a given data set. For a chosen number $B$ of repetitions, one randomly shuffles the labels of the observations and calculates the test statistic for each permuted data set to obtain a permutation sample $T_1,\dots,T_B$. The permutation p-value is computed as the proportion of $T_j$, $j=1,\dots,B$, that are more extreme than $T_0$.
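The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-sample setting, the statistic $|\bar{x}-\bar{y}|$, the function name `permutation_pvalue`, and the convention that ties count as "more extreme" are all choices made here for the example.

```python
import numpy as np

def permutation_pvalue(x, y, B=999, seed=0):
    """Two-sample permutation p-value (illustrative sketch).

    Assumption: the test statistic is |mean(x) - mean(y)| and larger
    values are more extreme; ties with T_0 are counted as extreme.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    n = len(x)

    # T_0: statistic on the observed labelling
    t0 = abs(x.mean() - y.mean())

    # T_1, ..., T_B: statistics after randomly shuffling the group labels
    t_perm = np.empty(B)
    for j in range(B):
        shuffled = rng.permutation(pooled)
        t_perm[j] = abs(shuffled[:n].mean() - shuffled[n:].mean())

    # Proportion of permuted statistics at least as large as T_0
    return float(np.mean(t_perm >= t0))
```

A common variant adds one to both the numerator and the denominator, so that the observed statistic is counted as part of the permutation sample and the p-value is never exactly zero.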

2 Multivariate permutation tests

There are various problems where the test statistic under investigation is multivariate. The multiple testing problem is a simple example, as multiple test statistics can be viewed as components of a single multivariate test statistic. Unfortunately, the permutation principle does not extend straightforwardly to the multivariate setup. If the test statistic $\boldsymbol{T}_0$ takes values in $\mathbb{R}^d$, $d>1$, then the permutation sample $\boldsymbol{T}_1,\dots,\boldsymbol{T}_B$ can be obtained analogously, but it is not clear how to decide which values are more extreme than $\boldsymbol{T}_0$ because $\mathbb{R}^d$ lacks a natural ordering. Traditional approaches to multivariate permutation tests rely on a suitable transformation of a vector of componentwise permutation p-values. However, this approach has several drawbacks.
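For concreteness, one instance of such a transformation is sketched below. The text does not say which transformation is meant, so the choice here is an assumption: each component is converted to a permutation p-value, and the componentwise p-values are combined by taking their minimum (a Tippett-type combining function). The function names are hypothetical, and larger component values are assumed to be more extreme.

```python
import numpy as np

def componentwise_pvalues(t_all):
    """Componentwise permutation p-values for every row of t_all.

    t_all has shape (B + 1, d); row 0 is the observed statistic T_0 and
    rows 1..B are the permuted statistics. Assumption: in each component,
    larger values are more extreme.
    """
    t_all = np.asarray(t_all, dtype=float)
    # ge[i, k, c] is True when row k is >= row i in component c
    ge = t_all[None, :, :] >= t_all[:, None, :]
    # Share of rows (including row i itself) at least as large as row i
    return ge.mean(axis=1)

def min_p_combination(t_all):
    """Combine componentwise p-values by their minimum (assumed choice)."""
    p = componentwise_pvalues(t_all)
    combined = p.min(axis=1)  # small values are extreme
    # Proportion of permuted rows whose combined value is as small as T_0's
    return float(np.mean(combined[1:] <= combined[0]))
```

The combined value is itself compared with its permutation distribution, because the minimum of several dependent p-values is not a valid p-value on its own.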

The problem of extremeness in $\mathbb{R}^d$ is closely related to the definition of multivariate quantiles. Recently, Hallin et al. [2021] introduced a concept of multivariate quantiles derived from optimal measure transportation (OMT). This contribution presents a new approach to permutation tests from Hlávka et al. [2024] that utilizes this concept.

3 Application of the optimal transport

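The OMT permutation tests are based on the discrete $L_2$ optimal measure transport $\boldsymbol{F}^*$ of the set $\mathcal{T}=\{\boldsymbol{T}_0,\boldsymbol{T}_1,\dots,\boldsymbol{T}_B\}$ of $B+1$ points in $\mathbb{R}^d$ to a specified regular grid $\mathcal{G}$ of $B+1$ points in the unit ball in $\mathbb{R}^d$. The extremeness of $\boldsymbol{T}_0$ is then evaluated via the extremeness of its image $\boldsymbol{F}^*(\boldsymbol{T}_0)$ among $\boldsymbol{F}^*(\boldsymbol{T}_j)$, $j=1,\dots,B$. That is, the permutation p-value is computed as the relative frequency of those $\boldsymbol{F}^*(\boldsymbol{T}_j)$, $j=1,\dots,B$, for which $\|\boldsymbol{F}^*(\boldsymbol{T}_j)\| \geq \|\boldsymbol{F}^*(\boldsymbol{T}_0)\|$.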
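A minimal sketch of this computation for $d=2$ follows. It rests on several assumptions that are not fixed by the text. The grid is built by a hypothetical helper `ball_grid_2d` from `n_r` equally spaced radii, `n_s` equally spaced directions and `n_0` copies of the origin, so that `n_r * n_s + n_0` must equal $B+1$. The discrete transport is obtained by solving an assignment problem with squared Euclidean cost using SciPy's `linear_sum_assignment`. Grid points of equal norm are treated as ties.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ball_grid_2d(n_r, n_s, n_0=1):
    """Assumed regular grid in the unit disc (hypothetical helper):
    n_r radii times n_s directions, plus n_0 copies of the origin."""
    radii = np.arange(1, n_r + 1) / (n_r + 1)
    angles = 2 * np.pi * np.arange(n_s) / n_s
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    rings = (radii[:, None, None] * directions[None, :, :]).reshape(-1, 2)
    return np.vstack([np.zeros((n_0, 2)), rings])

def omt_pvalue(t0, t_perm, grid):
    """Permutation p-value from the discrete optimal transport to `grid`.

    t0: observed statistic, shape (2,); t_perm: permuted statistics,
    shape (B, 2); grid: B + 1 target points in the unit disc.
    """
    points = np.vstack([t0, t_perm])  # row 0 is T_0
    if points.shape != grid.shape:
        raise ValueError("grid must contain exactly B + 1 points")

    # Discrete L2 transport: the bijection minimising the total squared distance
    cost = cdist(points, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    images = np.empty_like(grid)
    images[rows] = grid[cols]  # images[i] plays the role of F*(T_i)

    # Relative frequency of images at least as far from the origin as F*(T_0);
    # the small tolerance guards against rounding on equal radii
    norms = np.linalg.norm(images, axis=1)
    return float(np.mean(norms[1:] >= norms[0] - 1e-12))
```

For example, with $B=99$ one could call `omt_pvalue(t0, t_perm, ball_grid_2d(9, 11, 1))`. In this sketch every radius is shared by `n_s` grid points, so the p-value cannot fall below `(n_s - 1) / B`; the balance between `n_r` and `n_s` is therefore a design choice.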

The proposed approach is distribution-free and requires only mild assumptions about exchangeability under the null hypothesis. Besides the final permutation p-value, the approach also allows one to calculate and interpret the contributions of the individual components of the vector test statistic $\boldsymbol{T}_0$ to the rejection of the null hypothesis. Illustrative practical examples will be presented.

References

  • Hallin et al. [2021] Marc Hallin, Eustasio del Barrio, Juan Cuesta-Albertos, and Carlos Matrán. Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach. Ann. Statist., 49(2):1139–1165, 2021. https://doi.org/10.1214/20-AOS1996.
  • Hlávka et al. [2024] Zdeněk Hlávka, Daniel Hlubinka, and Šárka Hudecová. Multivariate quantile-based permutation tests with application to functional data. J. Comput. Graph. Stat., 2024. https://doi.org/10.1080/10618600.2024.2444302.