Selective Randomization Inference for Adaptive Experiments
- Tobias Freidling, Department of Mathematics, EPFL, Lausanne, Switzerland [tobias.freidling@epfl.ch]
- Qingyuan Zhao, Statistical Laboratory, DPMMS, University of Cambridge, Cambridge, UK [qyzhao@statslab.cam.ac.uk]
- Zijun Gao, Marshall School of Business, University of Southern California, Los Angeles, USA [zijungao@marshall.edu]
Keywords: Randomization tests – Selective Inference – Causality
1 Robust Inference in Adaptive Experiments
Many modern controlled experiments are conducted in multiple stages: after each stage, the data collected so far is provisionally analysed and, based on these results, the design of the remaining stages is adapted. For instance, we may want to focus on a certain subpopulation (enrichment trials) or change the probabilities with which newly recruited units are assigned to the different arms (response-adaptive randomization).
While increasing patient benefit and using resources more economically, adaptive studies are challenging to analyse because the data is used twice: (1) to select the design of later stages and the null hypothesis, and (2) to test that null hypothesis. The literature on adaptive studies is aware of this issue, but the suggested solutions often rely on specific designs, parametric models, the assumption of i.i.d. data and/or asymptotic approximations. There is therefore a need for methods that provide more robust inference, especially in high-stakes applications such as clinical trials.
2 The Selective Randomization Test
In order to develop such assumption-lean tools, we use randomization tests, originally introduced by Fisher [1935]. Since inference is based solely on the known randomness in the assignment of units to different arms, we require no assumptions on the distribution of the outcomes and covariates or on the dependence between different units, and we can provide finite-sample guarantees. For this reason, randomization inference has been rediscovered as a robust method for analysing clinical trials; see, e.g., Rosenberger et al. [2019].
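To illustrate the basic ingredient, the following is a minimal sketch of a Fisher randomization test of the sharp null of no treatment effect. Under the sharp null the outcomes are fixed regardless of assignment, so the null distribution of any test statistic comes entirely from the known assignment mechanism. The function names, the difference-in-means statistic, and the completely randomized design are illustrative choices, not the paper's notation.

```python
import numpy as np

def randomization_p_value(y, z, assign_sampler, n_draws=2000, rng=None):
    """Fisher randomization p-value for the sharp null of no effect.

    y : outcomes (fixed under the sharp null)
    z : observed 0/1 assignment vector
    assign_sampler : draws a fresh assignment from the known design
    """
    rng = np.random.default_rng(rng)

    def stat(z_):
        # difference-in-means test statistic (any statistic would do)
        return abs(y[z_ == 1].mean() - y[z_ == 0].mean())

    t_obs = stat(z)
    draws = np.array([stat(assign_sampler(rng)) for _ in range(n_draws)])
    # include the observed assignment so the test is exact (valid) in finite samples
    return (1 + np.sum(draws >= t_obs)) / (n_draws + 1)

# Illustrative use: complete randomization of 20 units, 10 treated
rng = np.random.default_rng(0)
z = np.array([1] * 10 + [0] * 10)
rng.shuffle(z)
y = rng.normal(size=20) + 1.5 * z          # simulated outcomes with an effect
p = randomization_p_value(y, z, lambda r: r.permutation(z), rng=1)
```

No modelling assumption on `y` enters anywhere; validity rests only on `assign_sampler` matching the design actually used.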
To account for adaptive choices in the design, we use ideas from the literature on post-selection inference [Lee et al., 2016, Fithian et al., 2017]. In a nutshell, we propose a conditional randomization p-value [Zhang and Zhao, 2023] where the conditioning event (only) contains the information that is used for the adaptive decision. In this way, we avoid using data twice and improve over more simplistic approaches like data splitting.
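The conditioning idea can be sketched for a small design by exact enumeration: among all assignments the design could have produced, we keep only those that would have led to the same adaptive decision, and compute the p-value within that conditioning event. The decision rule below (continue only if the stage-1 treated mean is positive) is a hypothetical stand-in for the experimenter's adaptive rule, not the paper's construction.

```python
from itertools import combinations
import numpy as np

def selective_p_exact(y, z_obs, decision, n_treat):
    """Exact selective randomization p-value by enumeration (small n only).

    Resamples only over assignments z that reproduce the observed adaptive
    decision; this restriction is the conditioning event, which prevents
    the data used for selection from being reused for testing.
    """
    n = len(y)
    stat = lambda z: abs(y[z == 1].mean() - y[z == 0].mean())
    obs_dec, t_obs = decision(z_obs, y), stat(z_obs)
    kept = []
    for treated in combinations(range(n), n_treat):
        z = np.zeros(n, dtype=int)
        z[list(treated)] = 1
        if decision(z, y) == obs_dec:      # same selection event only
            kept.append(stat(z))
    return float(np.mean(np.asarray(kept) >= t_obs))

# Illustrative use with a hypothetical adaptive rule
y = np.array([0.3, -0.1, 1.2, 0.8, -0.5, 0.4, 1.0, -0.2])
z_obs = np.array([1, 0, 1, 0, 1, 0, 1, 0])
decision = lambda z, y_: y_[z == 1].mean() > 0   # hypothetical stage-1 rule
p = selective_p_exact(y, z_obs, decision, n_treat=4)
```

By contrast, data splitting would discard the stage-1 units entirely; conditioning retains them, which is where the power gain comes from.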
3 Inference and Computation
We show that our proposed selective randomization p-value controls the (selective) type-I error under very general conditions. We illustrate in multiple simulations that it improves power compared to other valid randomization p-values. Furthermore, we show how it can be used to define an estimator of, and construct confidence intervals for, a homogeneous treatment effect.
In practice, we would often use a Monte Carlo approximation of the p-value to reduce computation time. To this end, we propose a rejection sampling strategy as well as a Markov chain Monte Carlo algorithm.
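A minimal sketch of the rejection-sampling approximation: draw assignments from the unconditional design and discard those falling outside the conditioning event; the surviving draws are i.i.d. from the conditional assignment distribution. (An MCMC sampler is preferable when the acceptance rate is very low.) As above, the decision rule and statistic are illustrative assumptions.

```python
import numpy as np

def selective_p_rejection(y, z_obs, decision, assign_sampler,
                          n_accept=500, max_tries=100_000, rng=None):
    """Rejection-sampling Monte Carlo approximation of the selective p-value."""
    rng = np.random.default_rng(rng)
    stat = lambda z: abs(y[z == 1].mean() - y[z == 0].mean())
    obs_dec, t_obs = decision(z_obs, y), stat(z_obs)
    accepted, tries = [], 0
    while len(accepted) < n_accept and tries < max_tries:
        tries += 1
        z = assign_sampler(rng)             # draw from the unconditional design
        if decision(z, y) == obs_dec:       # accept only conditional draws
            accepted.append(stat(z))
    acc = np.asarray(accepted)
    # add the observed assignment for a valid Monte Carlo p-value
    return (1 + np.sum(acc >= t_obs)) / (len(acc) + 1)

# Illustrative use with a hypothetical adaptive rule
rng = np.random.default_rng(0)
z_obs = np.array([1] * 10 + [0] * 10)
rng.shuffle(z_obs)
y = rng.normal(size=20)
decision = lambda z, y_: y_[z == 1].mean() > 0   # hypothetical stage-1 rule
p = selective_p_rejection(y, z_obs, decision,
                          lambda r: r.permutation(z_obs), rng=1)
```

The acceptance rate equals the probability of the conditioning event under the design; when that probability is small, the rejection loop becomes wasteful and an MCMC scheme that moves within the event is the natural alternative.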
References
- Fisher [1935] R. A. Fisher. The design of experiments. Oliver & Boyd, Edinburgh, 1935.
- Fithian et al. [2017] William Fithian, Dennis Sun, and Jonathan Taylor. Optimal Inference After Model Selection. arXiv: 1410.2597, 2017.
- Freidling et al. [2024] Tobias Freidling, Qingyuan Zhao, and Zijun Gao. Selective randomization inference for adaptive experiments. arXiv: 2405.07026, 2024.
- Lee et al. [2016] Jason D. Lee, Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3), 2016.
- Rosenberger et al. [2019] William F. Rosenberger, Diane Uschner, and Yanying Wang. Randomization: The forgotten component of the randomized clinical trial. Statistics in Medicine, 38(1):1–12, 2019.
- Zhang and Zhao [2023] Yao Zhang and Qingyuan Zhao. What is a Randomization Test? Journal of the American Statistical Association, 0(0):1–15, 2023.