FDR control with knockoff e-values in robust multivariate compositional regression
Department of Economics, Management and Statistics, University of Milano-Bicocca, Milano, Italy [gianna.monti@unimib.it]
CSTAT - Computational Statistics Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria [Peter.Filzmoser@tuwien.ac.at]
Keywords: knockoff – e-values – compositional data – variable selection – microbiome data
The intestinal microbiome plays a crucial role in host health and metabolism, with changes in microbial composition reflecting disease states such as obesity, liver disease, and cancer. Given this relationship, health indicators commonly used in medicine are expected to correlate with microbiome variations. We propose a novel method to identify bacterial species associated with multiple health indicators, providing indirect insights into disease status.
Analyzing microbiome data presents several challenges, including its compositional nature, high dimensionality, sparsity, and susceptibility to outliers. To address these issues, we develop a robust multivariate compositional regression model that accounts for correlations among the responses [Chang and Welsh, 2023], is resistant to outliers, respects compositional constraints [Gloor et al., 2017], and controls the false discovery rate (FDR). Our approach improves upon the Multi-Response Knockoff Filter (MRKF) [Srinivasan et al., 2023] by incorporating a robust framework to handle outliers and by introducing a derandomization step that makes the results more stable and reproducible. This derandomization exploits the link between the knockoff method [Barber and Candès, 2015, 2019] and e-values, recasting the knockoff procedure as an e-BH procedure [Ren and Barber, 2024].
Our method provides a powerful and reliable tool for identifying meaningful microbiome-health associations in high-dimensional settings.
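The derandomization step can be illustrated with a minimal sketch of the generic knockoff e-value construction described by Ren and Barber [2024]: each knockoff run is converted into e-values, the e-values are averaged across runs, and e-BH is applied to the averages. This is not the robust multivariate compositional procedure itself. The function names, the parameter `alpha_kn` (the level used inside each knockoff run), and the assumption that the feature statistics `W` have already been computed by some knockoff filter are all illustrative choices made here.

```python
import numpy as np


def knockoff_threshold(W, alpha_kn):
    """Knockoff+ threshold: smallest t > 0 whose estimated FDP is <= alpha_kn."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= alpha_kn:
            return t
    return np.inf  # no threshold reaches the target level


def knockoff_evalues(W, alpha_kn):
    """Turn one run's statistics W into e-values: p * 1{W_j >= T} / (1 + #{W_k <= -T})."""
    W = np.asarray(W, dtype=float)
    p = W.size
    T = knockoff_threshold(W, alpha_kn)
    if np.isinf(T):
        return np.zeros(p)
    return p * (W >= T) / (1 + np.sum(W <= -T))


def ebh(e, alpha):
    """e-BH: reject the k largest e-values, k = max{k : e_(k) >= p / (alpha * k)}."""
    e = np.asarray(e, dtype=float)
    p = e.size
    order = np.argsort(-e)  # indices sorted by decreasing e-value
    k = np.arange(1, p + 1)
    ok = np.nonzero(e[order] >= p / (alpha * k))[0]
    if ok.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: ok[-1] + 1])


def derandomized_knockoffs(W_runs, alpha_kn, alpha):
    """Average e-values over runs (rows of W_runs), then apply e-BH at level alpha."""
    e_avg = np.mean([knockoff_evalues(W, alpha_kn) for W in W_runs], axis=0)
    return ebh(e_avg, alpha)
```

In this sketch each row of `W_runs` holds the statistics from one independent draw of the knockoffs. Because the average of e-values is again an e-value, the final e-BH step retains FDR control at level `alpha` while no longer depending on a single random knockoff draw.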
References
- Barber and Candès (2015). Controlling the false discovery rate via knockoffs. Ann Stat 43 (5), pp. 2055–2085.
- Barber and Candès (2019). A knockoff filter for high-dimensional selective inference. Ann Stat 47 (5), pp. 2504–2537.
- Chang and Welsh (2023). Robust multivariate lasso regression with covariance estimation. J Comput Graph Stat 32 (3), pp. 961–973.
- Gloor et al. (2017). Microbiome datasets are compositional: and this is not optional. Front Microbiol 8, pp. 2224.
- Ren and Barber (2024). Derandomised knockoffs: leveraging e-values for false discovery rate control. J R Stat Soc Series B Stat Methodol 86 (1), pp. 122–154.
- Srinivasan et al. (2023). Identification of microbial features in multivariate regression under false discovery rate control. Computational Statistics & Data Analysis 181, pp. 107621.