FDR control with knockoff e-values in robust multivariate compositional regression

G.S. Monti (1), P. Filzmoser (2)

  • (1) Department of Economics, Management and Statistics, University of Milano-Bicocca, Milano, Italy [gianna.monti@unimib.it]

  • (2) CSTAT - Computational Statistics, Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria [Peter.Filzmoser@tuwien.ac.at]

Keywords: knockoff – e-values – compositional data – variable selection – microbiome data

The intestinal microbiome plays a crucial role in host health and metabolism, with changes in microbial composition reflecting disease states such as obesity, liver disease, and cancer. Given this relationship, health indicators commonly used in medicine are expected to correlate with microbiome variations. We propose a novel method to identify bacterial species associated with multiple health indicators, providing indirect insights into disease status.

Analyzing microbiome data is challenging because the data are compositional, high-dimensional, sparse, and prone to outliers. To address these issues, we develop a robust multivariate compositional regression model that accounts for correlations among the responses [Chang and Welsh, 2023], is resistant to outliers, respects compositional constraints [Gloor et al., 2017], and controls the false discovery rate (FDR). Our approach improves upon the Multi-Response Knockoff Filter (MRKF) [Srinivasan et al., 2023] in two ways: it uses a robust framework to handle outliers, and it adds a derandomization step that makes the selected set more stable and reproducible across runs. The derandomization exploits the link between the knockoff method [Barber and Candès, 2015, 2019] and e-values, which allows the knockoff procedure to be recast as an e-BH procedure [Ren and Barber, 2024].
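The derandomization step can be illustrated with a minimal Python sketch. It follows the general construction of knockoff e-values described by Ren and Barber [2024], not necessarily the exact implementation used in this work. It assumes the knockoff statistics W (one vector of length p per knockoff run) have already been computed; the robust multivariate compositional regression that would produce them is not shown. The function names and the two levels alpha_kn and alpha_ebh are illustrative choices.

```python
import numpy as np

def knockoff_threshold(W, alpha_kn):
    """Knockoff+ threshold: smallest t with estimated FDP <= alpha_kn."""
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= alpha_kn:
            return t
    return np.inf  # no threshold qualifies, nothing is selected

def knockoff_evalues(W, alpha_kn):
    """Turn one run of knockoff statistics into e-values."""
    p = W.size
    T = knockoff_threshold(W, alpha_kn)
    if not np.isfinite(T):
        return np.zeros(p)
    return p * (W >= T) / (1 + np.sum(W <= -T))

def ebh(e, alpha_ebh):
    """e-BH: reject the k_hat largest e-values, where k_hat is the
    largest k such that the k-th largest e-value is >= p / (alpha * k)."""
    p = e.size
    order = np.argsort(-e, kind="stable")
    k = np.arange(1, p + 1)
    ok = np.nonzero(e[order] >= p / (alpha_ebh * k))[0]
    if ok.size == 0:
        return np.array([], dtype=int)
    k_hat = ok.max() + 1
    return np.sort(order[:k_hat])

def derandomized_knockoffs(W_runs, alpha_kn, alpha_ebh):
    """Average the e-values over several knockoff runs, then apply e-BH."""
    e_avg = np.mean([knockoff_evalues(W, alpha_kn) for W in W_runs], axis=0)
    return ebh(e_avg, alpha_ebh)

# Toy statistics (made up for illustration): two runs, p = 10 variables
W1 = np.array([10, 9, 8, 7, 6, 0, 0, 0, 0, 0], dtype=float)
W2 = np.array([10, 9, 8, 7, -6, 0, 0, 0, 0, 0], dtype=float)
selected = derandomized_knockoffs([W1, W2], alpha_kn=0.25, alpha_ebh=0.25)
```

In this toy input the fifth variable is selected in the first run only, so its averaged e-value is halved and it drops out of the final set. That is the stabilizing effect the averaging is meant to provide.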

Our method provides a powerful and reliable tool for identifying meaningful microbiome-health associations in high-dimensional settings.

References