Bayesian Variable Selection in Generalized Linear Models

L. Filippozzi (a,b), I. Urteaga (c,d) and C. Agostinelli (a)

(a) University of Trento, (b) Fondazione Bruno Kessler (FBK), (c) Basque Center for Applied Mathematics (BCAM), (d) Ikerbasque, Basque Foundation for Science

Variable or covariate selection is a crucial step aimed at identifying the most relevant predictors for explaining the response variable.

In this work, we propose BayesVS-GLM, a novel Bayesian covariate selection method for Generalized Linear Models (GLMs). BayesVS-GLM is based on a fully conjugate Bayesian hierarchical model that comes with theoretical posterior consistency guarantees. Specifically, we extend the standard GLM framework by introducing a binary vector z to indicate which covariates are included in the generalized linear predictor. The regression coefficients β are modeled conditionally on z, using conjugate priors for GLMs [4].
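To make the role of the indicator vector concrete, the following is a minimal sketch of an indicator-masked linear predictor, assuming a Poisson response with log link purely for illustration. The function names and the simulated data are hypothetical and are not part of BayesVS-GLM; only the symbols X, β and z come from the description above.

```python
import numpy as np

def masked_linear_predictor(X, beta, z):
    """Linear predictor eta_i = sum_j z_j * beta_j * x_ij.

    Covariates with z_j = 0 contribute nothing, whatever beta_j is.
    """
    return X @ (z * beta)

def poisson_loglik(y, X, beta, z):
    """Poisson log-likelihood with log link, dropping the constant -sum(log(y!))."""
    eta = masked_linear_predictor(X, beta, z)
    return float(np.sum(y * eta - np.exp(eta)))

# Illustrative data (assumed): three covariates, the second one excluded.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = np.array([0.5, -1.0, 2.0])
z = np.array([1, 0, 1])
y = rng.poisson(np.exp(masked_linear_predictor(X, beta, z)))
```

Because the likelihood sees β only through the product of z and β, the coefficient of an excluded covariate has no effect on the fit, which is what allows z to be inferred from the observed data.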

Although related methods exist ([1], [2], [3]), our method is, to the best of our knowledge, the first to provide a unified framework in which: (i) the formulation of the hierarchical GLM is fully conjugate; (ii) the GLM likelihood depends explicitly on the indicator variables z, so that regressor selection is driven solely by the observed data; and (iii) the asymptotic accuracy of the posterior of z and the posterior consistency of the regression coefficients β are guaranteed. For posterior inference, we present an efficient Gibbs sampling algorithm that exploits the conjugacy of the hierarchical model.
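As a rough illustration of Gibbs sampling with indicator variables, the sketch below treats only the Gaussian linear model with known noise variance, in the spirit of the indicator formulation of [1]. It is not the BayesVS-GLM sampler: the independent N(0, tau2) priors on the coefficients, the Bernoulli(pi) priors on the indicators, and every function and parameter name are assumptions made for this example.

```python
import numpy as np

def gibbs_indicator_selection(X, y, sigma2=1.0, tau2=1.0, pi=0.5,
                              n_iter=2000, burn_in=500, seed=0):
    """Estimate posterior inclusion probabilities for y = X (z * beta) + noise.

    Assumed model: noise ~ N(0, sigma2), beta_j ~ N(0, tau2), z_j ~ Bernoulli(pi).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = np.ones(p, dtype=int)
    inclusion_counts = np.zeros(p)
    prior_logit = np.log(pi) - np.log1p(-pi)

    for it in range(n_iter):
        # Step 1: draw beta | z, y from its Gaussian full conditional.
        # Excluded columns are zeroed, so their beta_j is drawn from the prior.
        Xz = X * z
        precision = Xz.T @ Xz / sigma2 + np.eye(p) / tau2
        mean = np.linalg.solve(precision, Xz.T @ y / sigma2)
        chol = np.linalg.cholesky(precision)  # precision = chol @ chol.T
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))

        # Step 2: draw each z_j | beta, z_{-j}, y from its Bernoulli full conditional.
        for j in range(p):
            resid_without_j = y - X @ (z * beta) + X[:, j] * (z[j] * beta[j])
            resid_with_j = resid_without_j - X[:, j] * beta[j]
            rss_diff = resid_with_j @ resid_with_j - resid_without_j @ resid_without_j
            log_odds = prior_logit - rss_diff / (2.0 * sigma2)
            prob_one = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # stable logistic
            z[j] = int(rng.random() < prob_one)

        if it >= burn_in:
            inclusion_counts += z

    return inclusion_counts / (n_iter - burn_in)

# Illustrative simulated data (assumed): covariates 0 and 2 are truly active.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + rng.normal(size=200)
pip = gibbs_indicator_selection(X, y)
```

The returned vector holds, for each covariate, the fraction of post-burn-in draws in which its indicator equals one, which serves as a Monte Carlo estimate of the posterior inclusion probability under the assumed priors.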

The BayesVS-GLM formulation is applicable to any distribution within the exponential family, and unifies a range of existing Bayesian variable selection perspectives within a single coherent hierarchical framework.

Keywords: Bayesian Variable Selection, Generalized Linear Models, Posterior model selection.

References

  • [1] L. Kuo and B. Mallick (1998). Variable selection for regression models. Sankhyā: The Indian Journal of Statistics, Series B, 60(1), 65–81.
  • [2] P. Dellaportas, J. Forster, and I. Ntzoufras (2002). On Bayesian model and variable selection using MCMC. Statistics and Computing, 12(1), 27–36.
  • [3] N. N. Narisetty, J. Shen, and X. He (2019). Skinny Gibbs: A Consistent and Scalable Gibbs Sampler for Model Selection. Journal of the American Statistical Association, 114(527), 1205–1217.
  • [4] M.-H. Chen and J. G. Ibrahim (2003). Conjugate priors for generalized linear models. Statistica Sinica, 13(2), 461–476.