A generalized Bayesian approach for high-dimensional robust regression with serially correlated errors and predictors

Saptarshi Chakraborty (1), Kshitij Khare (2), George Michailidis (3)

(1) Department of Biostatistics, State University of New York at Buffalo, Buffalo, New York, USA [chakrab2@buffalo.edu]

(2) University of Florida, Gainesville, Florida, USA [kdkhare@ufl.edu]

(3) Department of Statistics & Data Science, University of California, Los Angeles, California, USA [gmichail@ucla.edu]

Keywords: Robust regression – high dimensional data – serially correlated errors – uncertainty quantification – Generalized Bayes inference

1 Abstract

This work introduces a loss-based generalized Bayesian methodology for high-dimensional robust regression with serially correlated errors and predictors. The proposed framework employs a novel scaled pseudo-Huber (SPH) loss function, which smooths the well-known Huber loss, effectively balancing quadratic ($\ell_2$) and absolute linear ($\ell_1$) loss behaviors. This flexibility enables the framework to accommodate both thin-tailed and heavy-tailed data efficiently. The generalized Bayesian approach constructs a working likelihood based on the SPH loss, facilitating efficient and stable estimation while providing rigorous uncertainty quantification for all model parameters. Notably, this approach allows formal statistical inference without requiring ad hoc tuning parameter selection while adaptively addressing a wide range of tail behavior in the errors. By specifying appropriate prior distributions for the regression coefficients (such as ridge priors for small or moderate-dimensional settings and spike-and-slab priors for high-dimensional settings), the framework ensures principled inference. We establish rigorous theoretical guarantees for accurate parameter estimation and correct predictor selection under sparsity assumptions for a wide range of data-generating setups. Extensive simulation studies demonstrate the superior performance of our approach compared to traditional Bayesian regression methods based on $\ell_2$ and $\ell_1$ loss functions. The results highlight its flexibility and robustness, particularly in challenging high-dimensional settings characterized by data contamination.
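To illustrate the interpolation between quadratic and absolute linear behavior described above, the following minimal sketch evaluates the standard (textbook) pseudo-Huber loss, delta^2 * (sqrt(1 + (r/delta)^2) - 1). This is an assumption made for illustration only: the abstract does not state the exact scaling used in the SPH loss, so the function name `pseudo_huber` and the parameter `delta` are hypothetical and may differ from the paper's definition.

```python
import math


def pseudo_huber(r, delta=1.0):
    """Standard pseudo-Huber loss (illustrative; not necessarily the paper's SPH form).

    Behaves like r^2 / 2 for |r| much smaller than delta, and like
    delta * |r| - delta^2 for |r| much larger than delta.
    """
    return delta ** 2 * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)


# Compare the loss with its quadratic and linear approximations
# for a small, a moderate, and a large residual.
for r in (1e-3, 1.0, 1e4):
    quadratic = 0.5 * r * r       # l2-type behavior
    linear = abs(r) - 1.0         # l1-type behavior (delta = 1)
    print(r, pseudo_huber(r), quadratic, linear)
```

In this standard form, a smaller `delta` moves the loss toward absolute-error behavior over most of the residual range, while a larger `delta` keeps it close to squared error. How the paper's scaling treats this parameter is not specified in the abstract.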