Robust and Conjugate Gaussian Process Regression

M. Altamirano1 F-X. Briol2 J. Knoblauch3
Abstract

To enable closed-form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, leading to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact, conjugate, closed-form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes. The code and datasets to reproduce all experiments are available at
https://github.com/maltamiranomontero/RCGP.

  • 1 Department of Statistical Science, University College London, London, UK [matias.altamirano.22@ucl.ac.uk]

  • 2 Department of Statistical Science, University College London, London, UK [f.briol@ucl.ac.uk]

  • 3 Department of Statistical Science, University College London, London, UK [j.knoblauch@ucl.ac.uk]

Keywords: Gaussian Processes – Robustness – Generalised Bayes

1 Introduction

GPs [Rasmussen and Williams, 2006] are one of the most widely used methods for Bayesian inference on latent functions, especially when uncertainty quantification is required. They have numerous appealing properties, including that the prior is relatively interpretable and can be elicited through a choice of mean and covariance functions, and that they have closed-form posteriors under Gaussian likelihoods. Their convergence is also well understood, even under prior misspecification [Wynne et al., 2021]. Thanks to these advantages, GPs have found applications in diverse problems including single- and multi-output regression [Bonilla et al., 2007, Moreno-Muñoz et al., 2018], emulation of expensive simulators [Santner et al., 2018], Bayesian optimisation [Shahriari et al., 2015, Garnett, 2021] and Bayesian deep learning [Damianou and Lawrence, 2013, Salimbeni et al., 2019, Dutordoir et al., 2020]. Their use is enabled by a plethora of packages including GPflow [Matthews et al., 2017], GPyTorch [Gardner et al., 2018], BoTorch [Balandat et al., 2020], ProbNum [Wenger et al., 2021] and emukit [Paleyes et al., 2023].
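To make the closed-form conditioning concrete: under a Gaussian likelihood, the posterior mean and covariance at test inputs follow from standard multivariate Gaussian identities. The sketch below (synthetic data, RBF kernel; all hyperparameter values are illustrative, not from the paper) implements this update with numpy:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance function for 1-D inputs."""
    sqdist = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(X, y, X_star, noise_var=0.1):
    """Closed-form GP posterior mean/variance under i.i.d. Gaussian noise."""
    K = rbf_kernel(X, X)                    # train covariance
    K_star = rbf_kernel(X, X_star)          # train/test cross-covariance
    K_ss = rbf_kernel(X_star, X_star)       # test covariance
    A = K + noise_var * np.eye(len(X))      # noisy Gram matrix
    L = np.linalg.cholesky(A)               # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star.T @ alpha                 # posterior mean
    v = np.linalg.solve(L, K_star)
    cov = K_ss - v.T @ v                    # posterior covariance
    return mean, np.diag(cov)

X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * np.random.default_rng(0).normal(size=20)
mean, var = gp_posterior(X, y, np.array([2.5]))
```

Everything here is a handful of linear-algebra operations, which is exactly the convenience that non-Gaussian noise models forfeit.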

By far the most common use of GPs is in regression. Here, the observations correspond to noisy realisations of an unknown latent function that is assumed to be drawn from a GP prior. To obtain a conjugate GP posterior distribution on the latent function, the observation noise is usually assumed to be Gaussian. While assuming Gaussian observation noise makes the posterior tractable, it also makes inferences non-robust. In particular, Gaussian noise makes GPs highly susceptible to extreme values, heterogeneities, and outliers. In many real-world applications and data sets, the presence of outliers is almost inevitable. They can occur for a variety of reasons, including faulty measurements, broken sensors, extreme weather events, stock market sell-offs, or genetic mutations.
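This sensitivity is easy to verify numerically: the GP posterior mean is linear in the observations, so a single corrupted point shifts predictions by an amount proportional to the corruption, without bound. A small sketch (all values illustrative):

```python
import numpy as np

def posterior_mean_at(X, y, x_star, ls=1.0, noise=0.1):
    """GP posterior mean at x_star under an RBF kernel and Gaussian noise."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    alpha = np.linalg.solve(k(X, X) + noise * np.eye(len(X)), y)
    return (k(X, np.array([x_star])).T @ alpha).item()

X = np.linspace(0, 5, 20)
y_clean = np.sin(X)
y_corrupt = y_clean.copy()
y_corrupt[10] += 100.0                     # one extreme outlier

clean = posterior_mean_at(X, y_clean, X[10])
corrupt = posterior_mean_at(X, y_corrupt, X[10])
# the prediction near the outlier is dragged far away from the truth
print(abs(corrupt - clean))
```

Doubling the corruption doubles the shift: there is no corruption level beyond which the Gaussian-noise posterior stops responding.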

Existing Work

The lack of robustness in GPs is a well-known fundamental challenge for their widespread application, and a number of methods have been proposed to address it. Broadly, these fall into two categories. The first replaces the Gaussian measurement error with more heavy-tailed error distributions such as Student’s t [Jylänki et al., 2011, Ranjan et al., 2016], Laplace [Kuss, 2006], Huber densities [Algikar and Mili, 2023], data-dependent noise [Goldberg et al., 1997], or mixture distributions [Naish-Guzman and Holden, 2007, Stegle et al., 2008, Daemi et al., 2019, Lu et al., 2023]. Heavy tails allow these distributions to better accommodate outliers, rendering them more robust to corruptions. Their main limitation lies in their computational cost, as abandoning Gaussian noise nullifies one key advantage of GPs: conjugacy. As a consequence, these techniques rely on approximations via variational methods or Markov chain Monte Carlo, which decreases their accuracy while increasing computational costs. The second set of approaches consists of removing outlying observations before using a standard GP with Gaussian noise [Li et al., 2021, Park et al., 2022, Andrade and Takeda, 2023]. While such approaches retain conjugacy, it can be challenging to detect outliers in irregularly spaced data or higher dimensions. Outlier detection also tends to be computationally costly, and often requires estimating large numbers of parameters.
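The intuition behind heavy-tailed error models can be seen from the losses they induce: the Gaussian negative log-likelihood grows quadratically in the residual, so one extreme residual can dominate the fit, whereas a Student's t negative log-likelihood grows only logarithmically. A quick comparison (densities up to additive constants; parameter values illustrative):

```python
import numpy as np

def gauss_nll(r, sigma=1.0):
    """Gaussian negative log-likelihood (up to a constant): quadratic in r."""
    return 0.5 * (r / sigma) ** 2

def student_t_nll(r, nu=4.0, sigma=1.0):
    """Student-t negative log-likelihood (up to a constant): log growth in r."""
    return 0.5 * (nu + 1) * np.log1p((r / sigma) ** 2 / nu)

# A residual of 100 costs 5000 under the Gaussian loss,
# but only ~20 under the Student-t loss.
for r in [1.0, 10.0, 100.0]:
    print(r, gauss_nll(r), student_t_nll(r))
```

The bounded influence of the Student-t loss is what buys robustness; the price, as noted above, is losing conjugacy.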

In this paper, we propose a new and third way to achieve robustness that uses generalised Bayesian inference [see e.g. Bissiri et al., 2016, Jewson et al., 2018, Knoblauch et al., 2022]. In doing so, we significantly improve upon an earlier attempt in this direction due to Knoblauch [2019], which was applicable only to variational deep GPs, lacked closed-form solutions, and relied on hyperparameters that were difficult to choose. In line with the ideas of generalised Bayesian methods, we do not modify the Gaussian noise model. Instead, we change how information is assimilated, leveraging robust loss functions instead of robust error models.
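Concretely, generalised Bayesian inference replaces the negative log-likelihood in Bayes' rule with a generic loss function: given observations x_{1:n}, a prior π, and a loss ℓ, the generalised posterior takes the form [Bissiri et al., 2016]

```latex
\pi_\ell(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,
\exp\left( -\sum_{i=1}^{n} \ell(\theta, x_i) \right).
```

Taking ℓ to be the negative log-likelihood recovers the standard posterior; robustness is instead obtained by choosing a loss whose influence is bounded in the observations.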

Contributions

This paper proposes a novel robust and conjugate Gaussian process (RCGP) inspired by a generalised Bayesian inference scheme proposed in Altamirano et al. [2023]. The posteriors rely on a generalised form of score matching [Hyvärinen, 2006, Barp et al., 2019], which effectively down-weights outlying observations. The resulting inference resolves the trade-off between robustness and computation inherent in existing methods: it is robust in the sense of Huber [1981] while retaining closed-form solutions for both its posterior and posterior predictive. Additionally, and unlike other robust GPs, RCGPs can easily be plugged into various GP techniques such as sparse variational GPs [Titsias, 2009, Hensman et al., 2013], deep GPs [Damianou and Lawrence, 2013], multi-output GPs [Bonilla et al., 2007], and Bayesian optimisation [Shahriari et al., 2015]. Finally, even in settings where robustness is not required, our experiments show that RCGPs perform as well as standard GPs, raising the possibility that RCGPs may become a preferred default choice over GPs in the future.
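As a schematic of the down-weighting idea only (this is not the RCGP update derived in the paper, and the weight below uses a crude median-residual proxy rather than the score-matching construction): assign each observation a weight w_i in (0, 1] that shrinks with the size of its residual, and inflate that observation's noise variance accordingly. Because only the diagonal noise term changes, the update stays a Gaussian linear solve:

```python
import numpy as np

def weighted_gp_mean(X, y, x_star, c=1.0, ls=1.0, noise=0.1):
    """GP posterior mean with per-point noise inflated by an IMQ-style weight.
    A schematic of down-weighting, NOT the paper's exact RCGP update."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    r = y - np.median(y)                   # crude residual proxy (illustrative)
    w = (1.0 + (r / c) ** 2) ** -0.5       # weight shrinks for large residuals
    A = k(X, X) + np.diag(noise / w**2)    # outliers get inflated noise
    alpha = np.linalg.solve(A, y)
    return (k(X, np.array([x_star])).T @ alpha).item()

X = np.linspace(0, 5, 20)
y = np.sin(X)
y[10] += 100.0                             # extreme outlier
pred = weighted_gp_mean(X, y, X[10])       # prediction stays near sin(X[10])
```

In contrast to the Gaussian-noise posterior, the prediction here remains close to the latent function despite the corruption, while the update is still a single linear solve.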

References

  • Algikar and Mili [2023] Pooja Algikar and Lamine Mili. Robust Gaussian process regression with Huber likelihood. arXiv:2301.07858, 2023.
  • Altamirano et al. [2023] Matias Altamirano, Francois-Xavier Briol, and Jeremias Knoblauch. Robust and scalable Bayesian online changepoint detection. In Proceedings of the 40th International Conference on Machine Learning, pages 642–663. PMLR, 2023.
  • Andrade and Takeda [2023] Daniel Andrade and Akiko Takeda. Robust Gaussian process regression with the trimmed marginal likelihood. In Uncertainty in Artificial Intelligence, pages 67–76, 2023.
  • Balandat et al. [2020] Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Advances in Neural Information Processing Systems 33, 2020. URL http://arxiv.org/abs/1910.06403.
  • Barp et al. [2019] Alessandro Barp, François-Xavier Briol, Andrew Duncan, Mark Girolami, and Lester Mackey. Minimum Stein discrepancy estimators. In Advances in Neural Information Processing Systems, pages 12964–12976, 2019.
  • Bissiri et al. [2016] Pier Giovanni Bissiri, Chris C Holmes, and Stephen G Walker. A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5):1103–1130, 2016.
  • Bonilla et al. [2007] Edwin V Bonilla, Kian Chai, and Christopher Williams. Multi-task Gaussian process prediction. Advances in Neural Information Processing Systems, 20, 2007.
  • Daemi et al. [2019] Atefeh Daemi, Yousef Alipouri, and Biao Huang. Identification of robust Gaussian process regression with noisy input using EM algorithm. Chemometrics and Intelligent Laboratory Systems, 191:1–11, 2019.
  • Damianou and Lawrence [2013] Andreas Damianou and Neil D Lawrence. Deep Gaussian processes. In Artificial Intelligence and Statistics, pages 207–215. PMLR, 2013.
  • Dewaskar et al. [2023] Miheer Dewaskar, Christopher Tosh, Jeremias Knoblauch, and David B Dunson. Robustifying likelihoods by optimistically re-weighting data. arXiv preprint arXiv:2303.10525, 2023.
  • Dutordoir et al. [2020] Vincent Dutordoir, Mark Wilk, Artem Artemev, and James Hensman. Bayesian image classification with deep convolutional Gaussian processes. In International Conference on Artificial Intelligence and Statistics, pages 1529–1539. PMLR, 2020.
  • Gardner et al. [2018] J. R. Gardner, G. Pleiss, D. Bindel, K. Q. Weinberger, and A.G. Wilson. GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration. In Advances in Neural Information Processing Systems, pages 7587–7597, 2018.
  • Garnett [2021] R. Garnett. Bayesian Optimization. Cambridge University Press, 2021.
  • Goldberg et al. [1997] Paul Goldberg, Christopher Williams, and Christopher Bishop. Regression with input-dependent noise: A Gaussian process treatment. Advances in Neural Information Processing Systems, 10, 1997.
  • Hensman et al. [2013] James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processes for big data. arXiv preprint arXiv:1309.6835, 2013.
  • Huber [1981] Peter J Huber. Robust statistics. Wiley Series in Probability and Mathematical Statistics, 1981.
  • Hyvärinen [2006] A. Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6:695–708, 2006.
  • Jewson et al. [2018] J. Jewson, J. Q. Smith, and C. Holmes. Principled Bayesian minimum divergence inference. Entropy, 20(6):442, 2018.
  • Jylänki et al. [2011] Pasi Jylänki, Jarno Vanhatalo, and Aki Vehtari. Robust Gaussian process regression with a Student-t likelihood. Journal of Machine Learning Research, 12(11), 2011.
  • Knoblauch [2019] Jeremias Knoblauch. Robust deep Gaussian processes. arXiv preprint arXiv:1904.02303, 2019.
  • Knoblauch et al. [2022] Jeremias Knoblauch, Jack Jewson, and Theodoros Damoulas. An optimization-centric view on Bayes’ rule: Reviewing and generalizing variational inference. Journal of Machine Learning Research, 23(132):1–109, 2022.
  • Kuss [2006] Malte Kuss. Gaussian process models for robust regression, classification, and reinforcement learning. PhD thesis, Technische Universität Darmstadt, Darmstadt, Germany, 2006.
  • Li et al. [2021] Zhao-Zhou Li, Lu Li, and Zhengyi Shao. Robust Gaussian process regression based on iterative trimming. Astronomy and Computing, 36:100483, 2021.
  • Lu et al. [2023] Y. Lu, J. Ma, L. Fang, X. Tian, and J. Jiang. Robust and scalable Gaussian process regression and its applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21950–21959, 2023.
  • Matthews et al. [2017] Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research, 18(40):1–6, 2017. URL http://jmlr.org/papers/v18/16-537.html.
  • Moreno-Muñoz et al. [2018] Pablo Moreno-Muñoz, Antonio Artés, and Mauricio Alvarez. Heterogeneous multi-output Gaussian process prediction. Advances in Neural Information Processing Systems, 31, 2018.
  • Naish-Guzman and Holden [2007] Andrew Naish-Guzman and Sean Holden. Robust regression with twinned Gaussian processes. Advances in Neural Information Processing Systems, 20, 2007.
  • Paleyes et al. [2023] Andrei Paleyes, Maren Mahsereci, and Neil D. Lawrence. Emukit: A Python toolkit for decision making under uncertainty. Proceedings of the Python in Science Conference, 2023.
  • Park et al. [2022] Chiwoo Park, David J Borth, Nicholas S Wilson, Chad N Hunter, and Fritz J Friedersdorf. Robust Gaussian process regression with a bias model. Pattern Recognition, 124:108444, 2022.
  • Ranjan et al. [2016] Rishik Ranjan, Biao Huang, and Alireza Fatehi. Robust Gaussian process modeling using EM algorithm. Journal of Process Control, 42:125–136, 2016.
  • Rasmussen and Williams [2006] Carl Edward Rasmussen and Christopher KI Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
  • Salimbeni et al. [2019] Hugh Salimbeni, Vincent Dutordoir, James Hensman, and Marc Deisenroth. Deep Gaussian processes with importance-weighted variational inference. In International Conference on Machine Learning, pages 5589–5598. PMLR, 2019.
  • Santner et al. [2018] T. J Santner, B. J Williams, and W. I Notz. The Design and Analysis of Computer Experiments. Springer, 2nd edition, 2018. ISBN 9781493988457.
  • Shahriari et al. [2015] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.
  • Stegle et al. [2008] Oliver Stegle, Sebastian V Fallert, David JC MacKay, and Soren Brage. Gaussian process robust regression for noisy heart rate data. IEEE Transactions on Biomedical Engineering, 55(9):2143–2151, 2008.
  • Titsias [2009] Michalis Titsias. Variational learning of inducing variables in sparse Gaussian processes. In Artificial Intelligence and Statistics, pages 567–574. PMLR, 2009.
  • Wenger et al. [2021] J. Wenger, N. Krämer, M. Pförtner, J. Schmidt, N. Bosch, N. Effenberger, J. Zenn, A. Gessner, T. Karvonen, F-X Briol, M. Mahsereci, and P. Hennig. ProbNum: Probabilistic numerics in Python. arXiv:2112.02100, 2021. URL http://arxiv.org/abs/2112.02100.
  • Wynne et al. [2021] G. Wynne, F.-X. Briol, and M. Girolami. Convergence guarantees for Gaussian process means with misspecified likelihoods and smoothness. Journal of Machine Learning Research, 22(123):1–40, 2021.