CBMA: Improving Conformal Prediction through Bayesian Model Averaging
Department of Mathematical and Statistical Sciences, University of Alberta, Canada [{pbhagwat, lkong, bei1}@ualberta.ca]
Keywords: Model uncertainty – Machine learning – Confidence – Prediction regions
1 Abstract
Conformal prediction has emerged as a popular technique for facilitating valid predictive inference across a spectrum of machine learning models, under the minimal assumption of exchangeability. Recently, Hoff [2023] showed that full conformal Bayes provides the most efficient prediction sets (smallest by expected volume) among all prediction sets valid at the nominal level, provided the model is correctly specified. However, a critical issue arises when the Bayesian model itself is mis-specified: the resulting prediction set may be suboptimal, even though it still enjoys the frequentist coverage guarantee. To address this limitation, we propose an innovative solution that combines Bayesian model averaging (BMA) with conformal prediction. This hybrid not only leverages the strengths of Bayesian conformal prediction but also introduces a layer of robustness through model averaging. Theoretically, we prove that the resulting prediction set converges to the optimal level of efficiency if the true model is included among the candidate models. This assurance of optimality, even under potential model uncertainty, provides a significant improvement over existing methods, ensuring more reliable and precise uncertainty quantification.
2 Motivation
Conformal prediction is a distribution-free uncertainty quantification method that generates prediction sets with valid coverage under the minimal assumption of exchangeability [Vovk et al., 2005]. By building on existing machine learning techniques, conformal prediction uses available data to establish valid prediction sets for new instances with a coverage guarantee. Prediction sets that include the ground truth with high probability are essential for high-stakes applications, such as autonomous vehicles and clinical settings. However, it is also preferable for the prediction sets to be efficient, i.e., as small as possible, since smaller sets are more informative. Recently, Hoff [2023] investigated the optimal efficiency of full conformal Bayes prediction sets under a correctly specified Bayesian model, showing that they achieve the minimum expected volume at a given target coverage level. In addition, Fong and Holmes [2021] introduced scalable methods for generating full conformal Bayes prediction sets applicable to any Bayesian model, highlighting the advantages of conformal Bayesian prediction over traditional Bayesian posterior predictive sets, especially in the presence of model misspecification.
However, constructing or selecting the correct Bayesian model to obtain optimally efficient conformal prediction sets is a challenging task. Although conformal prediction sets provide finite-sample coverage guarantees for any model, they may fail to achieve volume optimality when the underlying model is misspecified. This challenge underscores a broader limitation of traditional conformal prediction methods, which typically rely on a single fixed machine learning model to generate prediction sets at a predetermined marginal coverage level. Given the vast array of possible predictive models for a given problem, and the fact that conformal methods yield valid sets for all of them, identifying the most appropriate model becomes inherently difficult. Model uncertainty has thus remained a persistent yet underexplored problem in the conformal prediction literature.
3 Our Contributions
In this paper, we provide an innovative solution, Conformal Bayesian model averaging (CBMA), to the challenging problem of constructing efficient conformal prediction sets under model uncertainty in the Bayesian framework. Our proposed CBMA method seamlessly combines the conformity scores of each candidate Bayesian model to construct a single conformal prediction set. This paves the way for incorporating model averaging procedures into the conformal prediction framework, a capability currently lacking in the literature. To the best of our knowledge, CBMA is the first method that combines conformity scores from diverse models to construct valid conformal prediction sets, and it requires no data splitting, making it data efficient. Our choice of conformity scores, namely posterior predictive densities, is also optimal and leads to the most efficient prediction set when the underlying Bayesian model is true [Hoff, 2023]. The CBMA method can construct full conformal prediction sets given, for each model, posterior samples of the model parameters together with the posterior model probabilities, both of which are standard outputs of BMA. Theoretically, we also prove that conformal prediction sets constructed with the CBMA method achieve optimal efficiency as the sample size increases. Such a guarantee of optimal efficiency even under model uncertainty is an improvement over existing methods. Finally, our method incorporates both data and model uncertainty into the construction of prediction sets, which enhances the reliability of the predictions.
4 Methodology
In this section, we summarize our CBMA approach to constructing conformal prediction sets. We consider a collection of i.i.d. observations $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^{n}$, where each observation is a covariate-response pair. For each model $M_k$ in the model space $\mathcal{M} = \{M_1, \ldots, M_K\}$, we consider Bayesian prediction using the training data for an outcome of interest $y_{n+1}$ at covariates $x_{n+1}$. Given a model likelihood $f(y \mid x, \theta_k, M_k)$ and a prior $\pi(\theta_k \mid M_k)$ on the parameters, for $k = 1, \ldots, K$, the posterior predictive distribution for the response at a new $x_{n+1}$ takes the form
$$p(y_{n+1} \mid x_{n+1}, \mathcal{D}_n, M_k) = \int f(y_{n+1} \mid x_{n+1}, \theta_k, M_k)\, \pi(\theta_k \mid \mathcal{D}_n, M_k)\, d\theta_k,$$
where $\pi(\theta_k \mid \mathcal{D}_n, M_k) \propto \pi(\theta_k \mid M_k) \prod_{i=1}^{n} f(y_i \mid x_i, \theta_k, M_k)$ is the Bayesian posterior. We first fit the $K$ candidate models using BMA [Raftery et al., 1997, Hoeting et al., 1999]. Following Fong and Holmes [2021], full conformal Bayes prediction sets for model $M_k$ are constructed using posterior predictive densities as conformity measures:
$$\sigma_i^{(k)}(y) = p\big(y_i \mid x_i, \mathcal{D}_{n+1}(y), M_k\big), \qquad i = 1, \ldots, n+1,$$
where $\mathcal{D}_{n+1}(y) = \mathcal{D}_n \cup \{(x_{n+1}, y)\}$ denotes the training data augmented with the candidate pair $(x_{n+1}, y)$.
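For a single model, full conformal Bayes can be computed from posterior samples without refitting per candidate, via the add-one-in importance-sampling strategy of Fong and Holmes [2021]. The sketch below illustrates the idea for a generic likelihood; the function names and the `loglik` interface are our own illustrative conventions, not from the paper:

```python
import numpy as np

def conformal_bayes_pvalues(y_grid, x_new, X, y, theta_samples, loglik):
    """Full conformal Bayes p-values over a grid of candidate responses.

    loglik(y_vec, x_vec, theta) -> elementwise log f(y | x, theta).
    Posterior samples for the n training points are reweighted by the
    likelihood of each candidate point (add-one-in importance sampling),
    so the posterior is never refit as the candidate y varies.
    """
    n = len(y)
    # log-likelihood of each training point under each posterior sample: (S, n)
    ll_train = np.stack([loglik(y, X, th) for th in theta_samples])
    pvals = np.empty(len(y_grid))
    for j, y_cand in enumerate(y_grid):
        # log-likelihood of the candidate point under each sample: (S,)
        ll_cand = np.array([np.ravel(loglik(np.atleast_1d(y_cand),
                                            np.atleast_1d(x_new), th))[0]
                            for th in theta_samples])
        w = np.exp(ll_cand - ll_cand.max())
        w /= w.sum()  # importance weights targeting the augmented posterior
        sigma_train = w @ np.exp(ll_train)  # conformity scores of training points
        sigma_cand = w @ np.exp(ll_cand)    # conformity score of the candidate
        pvals[j] = (np.sum(sigma_train <= sigma_cand) + 1.0) / (n + 1.0)
    return pvals
```

The level-$\alpha$ prediction set is then the collection of grid points whose p-value exceeds $\alpha$; unlikely candidates receive the minimal p-value $1/(n+1)$.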
Finally, we aggregate the individual model conformity scores into a weighted combination, defined as
$$\sigma_i(y) = \sum_{k=1}^{K} w_k\, \sigma_i^{(k)}(y), \qquad (1)$$
where
$$w_k = p(M_k \mid \mathcal{D}_n), \qquad k = 1, \ldots, K, \qquad (2)$$
is the posterior probability of model $M_k$ obtained as the output of BMA.
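Given per-model conformity scores and posterior model probabilities, the aggregation step reduces to a weighted sum across models followed by a standard conformal p-value computation over a grid of candidate responses. A minimal sketch, in which the array layout and function name are our own illustrative choices:

```python
import numpy as np

def cbma_prediction_set(y_grid, train_scores, cand_scores, model_probs, alpha=0.1):
    """CBMA prediction set from per-model conformity scores (illustrative sketch).

    train_scores : list of (G, n) arrays; conformity scores of the n training
                   points under model M_k, one row per candidate y in y_grid
    cand_scores  : list of (G,) arrays; conformity score of each candidate
                   itself under model M_k
    model_probs  : posterior model probabilities p(M_k | data) from BMA
    """
    # weighted combination of conformity scores across models
    train = sum(w * s for w, s in zip(model_probs, train_scores))  # (G, n)
    cand = sum(w * s for w, s in zip(model_probs, cand_scores))    # (G,)
    n = train.shape[1]
    # conformal p-value per candidate; keep candidates with p-value > alpha
    pvals = (np.sum(train <= cand[:, None], axis=1) + 1.0) / (n + 1.0)
    return y_grid[pvals > alpha]
```

Because the weights depend only on the training data, the aggregated score inherits the exchangeability of the per-model scores, which is what preserves the finite-sample coverage guarantee.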
References
- Fong, E. and Holmes, C. (2021). Conformal Bayesian computation. Advances in Neural Information Processing Systems 34, pp. 18268–18279.
- Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: a tutorial. Statistical Science 14(4), pp. 382–417.
- Hoff, P. (2023). Bayes-optimal prediction with frequentist coverage control. Bernoulli 29(2), pp. 901–928.
- Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92(437), pp. 179–191.
- Shafer, G. and Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research 9(3).
- Vovk, V., Gammerman, A., and Shafer, G. (2005). Algorithmic Learning in a Random World. Vol. 29, Springer.