Scalable Differentially Private Bayesian Optimization

G. Sopa¹, J. Marusic¹, M. Avella-Medina¹, and J.P. Cunningham¹

¹ Department of Statistics, Columbia University, New York, USA
{gs3222, jm5692, ma3874, jpc2181}@columbia.edu

Keywords: Differential Privacy – Bayesian Optimization

1 Abstract

In recent years, there has been much work on scaling Bayesian Optimization to high-dimensional problems, for example hyperparameter tuning in large neural network models. These scalable methods have been successful, finding high objective values far more quickly than traditional global Bayesian Optimization or random-search-based methods. At the same time, these large neural network models often train on sensitive data, yet Differential Privacy guarantees have not scaled alongside these modern Bayesian Optimization procedures.

In this work we develop a method for privately optimizing objectives over potentially high-dimensional parameter spaces using Gradient Informative Bayesian Optimization. In short, in each iteration our algorithm uses Bayesian Optimization techniques to greedily select parameter configurations that appear informative about the gradient of the objective function at the current parameter estimate. The algorithm then evaluates the objective function at these configurations and uses the resulting values, together with all previously observed objective values, to infer the gradient of the objective at the current iterate. Finally, it performs a step of Noisy Gradient Descent along this estimated gradient, as is standard for Empirical Risk Minimization in the privacy literature [Song et al., 2013, Bassily et al., 2014, Feldman et al., 2020], and repeats.
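The iteration described above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the Bayesian Optimization acquisition of gradient-informative query points is replaced by simple central-difference queries, and all names and parameter values (`objective`, `lr`, `clip`, `sigma`) are hypothetical; in a real deployment `sigma` would be calibrated to the privacy budget.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(theta):
    # Toy stand-in for an expensive black-box objective.
    return np.sum((theta - 1.0) ** 2)

def estimate_gradient(theta, f, h=1e-3):
    # Stand-in for the gradient-informative step: the paper selects query
    # points via Bayesian Optimization; here we simply use central
    # differences at coordinate perturbations for illustration.
    d = theta.size
    grad = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return grad

def noisy_gd(theta0, f, steps=200, lr=0.1, clip=10.0, sigma=0.05):
    # Noisy Gradient Descent step: clip the estimated gradient and add
    # Gaussian noise, as in DP-ERM (sigma set by the privacy budget).
    theta = theta0.copy()
    for _ in range(steps):
        g = estimate_gradient(theta, f)
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        theta -= lr * (g + sigma * rng.standard_normal(theta.size))
    return theta

theta_hat = noisy_gd(np.zeros(5), objective)
```

Clipping bounds the sensitivity of the gradient estimate, which is what makes Gaussian noise of a fixed scale sufficient for a privacy guarantee.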

Our theoretical results prove that, under suitable conditions, our method converges exponentially fast to a ball around the optimal parameter configuration. The radius of this ball comprises a gradient approximation error and an unavoidable privacy noise error, and we demonstrate that, for reasonable problems, the approximation error becomes smaller in magnitude than the privacy noise after a small number of iterations. Moreover, regardless of whether the assumptions are satisfied, our algorithm maintains privacy, and we empirically demonstrate superior performance to existing methods in the high-dimensional hyperparameter setting.
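Schematically, a guarantee of this kind takes the form of a contraction plus a residual ball. The notation below is illustrative and not the paper's exact statement: $\theta^\star$ denotes the optimum, $\varepsilon_{\mathrm{approx}}$ the gradient approximation error, and $\sigma_{\mathrm{priv}}$ the scale of the injected privacy noise in dimension $d$:

```latex
\mathbb{E}\,\bigl\|\theta_T - \theta^\star\bigr\|^2
  \;\le\; (1 - c)^{T}\,\bigl\|\theta_0 - \theta^\star\bigr\|^2
  \;+\; C\bigl(\varepsilon_{\mathrm{approx}}^2 + d\,\sigma_{\mathrm{priv}}^2\bigr)
```

The first term decays exponentially in the number of iterations $T$, while the second term is the ball described above: once $\varepsilon_{\mathrm{approx}}$ falls below the privacy noise, the privacy term dominates the final error.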

Although our algorithm is designed to privately minimize functions whose gradients are expensive or impossible to evaluate, it performs competitively even in M-estimation problems of the ψ-type. For example, it achieves performance similar to standard Noisy Gradient Descent on an M-estimation procedure for a linear regression problem studied in Avella-Medina et al. [2023], despite not using any provided gradient information.
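For reference, the standard Noisy Gradient Descent baseline for such a regression problem clips per-example gradients and perturbs the average with Gaussian noise. The sketch below uses illustrative synthetic data and hypothetical parameter values, not the experiment from Avella-Medina et al. [2023]; `sigma` stands in for the noise multiplier implied by the privacy budget.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear regression data (illustrative, not the paper's setup).
n, d = 500, 3
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def noisy_gd_linreg(X, y, steps=300, lr=0.1, clip=5.0, sigma=0.01):
    # Noisy GD with per-example gradient clipping: each example's
    # squared-loss gradient is clipped to norm `clip`, then Gaussian
    # noise scaled by sigma * clip / n is added to the average.
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(steps):
        per_ex = (X @ beta - y)[:, None] * X              # per-example gradients
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex *= np.minimum(1.0, clip / (norms + 1e-12))  # clip sensitivity
        g = per_ex.mean(axis=0) + (sigma * clip / n) * rng.standard_normal(d)
        beta -= lr * g
    return beta

beta_hat = noisy_gd_linreg(X, y)
```

This baseline has direct access to analytic gradients; the point of the comparison in the text is that the Bayesian Optimization approach matches it while treating the objective as a black box.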

References

  • Avella-Medina et al. [2023] Marco Avella-Medina, Casey Bradshaw, and Po-Ling Loh. Differentially private inference via noisy optimization. The Annals of Statistics, 51(5):2067–2092, 2023.
  • Bassily et al. [2014] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In IEEE Symposium on Foundations of Computer Science (FOCS), 2014.
  • Feldman et al. [2020] Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: Optimal rates in linear time. In ACM Symposium on Theory of Computing (STOC), 2020.
  • Song et al. [2013] Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, 2013.