Learning with Importance Weighted Variational Inference

K. Daudel¹ and F. Roueff²

¹ ESSEC Business School, France [daudel@essec.edu]

² Télécom Paris, Institut Polytechnique de Paris, France [roueff@telecom-paris.fr]

1 Abstract

Variational inference (VI) methods seek to find the best approximation to an unknown target posterior density within a more tractable family of probability densities 𝒬 [Jordan et al., 1999, Blei et al., 2017]. A common setting where VI is applied is when one is given a model that depends on a parameter θ and the goal is to optimize the associated marginal log likelihood, with the posterior density being intractable. Since direct optimization of the marginal log likelihood cannot be carried out, variational bounds involving the variational family 𝒬 are constructed as surrogate objective functions to the marginal log likelihood that are more amenable to optimization.
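To make the surrogate construction concrete, recall the standard identity behind the most common bound (writing z for the latent variable and q ∈ 𝒬 for the variational density):

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{Z \sim q}\!\left[\log \frac{p_\theta(Z, x)}{q(Z)}\right]}_{\text{variational bound}}
  + \ \mathrm{KL}\big(q \,\|\, p_\theta(\cdot \mid x)\big)
  \ \geq\ \mathbb{E}_{Z \sim q}\!\left[\log \frac{p_\theta(Z, x)}{q(Z)}\right],
```

since the Kullback–Leibler term is nonnegative. Maximizing the bound over θ and q ∈ 𝒬 thus serves as a tractable proxy for maximizing the marginal log likelihood.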

While the most traditional variational bound is the Evidence Lower BOund (ELBO), popular alternatives that rely on importance weighting have been proposed to improve on VI in the context of maximum likelihood optimization, such as the Importance Weighted Auto-Encoder (IWAE) of Burda et al. [2016] and the Variational Rényi (VR) bounds of Li and Turner [2016]. The methodology for learning the parameters of interest using these bounds typically amounts to running gradient-based VI algorithms that incorporate the reparameterization trick [Kingma and Welling, 2014]. However, how the choice of variational bound impacts the outcome of VI algorithms is often unclear, and an active line of research in VI is thus concerned with better understanding this aspect.
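For context, writing w(z) = p_θ(z, x)/q(z) for the importance weight, these two bounds read (standard definitions from the cited works):

```latex
\mathcal{L}^{\mathrm{IWAE}}_{N}(\theta, q)
  = \mathbb{E}_{Z_{1:N} \overset{\mathrm{iid}}{\sim} q}\!\left[\log \frac{1}{N}\sum_{i=1}^{N} w(Z_i)\right],
\qquad
\mathcal{L}^{(\alpha)}(\theta, q)
  = \frac{1}{1-\alpha}\,\log \mathbb{E}_{Z \sim q}\!\left[w(Z)^{1-\alpha}\right],
```

with the IWAE bound recovering the ELBO at N = 1 and the VR bound recovering it in the limit α → 1.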

Among the existing works, Daudel et al. [2023] introduce and study the VR-IWAE bound, a variational bound that depends on two hyperparameters (N, α) ∈ ℕ⋆ × [0,1) and that unifies the ELBO, IWAE and VR methodologies when the reparameterization trick is available. In particular, Daudel et al. [2023] provide analyses of the VR-IWAE bound that elucidate the role N and α play in this bound. Yet, solely focusing on the behavior of a variational bound is insufficient to assess the effectiveness of algorithms based on this bound at learning the parameters of interest [Rainforth et al., 2018].
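In the same notation, the VR-IWAE bound combines both mechanisms (definition from Daudel et al. [2023]):

```latex
\mathcal{L}^{(\alpha)}_{N}(\theta, q)
  = \frac{1}{1-\alpha}\,\mathbb{E}_{Z_{1:N} \overset{\mathrm{iid}}{\sim} q}\!\left[\log \frac{1}{N}\sum_{i=1}^{N} w(Z_i)^{1-\alpha}\right],
\qquad \alpha \in [0,1),
```

which reduces to the IWAE bound at α = 0 and to the ELBO at N = 1 (for any α, since log w^{1−α} = (1−α) log w), while for large N it approaches the VR bound.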

In our work [Daudel and Roueff, 2024], we study the role of N and α in two gradient estimators of the VR-IWAE bound that are at the center of the Importance Weighted VI methodology, namely the reparameterized and doubly-reparameterized gradient estimators of the VR-IWAE bound. In doing so, we provide insights that apply to widely-used gradient estimators of the IWAE and VR bounds and more broadly we further advance the understanding of Importance Weighted VI methods. We illustrate our theoretical findings empirically.
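As a rough illustration of the first of these estimators, one can autodifferentiate a Monte Carlo estimate of the VR-IWAE bound through the reparameterization z = μ + σε, which yields the reparameterized gradient estimator. The sketch below uses a toy 1D Gaussian model; all names (`log_joint`, `vr_iwae_estimate`) and the model itself are illustrative assumptions, not the paper's setup, and the doubly-reparameterized estimator would require an additional treatment of the score term, omitted here.

```python
# Hedged sketch: reparameterized gradient of a Monte Carlo VR-IWAE estimate
# on a toy 1D Gaussian model (illustrative only, not the paper's code).
import jax
import jax.numpy as jnp

def log_joint(theta, z, x):
    # Toy joint density: z ~ N(theta, 1), x | z ~ N(z, 1)
    return -0.5 * (z - theta) ** 2 - 0.5 * (x - z) ** 2 - jnp.log(2 * jnp.pi)

def vr_iwae_estimate(params, eps, x, alpha):
    theta, mu, log_sigma = params
    sigma = jnp.exp(log_sigma)
    z = mu + sigma * eps                      # reparameterization trick
    log_q = (-0.5 * ((z - mu) / sigma) ** 2
             - log_sigma - 0.5 * jnp.log(2 * jnp.pi))
    log_w = log_joint(theta, z, x) - log_q    # log importance weights
    n = eps.shape[0]
    # (1 / (1 - alpha)) * log((1/N) * sum_i w_i^(1 - alpha)), computed stably
    return (jax.scipy.special.logsumexp((1 - alpha) * log_w)
            - jnp.log(n)) / (1 - alpha)

# Differentiating through z = mu + sigma * eps gives the reparameterized
# gradient estimator of the bound with respect to (theta, mu, log_sigma).
key = jax.random.PRNGKey(0)
eps = jax.random.normal(key, (64,))
params = (0.0, 0.0, 0.0)
grads = jax.grad(vr_iwae_estimate)(params, eps, 1.0, 0.5)
```

Setting `alpha = 0.0` recovers the usual reparameterized gradient estimator of the IWAE bound.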

References

  • Blei et al. [2017] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
  • Burda et al. [2016] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. In 4th International Conference on Learning Representations (ICLR), 2016.
  • Daudel and Roueff [2024] Kamélia Daudel and François Roueff. Learning with importance weighted variational inference: Asymptotics for gradient estimators of the VR-IWAE bound, 2024.
  • Daudel et al. [2023] Kamélia Daudel, Joe Benton, Yuyang Shi, and Arnaud Doucet. Alpha-divergence variational inference meets importance weighted auto-encoders: Methodology and asymptotics. Journal of Machine Learning Research, 24(243):1–83, 2023.
  • Jordan et al. [1999] Michael Jordan, Zoubin Ghahramani, Tommi Jaakkola, and Lawrence Saul. An introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.
  • Kingma and Welling [2014] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations (ICLR), 2014.
  • Li and Turner [2016] Yingzhen Li and Richard E Turner. Rényi divergence variational inference. In Advances in Neural Information Processing Systems, volume 29, 2016.
  • Rainforth et al. [2018] Tom Rainforth, Adam Kosiorek, Tuan Anh Le, Chris Maddison, Maximilian Igl, Frank Wood, and Yee Whye Teh. Tighter variational bounds are not necessarily better. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 4277–4285, 2018.