Scalable Estimation of Large Generalized Additive Models
M. Zimmermann^a, C. Collarin^b, S. N. Wood^b, F. Ziel^a
^a Data Science in Energy and Environment, University of Duisburg-Essen, Germany; ^b School of Mathematics, University of Edinburgh, United Kingdom
Generalized additive models (GAMs) relate a univariate exponential family response $y_i$ to a linear predictor $g(\mu_i) = \eta_i = \mathbf{A}_i \gamma + \sum_j f_j(x_{ij})$, where the smooth functions $f_j$ are represented in terms of basis functions. As a consequence, the model can be expressed as $\eta = \mathbf{X}\beta$. The $p$ regression coefficients, $\beta = (\beta_1, \dots, \beta_p)^\top$, are estimated by penalized maximum likelihood: $\hat{\beta} = \arg\max_{\beta} \left[ \ell(\beta) - (2\phi)^{-1} \sum_j \lambda_j \beta^\top \mathbf{S}_j \beta \right]$, where $\ell$ is the log-likelihood and each $\lambda_j$ controls the smoothness measured by $\mathbf{S}_j$. The $\lambda_j$ can be estimated by Laplace approximate marginal likelihood (LAML) maximization. However, LAML requires forming and decomposing the Hessian matrix of the penalized log-likelihood. For $n$ observations, this generally precludes reducing the computational cost below $O(np^2)$, making model estimation infeasible for high-dimensional predictors. Here we propose a way around this bottleneck. We combine the generalized Fellner–Schall smoothing parameter update with stochastic trace estimation (e.g., Hutch++ [1]) and preconditioned conjugate gradients, thus avoiding both the formation and the Cholesky factorization of the GAM penalized Hessian. The resulting procedure relies only on matrix–vector products, enables low memory-bandwidth parallelization, exploits model term sparsity with minimal fill-in, and achieves an $O(np)$ computational cost. The performance of the approach is demonstrated on the NMMAPS respiratory mortality data, with over one million observations and more than 20,000 coefficients, fitted in just over half an hour.

Keywords: Generalized Additive Models; Preconditioned Conjugate Gradient; Stochastic Trace Estimation
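The matrix-free idea described above can be illustrated with a minimal sketch. It estimates a trace of the form $\mathrm{tr}(\mathbf{H}^{-1}\mathbf{S}_j)$, the kind of quantity a Fellner–Schall-type update is assumed to need here, using Hutch++ with every product $\mathbf{H}^{-1}\mathbf{v}$ obtained by preconditioned conjugate gradients. Everything specific in it is an assumption made for illustration and not the authors' implementation: the toy Gaussian Hessian $\mathbf{H} = \mathbf{X}^\top\mathbf{X} + \lambda\mathbf{D}^\top\mathbf{D}$ with a single second-difference penalty, the Jacobi (diagonal) preconditioner, dense NumPy arrays, and the hypothetical function names `pcg` and `hutchpp_trace`.

```python
import numpy as np


def pcg(matvec, b, precond, tol=1e-10, max_iter=500):
    """Solve H x = b for symmetric positive definite H using only products with H."""
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Hp = matvec(p)
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        if np.linalg.norm(r) <= tol * b_norm:
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x


def hutchpp_trace(apply_op, dim, num_queries, rng):
    """Hutch++ estimate of tr(M), given only the map v -> M v (about num_queries products)."""
    k = max(num_queries // 3, 1)

    def apply_cols(V):
        return np.column_stack([apply_op(V[:, i]) for i in range(V.shape[1])])

    sketch = rng.choice([-1.0, 1.0], size=(dim, k))  # Rademacher sketching vectors
    G = rng.choice([-1.0, 1.0], size=(dim, k))       # Rademacher probe vectors
    Q, _ = np.linalg.qr(apply_cols(sketch))          # orthonormal basis for the range of M @ sketch
    low_rank_part = np.trace(Q.T @ apply_cols(Q))    # exact trace on the captured subspace
    G_perp = G - Q @ (Q.T @ G)                       # probes projected off that subspace
    residual_part = np.sum(G_perp * apply_cols(G_perp)) / k  # Hutchinson estimate on the remainder
    return low_rank_part + residual_part


# Toy problem (assumed): Gaussian response, one smooth with a second-difference penalty.
rng = np.random.default_rng(0)
n, p, lam = 2000, 60, 100.0
X = rng.standard_normal((n, p))
D = np.diff(np.eye(p), n=2, axis=0)  # (p - 2) x p second-difference matrix
h_diag = np.sum(X**2, axis=0) + lam * np.sum(D**2, axis=0)  # diag(H) without forming H


def hess_matvec(v):
    # H v = X'X v + lam * D'D v, computed as a chain of matrix-vector products
    return X.T @ (X @ v) + lam * (D.T @ (D @ v))


def inv_hess_times_penalty(v):
    # v -> H^{-1} (S v) with S = D'D, solved by Jacobi-preconditioned CG
    return pcg(hess_matvec, D.T @ (D @ v), lambda r: r / h_diag)


trace_estimate = hutchpp_trace(inv_hess_times_penalty, p, num_queries=60, rng=rng)
print(trace_estimate)
```

Neither `pcg` nor `hutchpp_trace` ever sees $\mathbf{H}$ as a matrix, so each query costs a handful of products with $\mathbf{X}$ and $\mathbf{D}$. Although $\mathbf{H}^{-1}\mathbf{S}_j$ is not symmetric, the estimator remains unbiased for its trace; the accuracy guarantees in [1] are stated for positive semidefinite matrices.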

References

  • [1] R. A. Meyer, C. Musco, C. Musco, and D. P. Woodruff (2021). Hutch++: Optimal stochastic trace estimation. In: Proc. SIAM Symp. Simplicity Algorithms, Jan, 142–155.