3 research outputs found
Accelerating Generalized Linear Models by Trading off Computation for Uncertainty
Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework for modeling categorical, ordinal, and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, so approximations are required in practice. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to modern parallel computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements of GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.

Comment: Main text: 10 pages, 6 figures; Supplements: 13 pages, 2 figures
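To make the computation-for-uncertainty trade concrete, here is a minimal NumPy sketch in the same spirit; it is not the paper's method, and all names and the synthetic data are illustrative. For a Laplace approximation to Bayesian logistic regression, the latent predictive variance x_*^T A^{-1} x_* (with A the posterior precision) is bracketed between a conjugate-gradient lower bound that grows with the iteration budget k and a crude prior-based upper bound, so stopping the solver earlier yields a wider, honestly reported uncertainty interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logistic-regression data (sizes are illustrative).
n, d, alpha = 500, 50, 1.0
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

# MAP estimate via a few Newton steps.
w = np.zeros(d)
for _ in range(20):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) + alpha * w
    H = X.T @ ((p * (1.0 - p))[:, None] * X) + alpha * np.eye(d)
    w -= np.linalg.solve(H, grad)

# Laplace posterior precision at the mode.
p = 1.0 / (1.0 + np.exp(-X @ w))
A = X.T @ ((p * (1.0 - p))[:, None] * X) + alpha * np.eye(d)

def cg_quadform(A, b, k):
    """Run k CG steps on A x = b; b @ x_k is a monotonically
    increasing lower bound on the quadratic form b^T A^{-1} b."""
    x = np.zeros_like(b)
    r = b.copy()
    p_dir = r.copy()
    rs = r @ r
    for _ in range(k):
        if rs < 1e-30:  # solver converged early
            break
        Ap = A @ p_dir
        step = rs / (p_dir @ Ap)
        x += step * p_dir
        r -= step * Ap
        rs_new = r @ r
        p_dir = r + (rs_new / rs) * p_dir
        rs = rs_new
    return b @ x

x_star = rng.standard_normal(d)
for k in (2, 5, 20):
    lower = cg_quadform(A, x_star, k)   # resolved by computation
    upper = (x_star @ x_star) / alpha   # prior bound, since A >= alpha * I
    print(f"k={k:2d}  latent variance in [{lower:.3f}, {upper:.3f}]")
```

Fewer iterations leave a larger gap between the bounds; treating that gap as extra predictive uncertainty is the kind of explicit computation-uncertainty trade the abstract describes.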
ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model of the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block-diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations. It allows for efficient computation of eigenvalues and eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. We demonstrate this by conducting performance benchmarks and substantiate ViViT's usefulness by studying the impact of noise on the GGN's structural properties during neural network training.

Comment: Main text: 10 pages, 6 figures; Supplements: 26 pages, 27 figures, 5 tables
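The low-rank structure the abstract refers to can be illustrated with the standard Gram-matrix identity: if the GGN is written as G = V V^T, where V stacks scaled per-sample Jacobian (times loss-Hessian-factor) columns, then the nonzero eigenvalues of the large P x P matrix G coincide with those of the small Gram matrix V^T V. Below is a minimal NumPy sketch for a linear model under MSE loss (where the per-sample loss Hessian is the identity); it is not ViViT itself, which extracts V for deep networks during a single backward pass, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny linear model f(x) = W x under MSE loss, so the per-sample loss
# Hessian is the identity and the GGN is (1/N) * sum_n J_n^T J_n.
N, D, C = 8, 20, 3                     # batch size, input dim, output dim
P = C * D                              # number of parameters
X = rng.standard_normal((N, D))

# Per-sample Jacobian of f wrt the flattened weights: J_n = kron(I_C, x_n^T).
Js = np.stack([np.kron(np.eye(C), x[None, :]) for x in X])   # (N, C, P)

# Stack scaled Jacobian transposes so that GGN = V @ V.T with rank <= N*C.
V = np.concatenate([J.T for J in Js], axis=1) / np.sqrt(N)   # (P, N*C)

# Gram-matrix trick: eigendecompose the small (N*C x N*C) matrix V^T V
# instead of the large (P x P) GGN.
gram = V.T @ V
evals, evecs = np.linalg.eigh(gram)
keep = evals > 1e-10                   # nonzero spectrum of the GGN

# Recover unit-norm GGN eigenvectors: u_i = V e_i / sqrt(lambda_i).
U = V @ evecs[:, keep] / np.sqrt(evals[keep])

# Sanity checks against the explicitly formed GGN.
GGN = V @ V.T
dense = np.linalg.eigvalsh(GGN)
print("eigenvalues match: ", np.allclose(evals[keep], dense[dense > 1e-10]))
print("eigenvectors check:", np.allclose(GGN @ U, U * evals[keep]))
```

Because the Gram matrix is assembled from per-sample contributions, the same representation also exposes per-sample first- and second-order directional derivatives along the recovered eigenvectors, which is the fine-grained, noise-aware access the abstract highlights.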