Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax
The Gumbel-Softmax is a continuous distribution over the simplex that is
often used as a relaxation of discrete distributions. Because it can be readily
interpreted and easily reparameterized, it enjoys widespread use. We propose a
conceptually simpler and more flexible alternative family of reparameterizable
distributions where Gaussian noise is transformed into a one-hot approximation
through an invertible function. This invertible function is composed of a
modified softmax and can incorporate diverse transformations that serve
different specific purposes. For example, a stick-breaking procedure extends the
reparameterization trick to distributions with countably infinite support, while
normalizing flows increase the flexibility of the
distribution. Our construction enjoys theoretical advantages over the
Gumbel-Softmax, such as a closed-form KL divergence, and significantly
outperforms it in a variety of experiments.
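
To make the construction concrete, here is a minimal NumPy sketch of one possible map of this kind: Gaussian noise with learnable location and scale is pushed through a temperature-scaled softmax with an extra constant in the normalizer, so samples land in the interior of the simplex and concentrate near one-hot vectors as the temperature shrinks. The exact functional form, temperature, and constant are illustrative assumptions, not necessarily the authors' parameterization.

```python
import numpy as np

def invertible_gaussian_relaxation(mu, log_sigma, tau=0.1, delta=1.0, rng=None):
    """Map (K-1)-dimensional Gaussian noise to a point in the interior of the
    K-simplex through an invertible, temperature-scaled softmax-like function.
    The specific form (and the constant delta) is an illustrative choice."""
    rng = rng if rng is not None else np.random.default_rng()
    eps = rng.standard_normal(mu.shape)      # parameter-free noise
    z = mu + np.exp(log_sigma) * eps         # reparameterization: differentiable in mu, log_sigma
    expz = np.exp(z / tau)
    denom = expz.sum() + delta               # extra constant keeps the map invertible on R^{K-1}
    return np.concatenate([expz / denom, [delta / denom]])  # sums to 1; near one-hot for small tau

sample = invertible_gaussian_relaxation(np.zeros(4), np.zeros(4), tau=0.05)
print(sample, sample.sum())
```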
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra
Many areas of machine learning and science involve large linear algebra
problems, such as eigendecompositions, solving linear systems, computing matrix
exponentials, and trace estimation. The matrices involved often have Kronecker,
convolutional, block diagonal, sum, or product structure. In this paper, we
propose a simple but general framework for large-scale linear algebra problems
in machine learning, named CoLA (Compositional Linear Algebra). By combining a
linear operator abstraction with compositional dispatch rules, CoLA
automatically constructs memory and runtime efficient numerical algorithms.
Moreover, CoLA provides memory efficient automatic differentiation, low
precision computation, and GPU acceleration in both JAX and PyTorch, while also
accommodating new objects, operations, and rules in downstream packages via
multiple dispatch. CoLA can accelerate many algebraic operations, while making
it easy to prototype matrix structures and algorithms, providing an appealing
drop-in tool for virtually any computational effort that requires linear
algebra. We showcase its efficacy across a broad range of applications,
including partial differential equations, Gaussian processes, equivariant model
construction, and unsupervised learning.
Comment: Code available at https://github.com/wilson-labs/col
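
The following is a small Python sketch of the compositional-dispatch idea in general terms, deliberately independent of CoLA's actual interface (all class and function names are hypothetical): a structured operator, here a Kronecker product, carries its own matvec and solve rules, so a linear solve exploits the factors and never materializes or factors the full matrix.

```python
import numpy as np
from functools import singledispatch

class Dense:
    """Wrapper around an explicit matrix."""
    def __init__(self, A):
        self.A = np.asarray(A)

class Kronecker:
    """Represents kron(A, B) without materializing it."""
    def __init__(self, A, B):
        self.A, self.B = np.asarray(A), np.asarray(B)
    def matvec(self, v):
        # (A kron B) vec_r(X) = vec_r(A X B^T) with row-major reshapes,
        # so the full (mn x mn) matrix is never formed.
        X = v.reshape(self.A.shape[1], self.B.shape[1])
        return (self.A @ X @ self.B.T).reshape(-1)

@singledispatch
def solve(op, b):
    raise NotImplementedError(type(op))

@solve.register
def _(op: Dense, b):
    # Generic fallback: dense O(n^3) solve.
    return np.linalg.solve(op.A, b)

@solve.register
def _(op: Kronecker, b):
    # Dispatch rule exploiting structure: inv(A kron B) = inv(A) kron inv(B),
    # so we solve two small systems instead of one huge one.
    X = b.reshape(op.A.shape[0], op.B.shape[0])
    Y = np.linalg.solve(op.A, X)                      # inv(A) X
    return np.linalg.solve(op.B, Y.T).T.reshape(-1)   # apply inv(B) on the right

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
b = rng.standard_normal(12)
x = solve(Kronecker(A, B), b)
print(np.allclose(np.kron(A, B) @ x, b))              # True
```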
A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks
Unlike conventional grid and mesh based methods for solving partial
differential equations (PDEs), neural networks have the potential to break the
curse of dimensionality, providing approximate solutions to problems where
using classical solvers is difficult or impossible. While global minimization
of the PDE residual over the network parameters works well for boundary value
problems, catastrophic forgetting impairs the applicability of this approach to
initial value problems (IVPs). In an alternative local-in-time approach, the
optimization problem can be converted into an ordinary differential equation
(ODE) on the network parameters and the solution propagated forward in time;
however, we demonstrate that current methods based on this approach suffer from
two key issues. First, following the ODE produces an uncontrolled growth in the
conditioning of the problem, ultimately leading to unacceptably large numerical
errors. Second, as the ODE methods scale cubically with the number of model
parameters, they are restricted to small neural networks, significantly
limiting their ability to represent intricate PDE initial conditions and
solutions. Building on these insights, we develop Neural IVP, an ODE based IVP
solver which prevents the network from getting ill-conditioned and runs in time
linear in the number of parameters, enabling us to evolve the dynamics of
challenging PDEs with neural networks.
Comment: ICLR 2023. Code available at https://github.com/mfinzi/neural-iv
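
As a rough illustration of the local-in-time idea (a toy NumPy/SciPy sketch, not the Neural IVP method itself): with a linear-in-parameters ansatz standing in for a neural network, the heat equation u_t = u_xx induces a least-squares ODE on the parameters, and the normal equations (J^T J) theta_dot = J^T f can be solved using only matrix-vector products, which is the kind of structure that avoids cubic scaling in the parameter count. The basis, collocation points, step size, and regularization below are all illustrative choices.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

centers = np.linspace(-3, 3, 40)      # Gaussian-bump basis centers (toy ansatz, not a neural net)
width = 0.5
xs = np.linspace(-3, 3, 200)          # collocation points

def features(x):                      # phi_j(x)
    return np.exp(-(x[:, None] - centers) ** 2 / (2 * width ** 2))

def features_xx(x):                   # d^2 phi_j / dx^2, computed analytically
    d = x[:, None] - centers
    return features(x) * (d ** 2 / width ** 4 - 1.0 / width ** 2)

def theta_dot(theta):
    """Solve (J^T J) theta_dot = J^T f by CG using only matvecs with J,
    instead of forming and factoring the Gram matrix."""
    J = features(xs)                           # du/dtheta at the collocation points
    f = features_xx(xs) @ theta                # right-hand side u_xx = F(u)
    gram = LinearOperator((J.shape[1],) * 2,
                          matvec=lambda v: J.T @ (J @ v) + 1e-6 * v)  # small ridge vs. ill-conditioning
    sol, _ = cg(gram, J.T @ f)
    return sol

# Fit the initial condition u(x, 0) = exp(-x^2), then take explicit Euler steps in time.
theta = np.linalg.lstsq(features(xs), np.exp(-xs ** 2), rcond=None)[0]
dt = 1e-3
for _ in range(100):
    theta = theta + dt * theta_dot(theta)
print(features(xs) @ theta)                    # approximate solution at t = 0.1
```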
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
While there has been progress in developing non-vacuous generalization bounds
for deep neural networks, these bounds tend to be uninformative about why deep
learning works. In this paper, we develop a compression approach based on
quantizing neural network parameters in a linear subspace, profoundly improving
on previous results to provide state-of-the-art generalization bounds on a
variety of tasks, including transfer learning. We use these tight bounds to
better understand the role of model size, equivariance, and the implicit biases
of optimization, for generalization in deep learning. Notably, we find large
models can be compressed to a much greater extent than previously known,
encapsulating Occam's razor. We also argue for data-independent bounds in
explaining generalization.
Comment: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-baye
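
The sketch below illustrates the general flavor of such a compression argument (a simplified Occam-style finite-code bound, not the PAC-Bayes bound used in the paper): weights are expressed in a random linear subspace, the subspace coordinates are quantized, and the resulting code length in bits enters the generalization bound. All dimensions and error values are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10_000, 250, 50_000                     # params, subspace dim, training set size (hypothetical)

theta = rng.standard_normal(d) * 0.05             # stand-in for trained weights
P = rng.standard_normal((d, k)) / np.sqrt(d)      # random linear subspace basis
w = P.T @ theta                                   # subspace coordinates of the weights

levels = np.linspace(w.min(), w.max(), 16)        # 4-bit quantization codebook
w_q = levels[np.abs(w[:, None] - levels).argmin(axis=1)]
theta_q = P @ w_q                                 # decompressed (quantized-subspace) weights

bits = k * np.log2(len(levels)) + 32 * len(levels)    # coordinates + codebook
delta, train_err = 0.05, 0.10                         # empirical error of theta_q (stand-in value)
# Occam bound for a prefix-free code: generalization gap <= sqrt((bits*ln2 + ln(1/delta)) / (2n)).
bound = train_err + np.sqrt((bits * np.log(2) + np.log(1 / delta)) / (2 * n))
print(f"code length ~{bits:.0f} bits, generalization bound <= {bound:.3f}")
```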