Block Neural Autoregressive Flow
Normalising flows (NFs) relate two density functions via a differentiable
bijection whose Jacobian determinant can be computed efficiently. Recently, as
an alternative to hand-crafted bijections, Huang et al. (2018) proposed neural
autoregressive flow (NAF) which is a universal approximator for density
functions. Their flow is a neural network (NN) whose parameters are predicted
by another NN. The latter grows quadratically with the size of the former and
thus an efficient technique for parametrization is needed. We propose block
neural autoregressive flow (B-NAF), a much more compact universal approximator
of density functions, where we model a bijection directly using a single
feed-forward network. Invertibility is ensured by carefully designing each
affine transformation with block matrices that make the flow autoregressive and
(strictly) monotone. We compare B-NAF to NAF and other established flows on
density estimation and approximate inference for latent variable models. Our
proposed flow is competitive across datasets while using orders of magnitude
fewer parameters.
Comment: 12 pages, 3 figures, 3 tables
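The abstract hinges on one mechanism: each affine layer uses a block
lower-triangular weight matrix whose diagonal blocks are kept strictly
positive, which makes the composed transformation autoregressive and
strictly monotone, hence invertible. The PyTorch sketch below is
illustrative only; the class and variable names are ours, and the
log-Jacobian bookkeeping needed for density estimation is omitted.

```python
import torch
import torch.nn as nn

class BlockMaskedLinear(nn.Module):
    """One affine layer of a B-NAF-style flow (illustrative sketch).

    The weight is a d x d grid of (out_mult x in_mult) blocks: diagonal
    blocks are made strictly positive via exp, blocks above the diagonal
    are zeroed, so the layer is autoregressive and strictly monotone.
    """
    def __init__(self, d, in_mult, out_mult):
        super().__init__()
        rows, cols = d * out_mult, d * in_mult
        self.weight = nn.Parameter(0.1 * torch.randn(rows, cols))
        self.bias = nn.Parameter(torch.zeros(rows))
        mask_diag = torch.zeros(rows, cols)
        mask_lower = torch.zeros(rows, cols)
        for i in range(d):
            for j in range(d):
                block = (slice(i * out_mult, (i + 1) * out_mult),
                         slice(j * in_mult, (j + 1) * in_mult))
                if i == j:
                    mask_diag[block] = 1.0
                elif j < i:
                    mask_lower[block] = 1.0
        self.register_buffer("mask_diag", mask_diag)
        self.register_buffer("mask_lower", mask_lower)

    def forward(self, x):
        # exp keeps diagonal blocks strictly positive; the zero mask above
        # the diagonal enforces the autoregressive structure.
        w = self.weight.exp() * self.mask_diag + self.weight * self.mask_lower
        return x @ w.t() + self.bias

d, h = 2, 8  # 2-D density, hidden width 8 per dimension
flow = nn.Sequential(
    BlockMaskedLinear(d, 1, h), nn.Tanh(),  # tanh is strictly monotone
    BlockMaskedLinear(d, h, 1),
)
y = flow(torch.randn(4, d))  # shape (4, 2)
```

In the full method, the log-determinant of the Jacobian is accumulated
through the layers (the Jacobian is block triangular, so only the positive
diagonal blocks contribute); the sketch shows only how the masking yields a
single feed-forward bijection.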
Expressive Monotonic Neural Networks
The monotonic dependence of the outputs of a neural network on some of its
inputs is a crucial inductive bias in many scenarios where domain knowledge
dictates such behavior. This is especially important for interpretability and
fairness considerations. In a broader context, scenarios in which monotonicity
is important can be found in finance, medicine, physics, and other disciplines.
It is thus desirable to build neural network architectures that implement this
inductive bias provably. In this work, we propose a weight-constrained
architecture with a single residual connection to achieve exact monotonic
dependence in any subset of the inputs. The weight constraint scheme directly
controls the Lipschitz constant of the neural network and thus provides the
additional benefit of robustness. Compared to existing techniques for
enforcing monotonicity, our method is simpler both in implementation and in
its theoretical foundations, has negligible computational overhead, is guaranteed to produce
monotonic dependence, and is highly expressive. We show how the algorithm is
used to train powerful, robust, and interpretable discriminators that achieve
competitive performance compared to current state-of-the-art methods across
various benchmarks, from social applications to the classification of the
decays of subatomic particles produced at the CERN Large Hadron Collider.
Comment: 9 pages, 4 figures, ICLR 2023 final submission
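The construction described above can be summarised in a few lines: bound the
Lipschitz constant of an unconstrained network g by lambda, then add a single
residual term lambda times the sum of the monotone inputs, so that
df/dx_i = lambda + dg/dx_i >= 0 for each constrained input. The PyTorch
sketch below uses our own names, and its norm projection is a simple
rescaling rather than necessarily the exact scheme from the paper.

```python
import torch
import torch.nn as nn

def groupsort2(x):
    # GroupSort-style activation: 1-Lipschitz and gradient-norm-preserving.
    a, b = x.chunk(2, dim=-1)
    return torch.cat([torch.minimum(a, b), torch.maximum(a, b)], dim=-1)

class LipschitzMonotonicNet(nn.Module):
    """Sketch: Lipschitz-bounded MLP g plus a monotonicity-enforcing residual.

    Each weight matrix is rescaled so its induced infinity-norm (max
    absolute row sum) stays within a per-layer budget; the budgets
    multiply to lam, so |dg/dx_i| <= lam and the residual term makes
    the output non-decreasing in every chosen input.
    """
    def __init__(self, n_in, n_hidden, monotone_idx, lam=1.0, depth=3):
        super().__init__()
        dims = [n_in] + [n_hidden] * (depth - 1) + [1]
        self.layers = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))
        self.monotone_idx = monotone_idx
        self.lam = lam
        self.budget = lam ** (1.0 / depth)  # per-layer norm budget

    def forward(self, x):
        h = x
        for k, layer in enumerate(self.layers):
            w = layer.weight
            norm = w.abs().sum(dim=1).max()  # induced infinity-norm
            w = w * torch.clamp(self.budget / norm, max=1.0)
            h = nn.functional.linear(h, w, layer.bias)
            if k < len(self.layers) - 1:
                h = groupsort2(h)  # hidden width must be even
        # Residual connection: lam dominates any negative slope of g.
        return h + self.lam * x[:, self.monotone_idx].sum(-1, keepdim=True)

net = LipschitzMonotonicNet(n_in=4, n_hidden=32, monotone_idx=[0, 2])
y = net(torch.randn(8, 4))  # (8, 1); non-decreasing in inputs 0 and 2
```

Because the same weight rescaling caps the network's Lipschitz constant, the
robustness benefit mentioned in the abstract falls out of the monotonicity
construction itself.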