Automatic Variational Inference in Stan
Variational inference is a scalable technique for approximate Bayesian
inference. Deriving variational inference algorithms requires tedious
model-specific calculations; this makes it difficult to automate. We propose an
automatic variational inference algorithm, automatic differentiation
variational inference (ADVI). The user only provides a Bayesian model and a
dataset; nothing else. We make no conjugacy assumptions and support a broad
class of models. The algorithm automatically determines an appropriate
variational family and optimizes the variational objective. We implement ADVI
in Stan (code available now), a probabilistic programming framework. We compare
ADVI to MCMC sampling across hierarchical generalized linear models,
nonconjugate matrix factorization, and a mixture model. We train the mixture
model on a quarter million images. With ADVI we can use variational inference
on any model we write in Stan.
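Since ADVI ships with Stan, it can be invoked from a general-purpose language through one of Stan's interfaces. Below is a minimal sketch using CmdStanPy; the toy model and data are illustrative, not from the paper.

```python
# Minimal sketch: running ADVI on a Stan model via CmdStanPy.
# The model and data here are toy examples, not the paper's.
from cmdstanpy import CmdStanModel

# Toy Bayesian model: normal likelihood with unknown mean and scale.
stan_code = """
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  y ~ normal(mu, sigma);
}
"""
with open("toy.stan", "w") as f:
    f.write(stan_code)

data = {"N": 5, "y": [1.2, 0.8, 1.1, 0.9, 1.3]}

model = CmdStanModel(stan_file="toy.stan")
# ADVI with a mean-field Gaussian approximation, optimized automatically.
fit = model.variational(data=data, algorithm="meanfield")
print(fit.variational_params_dict)  # approximate posterior means of mu, sigma
```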
Automatic Differentiation Variational Inference
Probabilistic modeling is iterative. A scientist posits a simple model, fits
it to her data, refines it according to her analysis, and repeats. However,
fitting complex models to large data is a bottleneck in this process. Deriving
algorithms for new models can be both mathematically and computationally
challenging, which makes it difficult to efficiently cycle through the steps.
To this end, we develop automatic differentiation variational inference (ADVI).
Using our method, the scientist only provides a probabilistic model and a
dataset, nothing else. ADVI automatically derives an efficient variational
inference algorithm, freeing the scientist to refine and explore many models.
ADVI supports a broad class of models-no conjugacy assumptions are required. We
study ADVI across ten different models and apply it to a dataset with millions
of observations. ADVI is integrated into Stan, a probabilistic programming
system; it is available for immediate use.
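At its core, ADVI transforms the model's parameters to an unconstrained space, posits a Gaussian variational family there, and maximizes the ELBO with stochastic gradients via the reparameterization trick. The sketch below illustrates that core loop for a single parameter of a toy model; it is a hand-rolled illustration under stated assumptions, not the Stan implementation.

```python
# Sketch of ADVI's core loop for one unconstrained parameter:
# fit q(theta) = N(mu, exp(omega)^2) to the posterior by stochastic
# gradient ascent on the ELBO, using the reparameterization trick.
# Toy model (illustrative): theta ~ N(0, 10), y_i ~ N(theta, 1).
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.5, 1.0, size=100)  # synthetic data

def grad_log_joint(theta):
    # d/dtheta [log p(theta) + sum_i log p(y_i | theta)]
    return -theta / 10.0**2 + np.sum(y - theta)

mu, omega = 0.0, 0.0  # variational parameters (omega = log std dev)
lr = 1e-3
for step in range(5000):
    eta = rng.normal()                # eta ~ N(0, 1)
    theta = mu + np.exp(omega) * eta  # reparameterized draw from q
    g = grad_log_joint(theta)
    mu += lr * g                                    # dELBO/dmu
    omega += lr * (g * eta * np.exp(omega) + 1.0)   # dELBO/domega (+1 from entropy)

# Rough, noisy estimates; the exact posterior has
# mean = sum(y) / (n + 1/100) and sd = 1 / sqrt(n + 1/100).
print(f"q mean ~ {mu:.3f}, q sd ~ {np.exp(omega):.3f}")
```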
Robust Cardiac Motion Estimation using Ultrafast Ultrasound Data: A Low-Rank-Topology-Preserving Approach
Cardiac motion estimation is an important diagnostic tool to detect heart
diseases and it has been explored with modalities such as MRI and conventional
ultrasound (US) sequences. US cardiac motion estimation still presents
challenges because of the complex motion patterns and the presence of noise. In
this work, we propose a novel approach to estimate the cardiac motion using
ultrafast ultrasound data. Our solution is based on a variational
formulation in the L2-regularized class. The displacement is represented by a
lattice of B-splines, and we ensure robustness by applying a
maximum likelihood type estimator. While this is an important part of our
solution, the main highlight of this paper is to combine a low-rank data
representation with topology preservation. Low-rank data representation
(achieved by keeping the k dominant singular values of a Casorati matrix
arranged from the data sequence) speeds up the global solution and achieves
noise reduction. On the other hand, topology preservation (achieved by
monitoring the Jacobian determinant) makes it possible to rule out distortions
while carefully controlling the size of allowed expansions and contractions.
Our variational approach is carried out on a realistic dataset as well as on a
simulated one. We demonstrate how our proposed variational solution deals with
complex deformations through careful numerical experiments. While maintaining
the accuracy of the solution, the low-rank preprocessing is shown to speed up
the convergence of the variational problem. Beyond cardiac motion estimation,
our approach is promising for the analysis of other organs that experience
motion.
Comment: 15 pages, 10 figures, Physics in Medicine and Biology, 201
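The two highlighted ingredients are easy to prototype. The sketch below shows, under assumed array shapes and with illustrative names (not the authors' code), a rank-k approximation of a Casorati matrix built from an image sequence, and a Jacobian-determinant monitor for a 2-D displacement field.

```python
# Sketch of (1) low-rank denoising via the k dominant singular values
# of a Casorati matrix and (2) monitoring the Jacobian determinant of a
# 2-D displacement field to flag topology-breaking deformations.
# Shapes and data are illustrative assumptions.
import numpy as np

def lowrank_denoise(frames, k):
    """frames: (T, H, W) image sequence -> rank-k approximation."""
    T, H, W = frames.shape
    casorati = frames.reshape(T, H * W).T      # (H*W, T): one column per frame
    U, s, Vt = np.linalg.svd(casorati, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]    # keep k dominant singular values
    return approx.T.reshape(T, H, W)

def jacobian_determinant(ux, uy):
    """ux, uy: (H, W) displacement components -> det J of x -> x + u(x)."""
    dux_dy, dux_dx = np.gradient(ux)           # axis 0 = rows (y), axis 1 = cols (x)
    duy_dy, duy_dx = np.gradient(uy)
    return (1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx

frames = np.random.rand(50, 64, 64)            # stand-in for an ultrasound sequence
denoised = lowrank_denoise(frames, k=5)

ux = 0.1 * np.random.randn(64, 64)             # stand-in displacement field
uy = 0.1 * np.random.randn(64, 64)
detJ = jacobian_determinant(ux, uy)
print("foldings (det J <= 0):", int(np.sum(detJ <= 0)))
```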
Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities
Recently, we proposed to transform the outputs of each hidden neuron in a
multi-layer perceptron network to have zero output and zero slope on average,
and use separate shortcut connections to model the linear dependencies instead.
We continue that work, first by introducing a third transformation to
normalize the scale of the outputs of each hidden neuron, and second by
analyzing the connections to second-order optimization methods. We show that
the transformations make simple stochastic gradient descent behave more like
second-order optimization methods and thus speed up learning. This is shown
both in theory
and with experiments. The experiments on the third transformation show that
while it further increases the speed of learning, it can also hurt performance
by converging to a worse local optimum, where both the inputs and outputs of
many hidden neurons are close to zero.
Comment: 10 pages, 5 figures, ICLR201
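The transformations themselves are simple to state. The sketch below applies them to a single tanh unit over a batch of pre-activations: subtract a linear term so the unit's output and slope are zero on average, then rescale to normalize the output. Variable names and the batch-based estimation are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of the transformations for one tanh hidden unit:
# g(x) = gamma * (f(x) - alpha*x - beta), with alpha and beta chosen so
# the average slope and average output are zero, and gamma normalizing
# the output scale. Estimated over a batch; names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=10_000)  # pre-activations of one hidden unit

f = np.tanh(x)
fprime = 1.0 - f**2                     # d/dx tanh(x)

alpha = fprime.mean()                   # makes the average slope zero
beta = (f - alpha * x).mean()           # makes the average output zero
g = f - alpha * x - beta                # transformed nonlinearity
gamma = 1.0 / g.std()                   # third transformation: normalize scale
g_scaled = gamma * g

print(f"mean output {g_scaled.mean():+.4f}, "
      f"mean slope {(gamma * (fprime - alpha)).mean():+.4f}")
# The subtracted linear part (alpha*x + beta) is modeled separately by
# the shortcut connections described above.
```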