Sharing storage using dirty vectors
Consider a computation F with n inputs (independent variables) and m outputs (dependent variables), and suppose that we wish to evaluate the Jacobian of F. Automatic differentiation commonly performs this evaluation by associating vector storage either with the program variables (in the case of forward-mode automatic differentiation) or with the adjoint variables (in the case of reverse mode). Each vector component contains a partial derivative with respect to an independent variable, or a partial derivative of a dependent variable, respectively. The vectors may be full vectors, or they may be dynamically managed sparse data structures. In either case, many of these vectors will be scalar multiples of one another. For example, any intermediate variable produced by a unary operation in the forward mode will have a derivative vector that is a multiple of the derivative vector of its argument. Any computational graph node that is read just once during its lifetime will have an adjoint vector that is a multiple of the adjoint of the node that reads it. It is frequently wasteful to perform these component multiplications explicitly. A scalar multiple of another vector can instead be replaced by a single multiplicative "scale factor" together with a pointer to the other vector. Automated use of this "dirty vector" technique can save considerable memory management overhead and dramatically reduce the number of floating-point operations required. In particular, dirty vectors often allow shared threads of computation to be reverse-accumulated cheaply. The mechanism permits a number of generalizations, some of which give efficient techniques for preaccumulation.
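As an illustration of the scale-factor-plus-pointer idea (a minimal sketch, not the paper's implementation; the class names `DerivVector` and `DirtyVector` are hypothetical):

```python
import math

class DerivVector:
    """A plain derivative vector: one partial per independent variable."""
    def __init__(self, components):
        self.components = list(components)

    def value(self):
        return list(self.components)

class DirtyVector:
    """A scalar multiple of another vector, stored as a scale factor plus a
    pointer to the base vector instead of an explicit component-wise product."""
    def __init__(self, scale, base):
        self.scale = scale
        self.base = base  # a DerivVector or another DirtyVector

    def value(self):
        # Materialize only on demand; nested DirtyVectors compose their scales.
        return [self.scale * c for c in self.base.value()]

# Forward mode: y = sin(x) gives dy = cos(x) * dx, so y's derivative vector
# is a scalar multiple of x's -- share the storage instead of copying it.
x_val = 0.5
dx = DerivVector([1.0, 0.0, 0.0])      # x is the first of 3 independents
dy = DirtyVector(math.cos(x_val), dx)  # no per-component multiplication yet
dz = DirtyVector(2.0, dy)              # z = 2*y: the scale factors compose
```

The component multiplications happen only if and when a full vector is actually demanded (e.g. by a binary operation), which is where the floating-point savings come from.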
The divergence of the BFGS and Gauss-Newton methods
We present examples of divergence for the BFGS and Gauss-Newton methods.
These examples have objective functions with bounded level sets and share
other properties with examples published recently in this journal, such as
unit steps and convexity along the search lines. As with those examples, the
iterates, function values and gradients in the new examples fit into the
general formulation in our previous work {\it On the divergence of line search
methods, Comput. Appl. Math. vol. 26, no. 1 (2007)}, which also presents an
example of divergence for Newton's method.

Comment: This article was accepted by Mathematical Programming
Fast derivatives of likelihood functionals for ODE-based models using the adjoint-state method
We consider time series data modeled by ordinary differential equations
(ODEs), widespread models in physics, chemistry, biology and science in
general. The sensitivity analysis of such dynamical systems usually requires
calculation of various derivatives with respect to the model parameters.
We employ the adjoint state method (ASM) for efficient computation of the
first and the second derivatives of likelihood functionals constrained by ODEs
with respect to the parameters of the underlying ODE model. Essentially, the
gradient can be computed at a cost (measured in model evaluations) that is
independent of the number of ODE model parameters, and the Hessian at a cost
that is linear, rather than quadratic, in the number of parameters. The
sensitivity analysis thus remains feasible even when the parameter space is
high-dimensional.
The main contributions are derivation and rigorous analysis of the ASM in the
statistical context, when the discrete data are coupled with the continuous ODE
model. Further, we present a highly optimized implementation of the results and
its benchmarks on a number of problems.
The results are directly applicable in, e.g., maximum-likelihood estimation
or Bayesian sampling of ODE-based statistical models, allowing for faster, more
stable estimation of the parameters of the underlying ODE model.

Comment: 5 figures
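To make the cost argument concrete, here is a minimal sketch of a discrete adjoint sweep, assuming a forward-Euler discretization of the scalar ODE dx/dt = -theta*x and a sum-of-squares misfit; the function names and setup are illustrative, not the paper's optimized implementation:

```python
def simulate(theta, x0, h, n_steps):
    """Forward Euler for dx/dt = -theta * x; returns the whole trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] * (1.0 - h * theta))
    return xs

def loss_and_grad(theta, x0, data, h):
    """Sum-of-squares misfit J and dJ/dtheta via the discrete adjoint:
    one forward sweep plus one backward sweep, whatever the parameter count."""
    n = len(data) - 1
    xs = simulate(theta, x0, h, n)
    J = 0.5 * sum((x - d) ** 2 for x, d in zip(xs, data))
    lam = xs[n] - data[n]          # adjoint state: lam_k = dJ/dx_k
    grad = 0.0
    for k in range(n - 1, -1, -1):
        grad += lam * (-h * xs[k])                  # d x_{k+1} / d theta term
        lam = (xs[k] - data[k]) + (1.0 - h * theta) * lam
    return J, grad

# Check the adjoint gradient against a central finite difference.
x0, h, n = 2.0, 0.1, 20
data = simulate(0.5, x0, h, n)     # synthetic observations from theta = 0.5
J, g = loss_and_grad(0.8, x0, data, h)
eps = 1e-6
fd = (loss_and_grad(0.8 + eps, x0, data, h)[0]
      - loss_and_grad(0.8 - eps, x0, data, h)[0]) / (2 * eps)
```

The backward sweep reuses the stored trajectory, so the gradient costs one extra pass over the time grid regardless of how many parameters the model has.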
Scalable Rejection Sampling for Bayesian Hierarchical Models
Bayesian hierarchical modeling is a popular approach to capturing unobserved
heterogeneity across individual units. However, standard estimation methods
such as Markov chain Monte Carlo (MCMC) can be impracticable for modeling
outcomes from a large number of units. We develop a new method to sample from
posterior distributions of Bayesian models, without using MCMC. Samples are
independent, so they can be collected in parallel, and we do not need to be
concerned with issues like chain convergence and autocorrelation. The algorithm
is scalable under the weak assumption that individual units are conditionally
independent, making it applicable to large datasets. It can also be used to
compute marginal likelihoods.
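A minimal sketch of the underlying idea, assuming a simple one-parameter model rather than the paper's hierarchical setting (all names here are hypothetical): propose from the prior and accept with probability proportional to the likelihood, bounded above by a constant M, so that every accepted draw is an exact, independent posterior sample:

```python
import math
import random

def rejection_sample_posterior(log_lik, sample_prior, log_M, n_samples, rng):
    """Exact i.i.d. posterior samples: propose from the prior and accept with
    probability exp(log_lik(theta) - log_M), where log_M bounds log_lik."""
    samples = []
    while len(samples) < n_samples:
        theta = sample_prior(rng)
        if rng.random() < math.exp(log_lik(theta) - log_M):
            samples.append(theta)
    return samples

# Toy single-unit example: 7 heads in 10 coin flips, uniform prior on p.
def log_lik(p):
    if p <= 0.0 or p >= 1.0:
        return float("-inf")     # zero likelihood at the boundary
    return 7 * math.log(p) + 3 * math.log(1 - p)

log_M = log_lik(0.7)             # likelihood is maximized at p = 7/10
rng = random.Random(0)
samples = rejection_sample_posterior(
    log_lik, lambda r: r.random(), log_M, 4000, rng)
post_mean = sum(samples) / len(samples)   # true posterior Beta(8,4), mean 2/3
```

Because each accepted sample is independent of every other, the proposal loop can be split across workers with no concern for chain convergence or autocorrelation.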