Hessian Matrix-Free Lagrange-Newton-Krylov-Schur-Schwarz Methods for Elliptic Inverse Problems
This study focuses on the solution of inverse problems for elliptic systems. The inverse problem is constructed as a PDE-constrained optimization, where the cost function is the L2 norm of the difference between the measured data and the predicted state variable, and the constraint is an elliptic PDE. Particular examples of the systems considered in this study are groundwater flow and radiation transport. The inverse problems are typically ill-posed due to error in measurements of the data. Regularization methods are employed to partially alleviate this problem. The PDE-constrained optimization is formulated as the minimization of a Lagrangian functional, formed from the regularized cost function and the discretized PDE, with respect to the parameters, the state variables, and the Lagrange multipliers. Our approach is known as an all-at-once method. An algorithm is proposed for an inverse problem that is capable of being extended to large scales. To overcome storage limitations, we develop a parallel preconditioned Newton-Krylov method employed in a Hessian-free manner. The preconditioners have an inner-outer structure, taking the form of a Schur complement (block factorization) at the outer level and Schwarz projections at the inner level. However, building an exact Schur complement is prohibitively expensive. Thus, we use Schur complement approximations, including the identity, probing, the Laplacian, the J operator, and a BFGS operator. For exact data the exact Schur complements are superior to the inexact approximations. However, for data with noise the inexact methods are competitive with or even better than the exact ones in every computational aspect. We also find that nonsymmetric forms of the Karush-Kuhn-Tucker matrices and preconditioners are competitive with or better than the symmetric forms that are commonly used in the optimization community.
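As a rough sketch of the formulation described above, the Lagrangian and the resulting Newton (Karush-Kuhn-Tucker) system might be written as follows. All symbols here are assumptions introduced for illustration and are not taken from the abstract: $u$ is the discretized state, $m$ the parameters, $\lambda$ the Lagrange multipliers, $d$ the measured data (assumed observed at every state unknown), $R$ a regularization functional with weight $\beta$, and $c(u, m) = 0$ the discretized elliptic PDE.

```latex
% Regularized Lagrangian (assumed notation)
\mathcal{L}(u, m, \lambda)
  = \tfrac{1}{2}\,\lVert u - d \rVert_2^2
  + \beta\, R(m)
  + \lambda^{\mathsf{T}} c(u, m)

% First-order (KKT) conditions: all three partial gradients vanish
\nabla_u \mathcal{L} = 0, \qquad
\nabla_m \mathcal{L} = 0, \qquad
\nabla_\lambda \mathcal{L} = c(u, m) = 0

% One Newton step on the KKT conditions, solved all at once
\begin{pmatrix}
  \mathcal{L}_{uu} & \mathcal{L}_{um} & c_u^{\mathsf{T}} \\
  \mathcal{L}_{mu} & \mathcal{L}_{mm} & c_m^{\mathsf{T}} \\
  c_u              & c_m              & 0
\end{pmatrix}
\begin{pmatrix} \delta u \\ \delta m \\ \delta \lambda \end{pmatrix}
= -
\begin{pmatrix}
  \nabla_u \mathcal{L} \\ \nabla_m \mathcal{L} \\ \nabla_\lambda \mathcal{L}
\end{pmatrix}
```

In a Hessian-free Krylov solver, the block matrix on the left is never assembled; only its action on a vector is needed at each inner iteration.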
In this study, iterative Tikhonov and Total Variation regularizations are proposed and compared to the standard regularizations and each other. For exact data with jump discontinuities the standard and iterative Total Variation regularizations are superior to the standard and iterative Tikhonov regularizations. However, in the case of noisy data the proposed iterative Tikhonov regularizations are superior to the standard and iterative Total Variation methods. We also show that in some cases the iterative regularizations are better than the noniterative ones. To demonstrate the performance of the algorithm, including the effectiveness of the preconditioners and regularizations, synthetic one- and two-dimensional elliptic inverse problems are solved, and we also compare with other methodologies that are available in the literature. The proposed algorithm performs well with regard to robustness, reconstructs the parameter models effectively, and is easily implemented in the framework of the available parallel PDE software PETSc and the automatic differentiation software ADIC. The algorithm is also extendable to three-dimensional problems.
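One way to see why Total Variation can handle jump discontinuities better than Tikhonov regularization is to compare the two penalties on a sharp step and on a smooth ramp of the same height. The sketch below assumes a gradient-based (first-difference) Tikhonov penalty on a 1D parameter vector; the abstract does not say which Tikhonov variant is used, and the function names are hypothetical.

```python
import numpy as np

def tikhonov_penalty(m):
    """Sum of squared first differences (assumed gradient-type Tikhonov)."""
    return float(np.sum(np.diff(m) ** 2))

def total_variation_penalty(m):
    """Sum of absolute first differences (discrete total variation)."""
    return float(np.sum(np.abs(np.diff(m))))

n = 100
# A parameter profile with a single unit jump in the middle.
step = np.where(np.arange(n) < n // 2, 0.0, 1.0)
# A profile that rises linearly from 0 to 1 over the whole domain.
ramp = np.linspace(0.0, 1.0, n)

tv_step, tv_ramp = total_variation_penalty(step), total_variation_penalty(ramp)
tik_step, tik_ramp = tikhonov_penalty(step), tikhonov_penalty(ramp)

print(f"TV:       step={tv_step:.4f}  ramp={tv_ramp:.4f}")
print(f"Tikhonov: step={tik_step:.4f}  ramp={tik_ramp:.4f}")
```

Total Variation assigns the step and the ramp the same cost, so it does not discourage a sharp jump, whereas the squared-difference penalty makes the step far more expensive than the ramp and therefore tends to smear discontinuities.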
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we propose using diffusion approximation tools to study
the dynamics of Oja's iteration, an online stochastic gradient descent
method for principal component analysis. Oja's iteration maintains a
running estimate of the true principal component from streaming data and has
low time and space complexity. We show that Oja's iteration for
the top eigenvector generates a continuous-state discrete-time Markov chain
over the unit sphere. We characterize Oja's iteration in three phases using
diffusion approximation and weak convergence tools. Our three-phase analysis
further provides a finite-sample error bound for the running estimate, which
matches the minimax information lower bound for principal component analysis
under the additional assumption of bounded samples. Comment: Appeared in NIPS 201
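A minimal sketch of Oja's iteration for the top eigenvector is given below: each streaming sample triggers one stochastic gradient step followed by a projection back onto the unit sphere. The constant step size, the Gaussian synthetic data (which is not bounded, unlike the setting analyzed in the paper), and the function name are all assumptions made for illustration.

```python
import numpy as np

def oja_top_eigenvector(samples, step_size, rng):
    """Run Oja's iteration over a stream of samples (rows of `samples`)."""
    dim = samples.shape[1]
    # Random starting point on the unit sphere.
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in samples:
        # Stochastic gradient step: w <- w + eta * x * (x^T w).
        w = w + step_size * x * (x @ w)
        # Project back onto the unit sphere.
        w /= np.linalg.norm(w)
    return w

rng = np.random.default_rng(0)
dim = 5
# Synthetic zero-mean data with covariance diag(9, 1, 1, 1, 1),
# so the true top eigenvector is the first coordinate axis.
scales = np.array([3.0, 1.0, 1.0, 1.0, 1.0])
samples = rng.standard_normal((20000, dim)) * scales

w = oja_top_eigenvector(samples, step_size=1e-3, rng=rng)
alignment = abs(w[0])  # |cosine| of the angle to the true eigenvector
print(f"alignment with true top eigenvector: {alignment:.4f}")
```

Only the current estimate `w` is stored, which is where the low memory footprint comes from: no covariance matrix is ever formed.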
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings. Comment: 43 pages, 5 figure
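To illustrate how AD evaluates derivatives of functions expressed as programs, here is a small sketch of forward-mode AD using dual numbers. It is one of several AD techniques and is not presented as the survey's own code; the `Dual` class and helper names are hypothetical, and only addition, multiplication, and sine are supported.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))

def sin(x):
    # Chain rule: (sin u)' = cos(u) * u'.
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).deriv

def f(x):
    return x * x * x + 2 * x + sin(x)

df = derivative(f, 1.5)
exact = 3 * 1.5 ** 2 + 2 + math.cos(1.5)
print(df, exact)
```

The derivative is propagated alongside the value through ordinary program execution, so the result is exact up to floating-point rounding, unlike a finite-difference estimate, and no symbolic expression for the derivative is ever built.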