Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the sheer non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross-fertilization of ideas.
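To make the framing concrete, below is a minimal sketch (not taken from the survey) of a linear inverse problem cast as Tikhonov-regularized least squares and solved by plain gradient descent; the forward operator A, the data y, and the regularization weight lam are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: a linear inverse problem y = A x + noise cast as the
# Tikhonov-regularized least-squares objective
#   min_x 0.5 * ||A x - y||^2 + 0.5 * lam * ||x||^2
# solved with plain gradient descent. A, y, and lam are illustrative
# choices, not taken from the survey.

rng = np.random.default_rng(0)
n, m = 50, 30                      # measurements, unknowns
A = rng.normal(size=(n, m))        # forward operator
x_true = rng.normal(size=m)
y = A @ x_true + 0.01 * rng.normal(size=n)

lam = 1e-2
step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)   # safe step size (1/Lipschitz)

x = np.zeros(m)
for _ in range(2000):
    grad = A.T @ (A @ x - y) + lam * x           # gradient of the objective
    x -= step * grad

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```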
Universal Approximation of Parametric Optimization via Neural Networks with Piecewise Linear Policy Approximation
Parametric optimization solves a family of optimization problems as a
function of parameters. It is a critical component in situations where optimal
decision making is repeatedly performed for updated parameter values, but
computation becomes challenging when complex problems need to be solved in
real-time. Therefore, in this study, we present theoretical foundations for
approximating the optimal policy of a parametric optimization problem with
neural networks, and we derive conditions under which the Universal
Approximation Theorem applies to parametric optimization problems by
explicitly constructing a piecewise linear policy approximation. This study
fills the gap in formally analyzing the constructed piecewise linear
approximation in terms of feasibility and optimality, and shows that neural
networks (with ReLU activations) are valid approximators for it in terms of
generalization and approximation error. Furthermore, based on these
theoretical results, we propose a strategy to improve the feasibility of the
approximated solution and discuss training with suboptimal solutions.
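As an illustration of the idea (not the paper's construction), the sketch below trains a small ReLU network to imitate the piecewise linear optimal policy x*(theta) = max(theta, 0) of the toy parametric problem min_x (x - theta)^2 subject to x >= 0; the architecture and training settings are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Toy sketch (not the paper's construction): the parametric problem
#   min_x (x - theta)^2  subject to  x >= 0
# has the piecewise linear optimal policy x*(theta) = max(theta, 0).
# A small ReLU network is trained to approximate this policy from
# (parameter, optimal solution) pairs.

torch.manual_seed(0)
theta = torch.linspace(-2.0, 2.0, 512).unsqueeze(1)   # parameter samples
x_opt = torch.clamp(theta, min=0.0)                    # known optimal policy

policy_net = nn.Sequential(
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-2)

for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(policy_net(theta), x_opt)
    loss.backward()
    opt.step()

# The learned policy is itself piecewise linear in theta (ReLU activations);
# feasibility (x >= 0) can be enforced post hoc, e.g. by clamping.
test = torch.tensor([[-1.0], [0.5], [1.5]])
print(policy_net(test).clamp(min=0.0))
```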
Collective stability of networks of winner-take-all circuits
The neocortex has a remarkably uniform neuronal organization, suggesting that
common principles of processing are employed throughout its extent. In
particular, the patterns of connectivity observed in the superficial layers of
the visual cortex are consistent with the recurrent excitation and inhibitory
feedback required for cooperative-competitive circuits such as the soft
winner-take-all (WTA). WTA circuits offer interesting computational properties
such as selective amplification, signal restoration, and decision making.
However, these properties depend on the signal gain derived from positive
feedback, so there is a critical trade-off between providing feedback strong
enough to support these sophisticated computations and maintaining overall
circuit stability. We consider the question of how to reason about stability in very
large distributed networks of such circuits. We approach this problem by
approximating the regular cortical architecture as many interconnected
cooperative-competitive modules. We demonstrate that by properly understanding
the behavior of this small computational module, one can reason over the
stability and convergence of very large networks composed of these modules. We
obtain parameter ranges in which the WTA circuit operates in a high-gain
regime, is stable, and can be aggregated arbitrarily to form large stable
networks. We use nonlinear Contraction Theory to establish conditions for
stability in the fully nonlinear case, and verify these solutions using
numerical simulations. The derived bounds allow modes of operation in which the
WTA network is multi-stable and exhibits state-dependent persistent activities.
Our approach is sufficiently general to reason systematically about the
stability of any network, biological or technological, composed of networks of
small modules that express competition through shared inhibition.
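For intuition, the sketch below simulates an illustrative threshold-linear soft WTA with recurrent self-excitation and a single shared inhibitory unit; the rate equations and gains are assumptions for demonstration, not the paper's exact model or its derived stability bounds.

```python
import numpy as np

# Minimal sketch (illustrative rate model, not the paper's exact equations):
# a soft winner-take-all with threshold-linear units, recurrent
# self-excitation alpha, and a single shared inhibitory unit with weight beta:
#   tau * dx_i/dt = -x_i + [I_i + alpha * x_i - beta * y]_+
#   tau * dy/dt   = -y + sum_i x_i
relu = lambda v: np.maximum(v, 0.0)

alpha, beta, tau, dt = 1.2, 3.0, 1.0, 0.01
I = np.array([1.0, 1.1, 0.9])        # external inputs; unit 1 is strongest
x = np.zeros(3)                       # excitatory rates
y = 0.0                               # shared inhibition

for _ in range(5000):                 # Euler integration
    x += dt / tau * (-x + relu(I + alpha * x - beta * y))
    y += dt / tau * (-y + x.sum())

print(np.round(x, 3))   # the unit with the largest input dominates
```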
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.
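As a concrete illustration of one of the techniques surveyed, the sketch below implements forward-mode AD with dual numbers in a few lines; it is a didactic toy, not a general-purpose AD system.

```python
import math

# Minimal sketch of forward-mode automatic differentiation with dual numbers:
# each value carries (primal, derivative), and every operation propagates
# both by the chain rule.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):
    return sin(x * x) + x          # f(x) = sin(x^2) + x

x = Dual(1.5, 1.0)                 # seed derivative dx/dx = 1
out = f(x)
print(out.val, out.dot)            # f(1.5) and f'(1.5) = 2*1.5*cos(2.25) + 1
```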
Sequential Gaussian Processes for Online Learning of Nonstationary Functions
Many machine learning problems can be framed in the context of estimating
functions, and often these are time-dependent functions that are estimated in
real-time as observations arrive. Gaussian processes (GPs) are an attractive
choice for modeling real-valued nonlinear functions due to their flexibility
and uncertainty quantification. However, the typical GP regression model
suffers from several drawbacks: i) Conventional GP inference scales as O(N^3)
with respect to the number of observations; ii) updating a GP model
sequentially is not trivial; and iii) covariance kernels often enforce
stationarity constraints on the function, while GPs with non-stationary
covariance kernels are often intractable to use in practice. To overcome these
issues, we propose an online sequential Monte Carlo algorithm to fit mixtures
of GPs that capture non-stationary behavior while allowing for fast,
distributed inference. By formulating hyperparameter optimization as a
multi-armed bandit problem, we accelerate mixing for real time inference. Our
approach empirically improves performance over state-of-the-art methods for
online GP estimation in the context of prediction for simulated non-stationary
data and hospital time series data.
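For context, the sketch below shows the exact-GP baseline the abstract refers to, naively refit as each observation arrives (hence the O(N^3) per-update cost); the proposed sequential Monte Carlo mixture of GPs is not reproduced here, and the kernel and noise settings are illustrative.

```python
import numpy as np

# Minimal sketch of the baseline: exact GP regression with an RBF kernel,
# naively refit each time a new observation arrives (O(N^3) per update).

def rbf(a, b, ls=0.5, var=1.0):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)        # O(N^3) solve per refit
    mean = Ks @ alpha
    cov = rbf(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

rng = np.random.default_rng(0)
xs, ys = [], []
for t in np.linspace(0.0, 4.0, 40):            # observations arrive over time
    xs.append(t)
    ys.append(np.sin(3 * t) + 0.1 * rng.normal())
    mean, var = gp_predict(np.array(xs), np.array(ys), np.array([t + 0.1]))
    # one-step-ahead prediction with predictive variance
print(float(mean[0]), float(var[0]))
```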
Deep Graph-Convolutional Image Denoising
Non-local self-similarity is well-known to be an effective prior for the
image denoising problem. However, little work has been done to incorporate it
in convolutional neural networks, which surpass non-local model-based methods
despite only exploiting local information. In this paper, we propose a novel
end-to-end trainable neural network architecture employing layers based on
graph convolution operations, thereby creating neurons with non-local receptive
fields. The graph convolution operation generalizes the classic convolution to
arbitrary graphs. In this work, the graph is dynamically computed from
similarities among the hidden features of the network, so that the powerful
representation learning capabilities of the network are exploited to uncover
self-similar patterns. We introduce a lightweight Edge-Conditioned Convolution
which addresses vanishing gradient and over-parameterization issues of this
particular graph convolution. Extensive experiments show state-of-the-art
performance with improved qualitative and quantitative results on both
synthetic Gaussian noise and real noise.
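The sketch below illustrates only the dynamic graph construction step, assuming per-pixel hidden features: a k-nearest-neighbour graph is built in feature space and neighbour features are averaged, giving a non-local receptive field. The paper's Edge-Conditioned Convolution additionally learns edge-dependent weights, which is not shown here.

```python
import numpy as np

# Minimal sketch of the dynamic-graph idea: for each pixel's hidden feature
# vector, find its k nearest neighbours in feature space (anywhere in the
# image, hence non-local) and aggregate their features.

def knn_graph_aggregate(features, k=8):
    """features: (N, C) hidden features for N pixels; returns (N, C)."""
    # pairwise squared Euclidean distances in feature space
    sq = (features ** 2).sum(1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)             # exclude self-loops
    idx = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbours per node
    neigh = features[idx]                      # (N, k, C)
    return neigh.mean(axis=1)                  # simple mean aggregation

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16))              # e.g. an 8x8 patch, 16 channels
out = knn_graph_aggregate(feats)
print(out.shape)                               # (64, 16)
```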
A Coverage Study of the CMSSM Based on ATLAS Sensitivity Using Fast Neural Networks Techniques
We assess the coverage properties of confidence and credible intervals on the
CMSSM parameter space inferred from a Bayesian posterior and the profile
likelihood based on an ATLAS sensitivity study. In order to make those
calculations feasible, we introduce a new method based on neural networks to
approximate the mapping between CMSSM parameters and weak-scale particle
masses. Our method reduces the computational effort needed to sample the CMSSM
parameter space by a factor of ~ 10^4 with respect to conventional techniques.
We find that both the Bayesian posterior and the profile likelihood intervals
can significantly over-cover, and we trace the origin of this effect to
physical boundaries in the parameter space. Finally, we point out that the effects
intrinsic to the statistical procedure are conflated with simplifications to
the likelihood functions from the experiments themselves.
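The sketch below illustrates the general surrogate idea, assuming a hypothetical stand-in expensive_spectrum function in place of a real spectrum calculator: a neural network regressor is fit to a modest set of exact evaluations of the mapping from CMSSM-like parameters (e.g. m0, m1/2, A0, tan(beta)) to masses, and the cheap surrogate is then queried during parameter scans.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Minimal sketch of the surrogate idea: replace an expensive parameter-to-
# observable mapping with a neural network regressor trained on a modest set
# of exact evaluations, then query the cheap surrogate during scans.
# `expensive_spectrum` is a hypothetical stand-in, not a real spectrum code.

def expensive_spectrum(p):
    # placeholder for a slow physics computation returning two "masses"
    return np.column_stack([p[:, 0] + 0.5 * p[:, 1],
                            np.sqrt(p[:, 0] ** 2 + p[:, 2] ** 2)])

rng = np.random.default_rng(0)
params = rng.uniform(0.1, 2.0, size=(2000, 4))     # training points
masses = expensive_spectrum(params)

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(params, masses)

query = rng.uniform(0.1, 2.0, size=(5, 4))
print(surrogate.predict(query))                    # fast approximate masses
```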