2,318 research outputs found

    Optimization Methods for Inverse Problems

    Full text link
    Optimization plays an important role in solving many inverse problems. Indeed, the task of inversion often either involves or is fully cast as the solution of an optimization problem. In this light, the non-linear, non-convex, and large-scale nature of many of these inversions alone gives rise to some very challenging optimization problems. The inverse problem community has long been developing various techniques for solving such optimization tasks. However, other, seemingly disjoint communities, such as that of machine learning, have developed, almost in parallel, interesting alternative methods which might have stayed under the radar of the inverse problem community. In this survey, we aim to change that. We first discuss current state-of-the-art optimization methods widely used in inverse problems. We then survey recent related advances in addressing similar challenges in problems faced by the machine learning community, and discuss their potential advantages for solving inverse problems. By highlighting the similarities among the optimization challenges faced by the inverse problem and machine learning communities, we hope that this survey can serve as a bridge between the two communities and encourage cross-fertilization of ideas. Comment: 13 pages.
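    As a minimal, self-contained sketch of the kind of formulation the survey discusses (our own illustration, not code from the paper): an inverse problem y ≈ Ax can be cast as a regularized least-squares minimization and solved with plain gradient descent. The forward operator A, the Tikhonov weight lam, and the step-size rule below are illustrative assumptions.

```python
import numpy as np

def solve_inverse_problem(A, y, lam=1e-2, n_iters=500):
    """Recover x from y ~ A @ x by minimizing
    0.5*||A x - y||^2 + 0.5*lam*||x||^2 with gradient descent
    (an illustrative Tikhonov-regularized formulation)."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y) + lam * x          # gradient of the objective
        x -= step * grad
    return x

# Toy usage with a random forward operator (assumed for illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = rng.normal(size=20)
y = A @ x_true + 0.01 * rng.normal(size=50)
x_hat = solve_inverse_problem(A, y)
```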

    Universal Approximation of Parametric Optimization via Neural Networks with Piecewise Linear Policy Approximation

    Full text link
    Parametric optimization solves a family of optimization problems as a function of parameters. It is a critical component in settings where optimal decisions must be computed repeatedly for updated parameter values, but computation becomes challenging when complex problems have to be solved in real time. In this study, we therefore present theoretical foundations for approximating the optimal policy of a parametric optimization problem with neural networks, and we derive conditions under which the Universal Approximation Theorem can be applied to parametric optimization problems by explicitly constructing a piecewise linear policy approximation. The study fills a gap by formally analyzing this piecewise linear approximation in terms of feasibility and optimality, and shows that neural networks with ReLU activations are valid approximators for it in terms of generalization and approximation error. Building on these theoretical results, we further propose a strategy to improve the feasibility of the approximated solution and discuss training with suboptimal solutions. Comment: 17 pages, 2 figures, preprint, under review.
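    A toy illustration of the idea (our own example, not taken from the paper): for the parametric problem min_x 0.5 x^2 - theta*x subject to 0 <= x <= 1, the optimal policy x*(theta) = clip(theta, 0, 1) is piecewise linear, and two ReLU units represent it exactly, so a small ReLU network can express this optimal policy with no approximation error.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def optimal_policy(theta):
    # Closed-form solution of min_x 0.5*x**2 - theta*x  s.t. 0 <= x <= 1.
    return np.clip(theta, 0.0, 1.0)

def relu_policy(theta):
    # Exact two-unit ReLU representation of the same piecewise linear policy:
    # clip(theta, 0, 1) = relu(theta) - relu(theta - 1).
    return relu(theta) - relu(theta - 1.0)

theta = np.linspace(-2.0, 3.0, 1001)
assert np.allclose(optimal_policy(theta), relu_policy(theta))
```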

    Collective stability of networks of winner-take-all circuits

    Full text link
    The neocortex has a remarkably uniform neuronal organization, suggesting that common principles of processing are employed throughout its extent. In particular, the patterns of connectivity observed in the superficial layers of the visual cortex are consistent with the recurrent excitation and inhibitory feedback required for cooperative-competitive circuits such as the soft winner-take-all (WTA). WTA circuits offer interesting computational properties such as selective amplification, signal restoration, and decision making. But these properties depend on the signal gain derived from positive feedback, so there is a critical trade-off between providing feedback strong enough to support sophisticated computation and maintaining overall circuit stability. We consider the question of how to reason about stability in very large distributed networks of such circuits. We approach this problem by approximating the regular cortical architecture as many interconnected cooperative-competitive modules. We demonstrate that, by properly understanding the behavior of this small computational module, one can reason about the stability and convergence of very large networks composed of these modules. We obtain parameter ranges in which the WTA circuit operates in a high-gain regime, is stable, and can be aggregated arbitrarily to form large stable networks. We use nonlinear Contraction Theory to establish conditions for stability in the fully nonlinear case, and verify these solutions using numerical simulations. The derived bounds allow modes of operation in which the WTA network is multi-stable and exhibits state-dependent persistent activities. Our approach is sufficiently general to reason systematically about the stability of any network, biological or technological, composed of networks of small modules that express competition through shared inhibition. Comment: 7 figures.
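    A minimal numerical sketch of a soft WTA module of the kind analyzed here (illustrative parameters chosen for a stable, high-gain regime; they are not the bounds derived in the paper): threshold-linear excitatory units with self-excitation compete through a single shared inhibitory unit, and the unit receiving the largest input is selectively amplified while the others are suppressed.

```python
import numpy as np

def simulate_soft_wta(inputs, alpha=1.5, beta1=1.0, beta2=1.0,
                      tau=0.02, dt=0.001, n_steps=2000):
    """Euler simulation of a soft winner-take-all module: threshold-linear
    excitatory units with self-excitation (alpha) coupled to one shared
    inhibitory unit (beta1, beta2). Parameters are illustrative only."""
    f = lambda z: np.maximum(z, 0.0)          # threshold-linear activation
    x = np.zeros_like(inputs, dtype=float)    # excitatory firing rates
    y = 0.0                                   # shared inhibitory rate
    for _ in range(n_steps):
        dx = (-x + f(inputs + alpha * x - beta1 * y)) / tau
        dy = (-y + f(beta2 * np.sum(x))) / tau
        x, y = x + dt * dx, y + dt * dy
    return x

# The largest input wins and is amplified (gain 1/(1 - alpha + beta1*beta2) = 2 here);
# the losing units are driven to zero by the shared inhibition.
print(simulate_soft_wta(np.array([1.0, 1.2, 0.9])))
```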

    Automatic differentiation in machine learning: a survey

    Get PDF
    Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings. Comment: 43 pages, 5 figures.
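    As a minimal, self-contained illustration of forward-mode AD (our own sketch, not code from the survey): a dual number carries a value together with its derivative through each elementary operation, so derivatives are computed to machine precision without symbolic expression swell or finite-difference truncation error.

```python
import math

class Dual:
    """Forward-mode AD with dual numbers: (value, derivative) pairs."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule, applied operation by operation.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):
    return x * x + 3.0 * sin(x)

# Seed dx/dx = 1 to obtain f(2) and f'(2) = 2*2 + 3*cos(2) in one pass.
out = f(Dual(2.0, 1.0))
print(out.val, out.dot)
```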

    Sequential Gaussian Processes for Online Learning of Nonstationary Functions

    Full text link
    Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: i) conventional GP inference scales $O(N^3)$ with respect to the number of observations; ii) updating a GP model sequentially is not trivial; and iii) covariance kernels often enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose an online sequential Monte Carlo algorithm to fit mixtures of GPs that capture non-stationary behavior while allowing for fast, distributed inference. By formulating hyperparameter optimization as a multi-armed bandit problem, we accelerate mixing for real-time inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the context of prediction for simulated non-stationary data and hospital time series data.
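    For context on drawback (i), a minimal sketch of exact GP regression (our own illustration, not the paper's sequential Monte Carlo mixture algorithm): the Cholesky factorization of the N x N kernel matrix is the $O(N^3)$ step that motivates faster online alternatives.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X, y, X_star, noise=1e-2):
    """Exact GP posterior mean and variance with an RBF kernel."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                 # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_star)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_star, X_star) - v.T @ v
    return mean, np.diag(var)

# Toy usage on a noisy sine function.
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.default_rng(0).normal(size=50)
mean, var = gp_predict(X, y, np.linspace(0, 1, 100)[:, None])
```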

    Deep Graph-Convolutional Image Denoising

    Full text link
    Non-local self-similarity is well known to be an effective prior for the image denoising problem. However, little work has been done to incorporate it in convolutional neural networks, which surpass non-local model-based methods despite only exploiting local information. In this paper, we propose a novel end-to-end trainable neural network architecture employing layers based on graph convolution operations, thereby creating neurons with non-local receptive fields. The graph convolution operation generalizes the classic convolution to arbitrary graphs. In this work, the graph is dynamically computed from similarities among the hidden features of the network, so that the powerful representation learning capabilities of the network are exploited to uncover self-similar patterns. We introduce a lightweight Edge-Conditioned Convolution which addresses the vanishing gradient and over-parameterization issues of this particular graph convolution. Extensive experiments show state-of-the-art performance with improved qualitative and quantitative results on both synthetic Gaussian noise and real noise.
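    A simplified sketch of the non-local ingredient (our own stand-in, not the paper's architecture): a k-nearest-neighbour graph is built dynamically from per-pixel feature similarity and each pixel aggregates its neighbours; the paper's Edge-Conditioned Convolution additionally learns edge-dependent weights for this aggregation.

```python
import numpy as np

def knn_graph_aggregate(features, k=8):
    """Build a k-NN graph from feature similarity and average each node's
    neighbours (a non-learned placeholder for a graph-convolution layer)."""
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-loops
    neighbours = np.argsort(d2, axis=1)[:, :k]   # k most similar nodes
    return features[neighbours].mean(axis=1)     # non-local aggregation

# Toy usage: a 16x16 "feature map" with 3 channels, flattened to (pixels, channels).
rng = np.random.default_rng(0)
feats = rng.normal(size=(16 * 16, 3))
agg = knn_graph_aggregate(feats, k=8)
```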

    A Coverage Study of the CMSSM Based on ATLAS Sensitivity Using Fast Neural Networks Techniques

    Get PDF
    We assess the coverage properties of confidence and credible intervals on the CMSSM parameter space inferred from a Bayesian posterior and the profile likelihood based on an ATLAS sensitivity study. In order to make those calculations feasible, we introduce a new method based on neural networks to approximate the mapping between CMSSM parameters and weak-scale particle masses. Our method reduces the computational effort needed to sample the CMSSM parameter space by a factor of ~10^4 with respect to conventional techniques. We find that both the Bayesian posterior and the profile likelihood intervals can significantly over-cover, and we trace the origin of this effect to physical boundaries in the parameter space. Finally, we point out that the effects intrinsic to the statistical procedure are conflated with simplifications to the likelihood functions from the experiments themselves. Comment: Further checks about the accuracy of the neural network approximation, fixed typos, added refs. Main results unchanged. Matches version accepted by JHEP.
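    A minimal sketch of the surrogate idea (our own illustration with a synthetic stand-in for the expensive spectrum calculation; the paper trains on actual CMSSM parameter-to-mass mappings): fit a small regression network on a modest number of expensive evaluations, then query the network in place of the costly computation when sampling the parameter space.

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_map(theta):
    # Synthetic placeholder for the costly parameter -> weak-scale mass computation.
    return np.sin(theta @ np.array([1.0, 0.5])) + 0.1 * theta[:, 0] ** 2

# Training data from a limited number of expensive evaluations.
theta = rng.uniform(-2, 2, size=(500, 2))
target = expensive_map(theta)

# One-hidden-layer network trained by full-batch gradient descent on 0.5*MSE.
W1, b1 = 0.5 * rng.normal(size=(2, 32)), np.zeros(32)
W2, b2 = 0.5 * rng.normal(size=(32, 1)), np.zeros(1)
lr = 0.05
for _ in range(5000):
    h = np.tanh(theta @ W1 + b1)
    pred = (h @ W2 + b2)[:, 0]
    g_pred = (pred - target)[:, None] / len(theta)   # d(0.5*MSE)/d(pred)
    gW2, gb2 = h.T @ g_pred, g_pred.sum(0)
    g_h = g_pred @ W2.T * (1 - h ** 2)               # backprop through tanh
    gW1, gb1 = theta.T @ g_h, g_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# The trained network is now a cheap stand-in for expensive_map.
test = rng.uniform(-2, 2, size=(5, 2))
print((np.tanh(test @ W1 + b1) @ W2 + b2)[:, 0])
```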