    On the Iteration Complexity of Hypergradient Computation

    We study a general class of bilevel problems, consisting of the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed-point equation is defined by a contraction mapping, we present a unified analysis which allows, for the first time, a quantitative comparison of these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirms the theoretical findings. Comment: accepted at ICML 2020; 19 pages, 4 figures; code at https://github.com/prolearner/hypertorch (corrected typos and one reference)
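
    The two families of methods named in this abstract can be illustrated on a toy scalar problem. The sketch below is not taken from the paper or from the hypertorch code; the contraction map `phi`, the quadratic upper-level objective, and all function names are assumptions chosen for illustration. It also solves the implicit linear system with a plain fixed-point iteration rather than conjugate gradient, which is the variant the abstract reports as most efficient.

```python
import math

# Hypothetical lower-level map: w = phi(w, lam). It is a contraction in w
# because |d phi / d w| <= 0.5 everywhere.
def phi(w, lam):
    return 0.5 * math.tanh(w) + lam

def dphi_dw(w, lam):
    return 0.5 * (1.0 - math.tanh(w) ** 2)

def dphi_dlam(w, lam):
    return 1.0

# Hypothetical upper-level objective E(w) = 0.5 * (w - target)^2, with no
# direct dependence on lam.
def dE_dw(w, target=1.0):
    return w - target

def solve_lower(lam, steps, w0=0.0):
    """Run `steps` fixed-point iterations and return the whole trajectory."""
    traj = [w0]
    for _ in range(steps):
        traj.append(phi(traj[-1], lam))
    return traj

def hypergrad_itd(lam, steps):
    """Reverse-mode iterative differentiation through the unrolled iterations."""
    traj = solve_lower(lam, steps)
    adj = dE_dw(traj[-1])          # adjoint of the final iterate
    grad = 0.0
    for w in reversed(traj[:-1]):  # backpropagate through each step
        grad += dphi_dlam(w, lam) * adj
        adj = dphi_dw(w, lam) * adj
    return grad

def hypergrad_aid_fp(lam, steps, solver_steps):
    """Approximate implicit differentiation: solve v = dphi_dw * v + dE_dw
    at the approximate fixed point by fixed-point iteration."""
    w = solve_lower(lam, steps)[-1]
    v = 0.0
    for _ in range(solver_steps):
        v = dphi_dw(w, lam) * v + dE_dw(w)
    return dphi_dlam(w, lam) * v

def hypergrad_reference(lam):
    """Implicit-function-theorem value at a tightly converged fixed point."""
    w = solve_lower(lam, 200)[-1]
    return dE_dw(w) / (1.0 - dphi_dw(w, lam))
```

    With these assumptions, `hypergrad_itd(0.3, 60)` and `hypergrad_aid_fp(0.3, 60, 60)` should both agree closely with `hypergrad_reference(0.3)`, since the contraction constant of 0.5 makes the error shrink geometrically with the number of iterations.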

    Theory of coupled neuronal-synaptic dynamics

    In neural circuits, synaptic strengths influence neuronal activity by shaping network dynamics, and neuronal activity influences synaptic strengths through activity-dependent plasticity. Motivated by this fact, we study a recurrent-network model in which neuronal units and synaptic couplings are interacting dynamic variables, with couplings subject to Hebbian modification with decay around quenched random strengths. Rather than assigning a specific role to the plasticity, we use dynamical mean-field theory and other techniques to systematically characterize the neuronal-synaptic dynamics, revealing a rich phase diagram. Adding Hebbian plasticity slows activity in chaotic networks and can induce chaos in otherwise quiescent networks. Anti-Hebbian plasticity quickens activity and produces an oscillatory component. Analysis of the Jacobian shows that Hebbian and anti-Hebbian plasticity push locally unstable modes toward the real and imaginary axes, explaining these behaviors. Both random-matrix and Lyapunov analysis show that strong Hebbian plasticity segregates network timescales into two bands with a slow, synapse-dominated band driving the dynamics, suggesting a flipped view of the network as synapses connected by neurons. For increasing strength, Hebbian plasticity initially raises the complexity of the dynamics, measured by the maximum Lyapunov exponent and attractor dimension, but then decreases these metrics, likely due to the proliferation of stable fixed points. We compute the marginally stable spectra of such fixed points as well as their number, showing exponential growth with network size. In chaotic states with strong Hebbian plasticity, a stable fixed point of neuronal dynamics is destabilized by synaptic dynamics, allowing any neuronal state to be stored as a stable fixed point by halting the plasticity. This phase of freezable chaos offers a new mechanism for working memory. Comment: 20 pages, 9 figures

    A geometrical analysis of global stability in trained feedback networks

    Recurrent neural networks have been extensively studied in the context of neuroscience and machine learning due to their ability to implement complex computations. While substantial progress in designing effective learning algorithms has been achieved in recent years, a full understanding of trained recurrent networks is still lacking. Specifically, the mechanisms that allow computations to emerge from the underlying recurrent dynamics are largely unknown. Here we focus on a simple, yet underexplored computational setup: a feedback architecture trained to associate a stationary output to a stationary input. As a starting point, we derive an approximate analytical description of global dynamics in trained networks which assumes uncorrelated connectivity weights in the feedback and in the random bulk. The resulting mean-field theory suggests that the task admits several classes of solutions, which imply different stability properties. Different classes are characterized in terms of the geometrical arrangement of the readout with respect to the input vectors, defined in the high-dimensional space spanned by the network population. We find that this approximate theoretical approach can be used to understand how standard training techniques implement the input-output task in finite-size feedback networks. In particular, our simplified description captures the local and the global stability properties of the target solution, and thus predicts training performance.

    Fixed-Point Performance Analysis of Recurrent Neural Networks

    Recurrent neural networks have shown excellent performance in many applications; however, they require increased complexity in hardware- or software-based implementations. The hardware complexity can be greatly reduced by minimizing the word length of weights and signals. This work analyzes the fixed-point performance of recurrent neural networks using a retrain-based quantization method. The quantization sensitivity of each layer in RNNs is studied, and the overall fixed-point optimization results, which minimize the capacity of weights without sacrificing performance, are presented. A language model and a phoneme recognition example are used.
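
    To make the notion of reducing word length concrete, here is a minimal sketch of a symmetric uniform quantizer applied to a list of weights. It is an assumption for illustration only: the function name `quantize` and the choice of step size are hypothetical, and the retraining stage that the abstract describes is not shown.

```python
def quantize(values, bits):
    """Round each value to one of 2**bits - 1 evenly spaced signed levels.

    The step size is chosen so that the largest magnitude in `values`
    maps exactly onto the outermost level (an illustrative choice).
    """
    levels = 2 ** (bits - 1) - 1           # e.g. 3 positive levels for 3 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                # nothing to scale
    step = max_abs / levels
    quantized = []
    for v in values:
        index = round(v / step)            # nearest integer level
        index = max(-levels, min(levels, index))
        quantized.append(index * step)
    return quantized
```

    Under these assumptions, each quantized weight differs from the original by at most half a step, so fewer bits mean a coarser grid and a larger worst-case error.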

    Recurrent backpropagation and the dynamical approach to adaptive neural computation

    Error backpropagation in feedforward neural network models is a popular learning algorithm that has its roots in nonlinear estimation and optimization. It is used routinely to calculate error gradients in nonlinear systems with hundreds of thousands of parameters. However, the classical architecture for backpropagation has severe restrictions. The extension of backpropagation to networks with recurrent connections is reviewed. It is now possible to efficiently compute the error gradients for networks that have temporal dynamics, which opens applications to a host of problems in system identification and control.