On the Iteration Complexity of Hypergradient Computation
We study a general class of bilevel problems, consisting of the minimization
of an upper-level objective that depends on the solution to a parametric
fixed-point equation. Important instances arising in machine learning include
hyperparameter optimization, meta-learning, and certain graph and recurrent
neural networks. Typically the gradient of the upper-level objective
(hypergradient) is hard or even impossible to compute exactly, which has raised
interest in approximation methods. We investigate some popular approaches
to computing the hypergradient, based on reverse-mode iterative differentiation
and approximate implicit differentiation. Under the hypothesis that the
fixed-point equation is defined by a contraction mapping, we present a unified
analysis that, for the first time, allows these methods to be compared
quantitatively, providing explicit bounds on their iteration complexity. This
analysis suggests a hierarchy in terms of computational efficiency among the
above methods, with approximate implicit differentiation based on conjugate
gradient performing best. We present an extensive experimental comparison among
the methods which confirms the theoretical findings.
Comment: accepted at ICML 2020; 19 pages, 4 figures; code at
https://github.com/prolearner/hypertorch (corrected typos and one reference)
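The approximate-implicit-differentiation scheme the abstract favors can be sketched on a toy bilevel problem. Everything below (the quadratic inner problem, the step size, the outer objective) is an illustrative assumption, not the paper's setup: the inner fixed point is reached by iterating a contraction, and the hypergradient is recovered by solving one linear system with conjugate gradient.

```python
import numpy as np

# Toy bilevel problem (hypothetical instance, for illustration only):
#   inner: w(lam) is the fixed point of Phi(w, lam) = w - eta * (A @ w - lam),
#          i.e. w(lam) = A^{-1} lam; Phi is a contraction for small eta, A pos. def.
#   outer: E(w, lam) = 0.5 * ||w - t||^2
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
t = rng.standard_normal(n)
lam = rng.standard_normal(n)
eta = 1.0 / np.linalg.norm(A, 2)     # step size making Phi a contraction

def phi(w, lam):
    return w - eta * (A @ w - lam)

# Approximate the inner fixed point by iterating the contraction.
w = np.zeros(n)
for _ in range(2000):
    w = phi(w, lam)

# Approximate implicit differentiation: solve
#   (I - d_w Phi)^T v = grad_w E  with conjugate gradient,
# then hypergradient = grad_lam E + (d_lam Phi)^T v.
def cg(matvec, b, iters=50, tol=1e-12):
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

grad_w_E = w - t                     # outer gradient in w
matvec = lambda v: eta * (A.T @ v)   # here (I - d_w Phi)^T v = eta * A^T v
v = cg(matvec, grad_w_E)
hypergrad = eta * v                  # (d_lam Phi)^T v = eta * v; grad_lam E = 0

# Sanity check against the closed form d/d lam E(w(lam)) = A^{-1} (w(lam) - t).
exact = np.linalg.solve(A, w - t)
print(np.allclose(hypergrad, exact, atol=1e-6))
```

For this quadratic instance the closed-form hypergradient is available, which makes the check possible; in the general setting of the paper only the iterative approximations are.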
Theory of coupled neuronal-synaptic dynamics
In neural circuits, synaptic strengths influence neuronal activity by shaping
network dynamics, and neuronal activity influences synaptic strengths through
activity-dependent plasticity. Motivated by this fact, we study a
recurrent-network model in which neuronal units and synaptic couplings are
interacting dynamic variables, with couplings subject to Hebbian modification
with decay around quenched random strengths. Rather than assigning a specific
role to the plasticity, we use dynamical mean-field theory and other techniques
to systematically characterize the neuronal-synaptic dynamics, revealing a rich
phase diagram. Adding Hebbian plasticity slows activity in chaotic networks and
can induce chaos in otherwise quiescent networks. Anti-Hebbian plasticity
quickens activity and produces an oscillatory component. Analysis of the
Jacobian shows that Hebbian and anti-Hebbian plasticity push locally unstable
modes toward the real and imaginary axes, explaining these behaviors. Both
random-matrix and Lyapunov analysis show that strong Hebbian plasticity
segregates network timescales into two bands with a slow, synapse-dominated
band driving the dynamics, suggesting a flipped view of the network as synapses
connected by neurons. For increasing strength, Hebbian plasticity initially
raises the complexity of the dynamics, measured by the maximum Lyapunov
exponent and attractor dimension, but then decreases these metrics, likely due
to the proliferation of stable fixed points. We compute the marginally stable
spectra of such fixed points as well as their number, showing exponential
growth with network size. In chaotic states with strong Hebbian plasticity, a
stable fixed point of neuronal dynamics is destabilized by synaptic dynamics,
allowing any neuronal state to be stored as a stable fixed point by halting the
plasticity. This phase of freezable chaos offers a new mechanism for working
memory.
Comment: 20 pages, 9 figures
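The coupled neuronal-synaptic dynamics described above can be sketched with a minimal Euler integration. All names and parameter values here (g, gamma, tau, the tanh nonlinearity) are illustrative assumptions rather than the paper's exact model: units evolve under couplings that are the sum of quenched random strengths and a plastic part relaxing toward a Hebbian outer product of the activity.

```python
import numpy as np

# Minimal sketch (assumed parameterization): rate units x driven by
# couplings J = J0 + P, where J0 is quenched random and the plastic part P
# decays around zero while being pushed toward a Hebbian term gamma * r r^T.
rng = np.random.default_rng(1)
N = 200                     # network size
g = 2.0                     # quenched coupling gain (chaotic regime for g > 1)
gamma = 0.5                 # Hebbian strength (negative would be anti-Hebbian)
tau = 10.0                  # synaptic timescale, slower than the neuronal one
dt = 0.05
J0 = g * rng.standard_normal((N, N)) / np.sqrt(N)   # quenched random strengths

x = rng.standard_normal(N)
P = np.zeros((N, N))        # plastic component of the couplings
for _ in range(4000):
    r = np.tanh(x)                                       # firing rates
    x += dt * (-x + (J0 + P) @ r)                        # neuronal dynamics
    P += (dt / tau) * (-P + gamma * np.outer(r, r) / N)  # Hebbian + decay

print(float(np.std(x)))     # activity remains finite and nonzero
```

This is only the forward simulation; the paper's phase diagram, Lyapunov spectra, and fixed-point counting require the mean-field and random-matrix machinery the abstract mentions.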
A geometrical analysis of global stability in trained feedback networks
Recurrent neural networks have been extensively studied in the context of
neuroscience and machine learning due to their ability to implement complex
computations. While substantial progress in designing effective learning
algorithms has been achieved in recent years, a full understanding of trained
recurrent networks is still lacking. Specifically, the mechanisms that allow
computations to emerge from the underlying recurrent dynamics are largely
unknown. Here we focus on a simple, yet underexplored computational setup: a
feedback architecture trained to associate a stationary output to a stationary
input. As a starting point, we derive an approximate analytical description of
global dynamics in trained networks which assumes uncorrelated connectivity
weights in the feedback and in the random bulk. The resulting mean-field theory
suggests that the task admits several classes of solutions, which imply
different stability properties. Different classes are characterized in terms of
the geometrical arrangement of the readout with respect to the input vectors,
defined in the high-dimensional space spanned by the network population. We
find that such an approximate theoretical approach can be used to understand how
standard training techniques implement the input-output task in finite-size
feedback networks. In particular, our simplified description captures the local
and the global stability properties of the target solution, and thus predicts
training performance.
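The feedback architecture in question can be sketched as follows. The rank-one feedback structure, the subcritical bulk gain, and all vector names are assumptions for illustration: a random bulk J plus a feedback loop that reinjects the scalar readout z along a vector m, driven by a stationary input, settling to a stationary state.

```python
import numpy as np

# Sketch (assumed setup): dynamics dx/dt = -x + J r + m z + I with
# r = tanh(x) and readout z = w @ r fed back through m. With a subcritical
# bulk gain the network relaxes to a stationary (fixed-point) state, the
# input-output association the abstract studies.
rng = np.random.default_rng(2)
N = 300
g = 0.5                                           # bulk gain (quiescent regime)
J = g * rng.standard_normal((N, N)) / np.sqrt(N)  # random bulk connectivity
m = rng.standard_normal(N) / np.sqrt(N)           # feedback vector
w = rng.standard_normal(N) / np.sqrt(N)           # readout vector
I = rng.standard_normal(N)                        # stationary input

x = np.zeros(N)
dt = 0.1
for _ in range(3000):
    r = np.tanh(x)
    z = w @ r                                     # readout, fed back
    x += dt * (-x + J @ r + m * z + I)

# Stationarity: the right-hand side of the dynamics has vanished.
residual = np.linalg.norm(-x + J @ np.tanh(x) + m * (w @ np.tanh(x)) + I)
print(residual < 1e-6)
```

The mean-field analysis in the paper characterizes when such a stationary solution is globally stable in terms of the geometry of w and m relative to the input, which this forward simulation does not probe.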
Fixed-Point Performance Analysis of Recurrent Neural Networks
Recurrent neural networks have shown excellent performance in many
applications; however, they require increased complexity in hardware- or
software-based implementations. The hardware complexity can be much lowered by
minimizing the word-length of weights and signals. This work analyzes the
fixed-point performance of recurrent neural networks using a retrain based
quantization method. The quantization sensitivity of each layer in RNNs is
studied, and the overall fixed-point optimization results minimizing the
capacity of weights while not sacrificing the performance are presented. A
language model and a phoneme recognition examples are used
Recurrent backpropagation and the dynamical approach to adaptive neural computation
Error backpropagation in feedforward neural network models is a popular learning algorithm that has its roots in nonlinear estimation and optimization. It is routinely used to calculate error gradients in nonlinear systems with hundreds of thousands of parameters. However, the classical architecture for backpropagation has severe restrictions. The extension of backpropagation to networks with recurrent connections will be reviewed. It is now possible to efficiently compute the error gradients for networks that have temporal dynamics, which opens applications to a host of problems in system identification and control.
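Recurrent backpropagation in the classical (Pineda/Almeida) style can be sketched as follows; the toy network, target, and dimensions are assumptions. The network relaxes to a fixed point of its dynamics, and the error gradient is obtained from a second, linear "adjoint" relaxation rather than by unrolling the dynamics in time.

```python
import numpy as np

# Network: fixed point x* of x = tanh(W x + I); loss E = 0.5 ||x* - target||^2.
rng = np.random.default_rng(4)
N = 20
W = 0.1 * rng.standard_normal((N, N))   # weak weights: relaxation converges
I = rng.standard_normal(N)
target = rng.standard_normal(N)

def fixed_point(W, I, iters=500):
    x = np.zeros(N)
    for _ in range(iters):
        x = np.tanh(W @ x + I)
    return x

x = fixed_point(W, I)
e = x - target                  # dE/dx at the fixed point
D = 1.0 - x ** 2                # tanh' evaluated at the fixed point

# Adjoint relaxation: the linear iteration y <- D * (W^T y + e) converges to
# y = (I - D W^T)^{-1} D e, and the weight gradient is dE/dW_ij = y_i x_j.
y = np.zeros(N)
for _ in range(500):
    y = D * (W.T @ y + e)
grad_W = np.outer(y, x)

# Check one entry against a finite difference through the fixed point.
i, j = 3, 7
eps = 1e-6
Wp = W.copy()
Wp[i, j] += eps
xp = fixed_point(Wp, I)
fd = (0.5 * np.sum((xp - target) ** 2) - 0.5 * np.sum(e ** 2)) / eps
print(abs(grad_W[i, j] - fd) < 1e-4)
```

The appeal noted in the review is that both relaxations are local dynamical processes, so the gradient computation itself can be cast as network dynamics.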