Inferring actions, intentions, and causal relations in a neural network
From a young age, we can select actions to achieve desired goals, infer the goals of other agents, and learn causal relations in our environment through social interactions. Crucially, these abilities are productive or generative: for instance, we can impute desires to others that we have never held ourselves. This capacity has been captured by the powerful Bayesian Theory of Mind formalism, but it remains to forge connections to the rich neural data around action selection, goal inference, and social causal learning. How can productive inference about actions and intentions arise within the neural circuitry of the brain? Using the recently developed linearly solvable Markov decision process, we present a neural network model which permits a distributed representation of tasks. Such a representation allows the expression of infinite possibilities by combining a finite set of bases, enabling truly generative inference of actions, goals, and causal relations in a neural network framework.
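For readers unfamiliar with the formalism, the linearly solvable MDP named here reduces, in Todorov's standard formulation, to a linear equation in a "desirability" function; the sketch below uses our own notation, not the paper's:

```latex
% Linear Bellman equation of the LMDP (notation ours, following Todorov).
% With state cost q(s), passive dynamics p(s'|s), and desirability
% z(s) = e^{-v(s)} for the optimal cost-to-go v, optimality is linear in z,
% and the optimal controlled transitions tilt the passive dynamics by z:
\[
  z(s) = e^{-q(s)} \sum_{s'} p(s' \mid s)\, z(s'),
  \qquad
  u^*(s' \mid s) = \frac{p(s' \mid s)\, z(s')}{\sum_{s''} p(s'' \mid s)\, z(s'')}.
\]
```

Because the optimality condition is linear, desirability functions for solved tasks form a basis that can be recombined, which is what makes the distributed, compositional task representation described above possible.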
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Despite the widespread practical success of deep learning methods, our
theoretical understanding of the dynamics of learning in deep neural networks
remains quite sparse. We attempt to bridge the gap between the theory and
practice of deep learning by systematically analyzing learning dynamics for the
restricted case of deep linear neural networks. Despite the linearity of their
input-output map, such networks have nonlinear gradient descent dynamics on
weights that change with the addition of each new hidden layer. We show that
deep linear networks exhibit nonlinear learning phenomena similar to those seen
in simulations of nonlinear networks, including long plateaus followed by rapid
transitions to lower error solutions, and faster convergence from greedy
unsupervised pretraining initial conditions than from random initial
conditions. We provide an analytical description of these phenomena by finding
new exact solutions to the nonlinear dynamics of deep learning. Our theoretical
analysis also reveals the surprising finding that as the depth of a network
approaches infinity, learning speed can nevertheless remain finite: for a
special class of initial conditions on the weights, very deep networks incur
only a finite, depth independent, delay in learning speed relative to shallow
networks. We show that, under certain conditions on the training data,
unsupervised pretraining can find this special class of initial conditions,
while scaled random Gaussian initializations cannot. We further exhibit a new
class of random orthogonal initial conditions on weights that, like
unsupervised pre-training, enjoys depth independent learning times. We further
show that these initial conditions also lead to faithful propagation of
gradients even in deep nonlinear networks, as long as they operate in a special
regime known as the edge of chaos.
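A minimal numerical sketch of the setting analyzed here (our own toy code, not the authors'): a deep linear network y = W_D ... W_1 x trained by gradient descent, comparing the random orthogonal initialization the abstract describes against a scaled Gaussian one. Depth, width, learning rate, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layers(depth, n, mode):
    if mode == "orthogonal":
        # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
        return [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(depth)]
    return [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(depth)]

def train(Ws, X, Y, lr=1e-2, steps=500):
    n_samples = X.shape[1]
    losses = []
    for _ in range(steps):
        acts = [X]
        for W in Ws:                       # forward pass, caching activations
            acts.append(W @ acts[-1])
        err = acts[-1] - Y
        losses.append(0.5 * np.sum(err ** 2) / n_samples)
        grad = err / n_samples             # backprop through the linear chain
        for i in reversed(range(len(Ws))):
            gW = grad @ acts[i].T          # gradient for layer i
            grad = Ws[i].T @ grad          # propagate error one layer down
            Ws[i] -= lr * gW
    return losses

n, depth = 32, 10
X = rng.standard_normal((n, 256))                     # roughly whitened inputs
Y = np.linalg.qr(rng.standard_normal((n, n)))[0] @ X  # a linear teacher task
for mode in ("orthogonal", "gaussian"):
    losses = train(init_layers(depth, n, mode), X, Y)
    print(mode, "final loss:", round(losses[-1], 4))
```

With orthogonal layers the end-to-end map is an isometry at initialization, so error signals reach every layer undistorted; with the Gaussian stack most singular values of the product are attenuated, which is the depth-dependent slowdown the analysis explains.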
A mathematical theory of semantic development in deep neural networks
An extensive body of empirical research has revealed remarkable regularities
in the acquisition, organization, deployment, and neural representation of
human semantic knowledge, thereby raising a fundamental conceptual question:
what are the theoretical principles governing the ability of neural networks to
acquire, organize, and deploy abstract knowledge by integrating across many
individual experiences? We address this question by mathematically analyzing
the nonlinear dynamics of learning in deep linear networks. We find exact
solutions to this learning dynamics that yield a conceptual explanation for the
prevalence of many disparate phenomena in semantic cognition, including the
hierarchical differentiation of concepts through rapid developmental
transitions, the ubiquity of semantic illusions between such transitions, the
emergence of item typicality and category coherence as factors controlling the
speed of semantic processing, changing patterns of inductive projection over
development, and the conservation of semantic similarity in neural
representations across species. Thus, surprisingly, our simple neural model
qualitatively recapitulates many diverse regularities underlying semantic
development, while providing analytic insight into how the statistical
structure of an environment can interact with nonlinear deep learning dynamics
to give rise to these regularities.
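The exact solutions the abstract refers to (derived in the authors' earlier work on deep linear networks) take a simple closed form per input-output mode; a sketch in notation we introduce here, for whitened inputs and small balanced initial weights:

```latex
% Sigmoidal learning of each SVD mode of the input-output correlation matrix
% (notation ours). A mode with singular value s_alpha, learned from small
% initial strength a_0 with time constant tau (inverse learning rate), follows
\[
  a_\alpha(t) = \frac{s_\alpha \, e^{2 s_\alpha t/\tau}}
                     {e^{2 s_\alpha t/\tau} - 1 + s_\alpha / a_0},
\]
% so the time to learn a mode scales as (tau / 2 s_alpha) ln(s_alpha / a_0):
% strong (coarse, hierarchical) structure is learned before weak (fine)
% structure, yielding the stage-like developmental transitions described above.
```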
Inferring Actions, Intentions, and Causal Relations in a Deep Neural Network
From a young age, we can select actions to achieve desired goals, infer the goals of other agents, and learn causal relations in our environment through social interactions. Crucially, these abilities are productive and generative: we can impute desires to others that we have never held ourselves. These abilities are often captured by only partially overlapping models, each requiring substantial changes to fit combinations of abilities. Here, in an attempt to unify previous models, we present a neural network underpinned by the linearly solvable Markov Decision Process (LMDP) framework which permits a distributed representation of tasks. The network contains two pathways: one captures the desirability of states, and another encodes the passive dynamics of state transitions in the absence of control. Interactions between pathways are bound by a principle of rational action, enabling generative inference of actions, goals, and causal relations supported by gradient updates to parts of the network.
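As a concrete illustration of the LMDP machinery underlying the model (a minimal sketch under our own assumptions; the toy states, costs, and function names are invented, and the two-pathway network itself is not reproduced here):

```python
import numpy as np

def solve_lmdp(P, q, iters=500):
    """Average-cost LMDP: the desirability z is the principal eigenvector of
    G = diag(exp(-q)) @ P (Todorov's linear Bellman equation), found here
    by power iteration."""
    G = np.exp(-q)[:, None] * P
    z = np.ones(P.shape[0])
    for _ in range(iters):
        z = G @ z
        z /= np.linalg.norm(z)   # renormalize to avoid under/overflow
    return z

def optimal_policy(P, z):
    """Optimal controlled transitions: u*(s'|s) proportional to p(s'|s) z(s')."""
    u = P * z[None, :]
    return u / u.sum(axis=1, keepdims=True)

# Toy example: a 3-state chain where state 2 is cheap (goal-like).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])   # passive dynamics
q = np.array([1.0, 1.0, 0.1])       # state costs; low cost = desirable
z = solve_lmdp(P, q)
print(optimal_policy(P, z))          # controlled dynamics shift mass toward state 2
```

In the model described above, z would be expressed over a basis of tasks in the desirability pathway, so inferring another agent's goal amounts to gradient updates on the weights that best explain the agent's observed transitions.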
If deep learning is the answer, then what is the question?
Neuroscience research is undergoing a minor revolution. Recent advances in
machine learning and artificial intelligence (AI) research have opened up new
ways of thinking about neural computation. Many researchers are excited by the
possibility that deep neural networks may offer theories of perception,
cognition and action for biological brains. This perspective has the potential
to radically reshape our approach to understanding neural systems, because the
computations performed by deep networks are learned from experience, not
endowed by the researcher. If so, how can neuroscientists use deep networks to
model and understand biological brains? What is the outlook for neuroscientists
who seek to characterise computations or neural codes, or who wish to
understand perception, attention, memory, and executive functions? In this
Perspective, our goal is to offer a roadmap for systems neuroscience research
in the age of deep learning. We discuss the conceptual and methodological
challenges of comparing behaviour, learning dynamics, and neural representation
in artificial and biological systems. We highlight new research questions that
have emerged for neuroscience as a direct consequence of recent advances in
machine learning.
On The Specialization of Neural Modules
A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional
architectures which aim to learn specialized modules dedicated to structures in a
task that can be composed to solve novel problems with similar structures. While
the compositionality of these architectures is guaranteed by design, the
specialization of their modules is not. Here we theoretically study the ability of network modules
to specialize to useful structures in a dataset and achieve systematic generalization. To this end we introduce a minimal space of datasets motivated by practical
systematic generalization benchmarks. From this space of datasets we present a
mathematical definition of systematicity and study the learning dynamics of linear
neural modules when solving components of the task. Our results shed light on the
difficulty of module specialization, what is required for modules to successfully
specialize, and the necessity of modular architectures to achieve systematicity.
Finally, we confirm that the theoretical results in our tractable setting generalize to
more complex datasets and non-linear architectures.
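A toy version of the question studied here (our own construction, not the paper's benchmark or code; the soft gate is one of several possible routing choices): data are generated by two ground-truth linear maps indexed by a context, and a two-module linear student with a learned gate may or may not assign one module per map.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(2)]  # teachers

W = [0.01 * rng.standard_normal((n, n)) for _ in range(2)]  # student modules
G = 0.01 * rng.standard_normal((2, 2))                      # gate logits, row per context

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

lr = 0.05
for step in range(5000):
    c = rng.integers(2)                  # sample a context
    x = rng.standard_normal(n)
    y = A[c] @ x                         # ground-truth compositional target
    p = softmax(G[c])                    # soft gate over modules
    outs = [W[m] @ x for m in range(2)]
    yhat = p[0] * outs[0] + p[1] * outs[1]
    err = yhat - y
    for m in range(2):                   # gradients of 0.5 * ||err||^2
        W[m] -= lr * p[m] * np.outer(err, x)
    dp = np.array([err @ outs[m] for m in range(2)])
    dg = p * (dp - p @ dp)               # backprop through the softmax gate
    G[c] -= lr * dg

# Measure specialization: distance from each module to each teacher map.
for m in range(2):
    print([round(np.linalg.norm(W[m] - A[c]), 3) for c in range(2)])
```

If specialization succeeds, each printed row has one near-zero entry; the analysis in the paper concerns precisely when learning dynamics like these do or do not produce such specialization.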
- …