Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data
Conventional sequential learning methods such as Recurrent Neural Networks
(RNNs) focus on interactions between consecutive inputs, i.e. first-order
Markovian dependency. However, most sequential data, such as videos, have
complex dependency structures that imply variable-length semantic flows and
their compositions, which are hard for conventional methods to capture. Here,
we propose Cut-Based Graph Learning Networks (CB-GLNs) for
learning video data by discovering these complex structures of the video. The
CB-GLNs represent video data as a graph, with nodes and edges corresponding to
frames of the video and their dependencies respectively. The CB-GLNs find
compositional dependencies of the data in multilevel graph forms via a
parameterized kernel with graph-cut and a message passing framework. We
evaluate the proposed method on two video understanding tasks: video theme
classification (YouTube-8M dataset) and video question answering (TVQA
dataset). The experimental results show that our
model efficiently learns the semantic compositional structure of video data.
Furthermore, our model achieves the highest performance in comparison to the
baseline methods.
Comment: 8 pages, 3 figures, Association for the Advancement of Artificial
Intelligence (AAAI2020). arXiv admin note: substantial text overlap with
arXiv:1907.0170
CompILE: Compositional Imitation Learning and Execution
We introduce Compositional Imitation Learning and Execution (CompILE): a
framework for learning reusable, variable-length segments of
hierarchically-structured behavior from demonstration data. CompILE uses a
novel unsupervised, fully-differentiable sequence segmentation module to learn
latent encodings of sequential data that can be re-composed and executed to
perform new tasks. Once trained, our model generalizes to sequences of longer
length and from environment instances not seen during training. We evaluate
CompILE in a challenging 2D multi-task environment and a continuous control
task, and show that it can find correct task boundaries and event encodings in
an unsupervised manner. Latent codes and associated behavior policies
discovered by CompILE can be used by a hierarchical agent, where the high-level
policy selects actions in the latent code space, and the low-level,
task-specific policies are simply the learned decoders. We found that our
CompILE-based agent could learn given only sparse rewards, where agents without
task-specific policies struggle.
Comment: ICML (2019
Building Machines That Learn and Think Like People
Recent progress in artificial intelligence (AI) has renewed interest in
building systems that learn and think like people. Many advances have come from
using deep neural networks trained end-to-end in tasks such as object
recognition, video games, and board games, achieving performance that equals or
even beats humans in some respects. Despite their biological inspiration and
performance achievements, these systems differ from human intelligence in
crucial ways. We review progress in cognitive science suggesting that truly
human-like learning and thinking machines will have to reach beyond current
engineering trends in both what they learn, and how they learn it.
Specifically, we argue that these machines should (a) build causal models of
the world that support explanation and understanding, rather than merely
solving pattern recognition problems; (b) ground learning in intuitive theories
of physics and psychology, to support and enrich the knowledge that is learned;
and (c) harness compositionality and learning-to-learn to rapidly acquire and
generalize knowledge to new tasks and situations. We suggest concrete
challenges and promising routes towards these goals that can combine the
strengths of recent neural network advances with more structured cognitive
models.
Comment: In press at Behavioral and Brain Sciences. Open call for commentary
proposals (until Nov. 22, 2016).
https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar
Teaching Archetypal Design with an Electronic Textbook
How can parallel programming be made tractable for students in high schools and community colleges, programmers in four-year colleges, commercial and government employees, and interested independent users learning on their own, and made available as CASE tools for professional software designers? The computer science community must address this question if programmers' ability to harness the power of parallel systems is to keep pace with forthcoming technological advances in those systems. This paper addresses some of the issues of bringing parallel programming to the people, ranging from newly developing programmers with little experience on any computer to seasoned programmers of single-processor machines. We aim not only to enable people to use more powerful computers, but also to enable people to use computers more powerfully, by nurturing the techniques that enable them to develop efficient, correct code with relative ease. This paper briefly presents the concept of an Archetype, a software engineering methodology developed at Caltech for capturing patterns of problem solving and for providing media for quick reference and natural software reuse. We then describe eText, an interactive multimedia electronic textbook that facilitates teaching, navigating through, and referring to Archetypes. Initial experience with Archetypes and the electronic textbook suggests that this approach to teaching parallel programming can aid computer users in the immediate future.
Activity Grammars for Temporal Action Segmentation
Sequence prediction on temporal data requires the ability to understand
compositional structures of multi-level semantics beyond individual and
contextual properties. The task of temporal action segmentation, which aims at
translating an untrimmed activity video into a sequence of action segments,
remains challenging for this reason. This paper addresses the problem by
introducing an effective activity grammar to guide neural predictions for
temporal action segmentation. We propose a novel grammar induction algorithm
that extracts a powerful context-free grammar from action sequence data. We
also develop an efficient generalized parser that transforms frame-level
probability distributions into a reliable sequence of actions according to the
induced grammar with recursive rules. Our approach can be combined with any
neural network for temporal action segmentation to enhance the sequence
prediction and discover its compositional structure. Experimental results
demonstrate that our method significantly improves temporal action segmentation
in terms of both performance and interpretability on two standard benchmarks,
Breakfast and 50 Salads.
Comment: Accepted to NeurIPS 202