Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on
untrimmed videos using convolutional neural networks. Our algorithm learns from
video-level class labels and predicts temporal intervals of human actions with
no requirement of temporal localization annotations. We design our network to
identify a sparse subset of key segments associated with target actions in a
video using an attention module and fuse the key segments through adaptive
temporal pooling. Our loss function comprises two terms: one minimizes the
video-level action classification error and the other enforces sparsity of the
segment selection. At inference time, we extract and score temporal proposals using
temporal class activations and class-agnostic attentions to estimate the time
intervals that correspond to target actions. The proposed algorithm attains
state-of-the-art results on the THUMOS14 dataset and outstanding performance on
ActivityNet1.3 even with its weak supervision.
Comment: Accepted to CVPR 201
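The two-term objective described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the sigmoid attention, feature dimensions, and `sparsity_weight` value are all illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(segments, att_scores):
    """Adaptive temporal pooling: fuse per-segment features with
    class-agnostic attention weights so that a sparse subset of key
    segments dominates the video-level representation."""
    att = sigmoid(att_scores)[:, None]                 # (T, 1), one weight per segment
    pooled = (att * segments).sum(axis=0) / att.sum()  # weighted average over time
    return pooled, att

def video_loss(logits, label_onehot, att, sparsity_weight=1e-4):
    """Two-term loss mirroring the abstract: video-level classification
    error plus an L1 penalty that pushes attention weights toward sparsity."""
    probs = softmax(logits)
    classification = -np.sum(label_onehot * np.log(probs + 1e-12))
    return classification + sparsity_weight * np.abs(att).sum()
```

Because only video-level labels enter `video_loss`, the temporal localization emerges from which segments the (sparsity-penalized) attention selects.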
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
Automatic generation of textual video descriptions that are time-aligned with
video content is a long-standing goal in computer vision. The task is
challenging due to the difficulty of bridging the semantic gap between the
visual and natural language domains. This paper addresses the task of
automatically generating an alignment between a set of instructions and a
first-person video demonstrating an activity. The sparse descriptions and ambiguity
of written instructions create significant alignment challenges. The key to our
approach is the use of egocentric cues to generate a concise set of action
proposals, which are then matched to recipe steps using object recognition and
computational linguistic techniques. We obtain promising results on both the
Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions
Dataset.
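The matching step described above can be illustrated with a toy alignment: score each action proposal against each instruction step, then align them monotonically with dynamic programming. The object-word overlap score and the DP formulation below are illustrative stand-ins for the paper's object-recognition and computational-linguistic matching, not its actual method.

```python
import numpy as np

def object_similarity(proposal_objects, step_words):
    """Toy score: fraction of objects recognized in the proposal that
    are mentioned in the instruction step's text."""
    objs = set(proposal_objects)
    return len(objs & set(step_words)) / len(objs) if objs else 0.0

def monotonic_align(sim):
    """Assign each proposal a step index so indices never decrease,
    maximizing total similarity (simple DP; the paper may align differently).
    sim: (num_proposals, num_steps) similarity matrix."""
    P, S = sim.shape
    dp = np.full((P, S), -np.inf)
    dp[0] = sim[0]
    for i in range(1, P):
        for j in range(S):
            # best score so far using any step index <= j for proposal i-1
            dp[i, j] = sim[i, j] + dp[i - 1, : j + 1].max()
    path = [int(dp[-1].argmax())]                      # backtrack best monotonic path
    for i in range(P - 1, 0, -1):
        path.append(int(dp[i - 1, : path[-1] + 1].argmax()))
    return path[::-1]
```

The monotonicity constraint encodes the assumption that recipe steps are demonstrated in order, which is what makes sparse, ambiguous instructions alignable at all.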
Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex
Neocortical neurons have thousands of excitatory synapses. It is a mystery
how neurons integrate the input from so many synapses and what kind of
large-scale network behavior this enables. It has been previously proposed that
non-linear properties of dendrites enable neurons to recognize multiple
patterns. In this paper we extend this idea by showing that a neuron with
several thousand synapses arranged along active dendrites can learn to
accurately and robustly recognize hundreds of unique patterns of cellular
activity, even in the presence of large amounts of noise and pattern variation.
We then propose a neuron model where some of the patterns recognized by a
neuron lead to action potentials and define the classic receptive field of the
neuron, whereas the majority of the patterns recognized by a neuron act as
predictions by slightly depolarizing the neuron without immediately generating
an action potential. We then present a network model based on neurons with
these properties and show that the network learns a robust model of time-based
sequences. Given the similarity of excitatory neurons throughout the neocortex
and the importance of sequence memory in inference and behavior, we propose
that this form of sequence memory is a universal property of neocortical
tissue. We further propose that cellular layers in the neocortex implement
variations of the same sequence memory algorithm to achieve different aspects
of inference and behavior. The neuron and network models we introduce are
robust over a wide range of parameters as long as the network uses a sparse
distributed code of cellular activations. The sequence capacity of the network
scales linearly with the number of synapses on each neuron. Thus neurons need
thousands of synapses to learn the many temporal patterns in sensory stimuli
and motor sequences.
Comment: Submitted for publication
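The neuron model in the abstract — some recognized patterns drive action potentials and define the receptive field, while most act as predictions by depolarizing the cell — can be caricatured in a few lines. The set-based synapse representation, segment counts, and threshold value are illustrative assumptions, not the paper's parameters.

```python
def segment_active(synapses, active_cells, threshold):
    """A dendritic segment recognizes a pattern when enough of its
    synapses contact currently active cells (coincidence detection)."""
    return len(synapses & active_cells) >= threshold

class SequenceMemoryNeuron:
    """Proximal segments define the classic receptive field and cause an
    action potential; distal segments recognize contextual patterns and
    put the cell into a predictive (depolarized) state without firing."""
    def __init__(self, proximal, distal, threshold=8):
        self.proximal = proximal     # list of sets of presynaptic cell ids
        self.distal = distal
        self.threshold = threshold

    def step(self, active_cells):
        spike = any(segment_active(s, active_cells, self.threshold)
                    for s in self.proximal)
        depolarized = any(segment_active(s, active_cells, self.threshold)
                          for s in self.distal)
        return spike, (depolarized and not spike)   # (fires, predicts)
```

Each distal segment stores one learned context pattern, so a neuron with thousands of synapses spread over many segments can hold many independent predictions, which is the capacity argument the abstract makes.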