Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation
Human face images usually appear at a wide range of visual scales. Existing
face representations handle scale variation through multi-scale schemes that
assemble a finite series of predefined scales. Such multi-shot schemes add
inference burden, and the predefined scales inevitably deviate from real data.
Learning the scale parameters from data instead, and using them for one-shot
feature inference, is a more appealing solution. To this end, we reformulate
the conv layer by drawing on scale-space theory, and gain two benefits: 1) the
conv layer learns a set of scales from the real data distribution, each of
which is realized by a conv kernel; 2) the layer automatically highlights the
feature at the channel and location corresponding to the scale of the input
pattern and its presence. We then achieve hierarchical scale attention by
stacking the reformed layers, building a novel architecture named the SCale
AttentioN Conv Neural Network (SCAN-CNN). We apply SCAN-CNN to the face
recognition task and push the frontier of state-of-the-art performance. The
accuracy gain is more evident when the face images are blurry. Meanwhile, as a
single-shot scheme, inference is more efficient than multi-shot fusion. A set
of tools is provided to ensure fast training of SCAN-CNN with zero increase in
inference cost compared with a plain CNN.
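As a rough illustration of the idea, here is a minimal sketch of such a
scale-attention conv layer in PyTorch; the class name, the per-branch scale
gating, and the softmax scale selection are assumptions about the mechanism,
not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttentionConv(nn.Module):
    """One conv kernel per learnable scale; a softmax over the scale axis
    highlights the response at the best-matching scale per location.
    (Hypothetical sketch, not the paper's implementation.)"""
    def __init__(self, in_ch, out_ch, n_scales=4, kernel_size=3):
        super().__init__()
        # One learned (log-)scale per branch, initialized to a spread.
        self.log_scales = nn.Parameter(torch.linspace(-1.0, 1.0, n_scales))
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
            for _ in range(n_scales)
        ])

    def forward(self, x):
        # Stack per-scale responses: (B, S, C, H, W)
        resp = torch.stack([conv(x) for conv in self.convs], dim=1)
        # Gate each branch by its learned scale, then softly select the
        # dominant scale at every channel/location (one-shot inference).
        gated = resp * self.log_scales.exp().view(1, -1, 1, 1, 1)
        attn = F.softmax(gated, dim=1)
        return (attn * resp).sum(dim=1)  # (B, C, H, W)

# Usage: a drop-in replacement for nn.Conv2d in a backbone.
feat = ScaleAttentionConv(3, 16)(torch.randn(2, 3, 32, 32))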
Continual Robot Learning using Self-Supervised Task Inference
Endowing robots with the human ability to learn a growing set of skills over
the course of a lifetime as opposed to mastering single tasks is an open
problem in robot learning. While multi-task learning approaches have been
proposed to address this problem, they pay little attention to task inference.
In order to continually learn new tasks, the robot first needs to infer the
task at hand without requiring predefined task representations. In this paper,
we propose a self-supervised task inference approach. Our approach learns
action and intention embeddings by self-organizing the observed movement and
effect parts of unlabeled demonstrations, and a higher-level behavior embedding
by self-organizing the joint action-intention embeddings. We
construct a behavior-matching self-supervised learning objective to train a
novel Task Inference Network (TINet) to map an unlabeled demonstration to its
nearest behavior embedding, which we use as the task representation. A
multi-task policy is built on top of the TINet and trained with reinforcement
learning to optimize performance over tasks. We evaluate our approach in the
fixed-set and continual multi-task learning settings with a humanoid robot and
compare it to different multi-task learning baselines. The results show that
our approach outperforms the other baselines, with the difference being more
pronounced in the challenging continual learning setting, and can infer tasks
from incomplete demonstrations. Our approach is also shown to generalize to
unseen tasks based on a single demonstration in one-shot task generalization
experiments.
Comment: Accepted for publication in IEEE Transactions on Cognitive and
Developmental Systems
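A minimal sketch of how the behavior-matching objective might look, assuming
flattened demonstrations, an MLP encoder, and a fixed bank of self-organized
behavior embeddings; all names and shapes here are assumptions, not the
paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TINet(nn.Module):
    """Maps a (flattened) unlabeled demonstration to a task embedding."""
    def __init__(self, demo_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(demo_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, demo):
        return self.net(demo)

def behavior_matching_loss(pred, behaviors):
    """Pull each predicted task embedding toward its nearest behavior
    embedding (the self-organized targets are treated as fixed here).

    pred:      (B, D) TINet outputs for a batch of demonstrations
    behaviors: (K, D) behavior embeddings from the self-organizing stage
    """
    dists = torch.cdist(pred, behaviors)       # (B, K) pairwise distances
    nearest = behaviors[dists.argmin(dim=1)]   # (B, D) nearest targets
    return F.mse_loss(pred, nearest)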
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis has achieved outstanding
performance and demonstrated the effectiveness of 3D representations for
action recognition. However, the existing depth-based and RGB+D-based action
recognition benchmarks have a number of limitations, including the lack of
large-scale training samples, a realistic number of distinct class categories,
diversity in camera views, varied environmental conditions, and variety of
human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset
and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR)
framework for this task, which yields promising results for the recognition of
novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
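As a rough sketch of the APSR intuition: the abstract only names the
framework, so the use of word embeddings for action and body-part names and
the softmax relevance weighting below are assumptions for illustration.

import numpy as np

def apsr_weighted_feature(action_vec, part_vecs, part_feats):
    """Weight body-part features by the semantic relevance between a
    novel action's name and each body-part name (hypothetical sketch).

    action_vec: (D,)   word embedding of the novel action's name
    part_vecs:  (P, D) word embeddings of the body-part names
    part_feats: (P, F) visual features of the corresponding body parts
    Returns a relevance-weighted feature of shape (F,).
    """
    # Cosine similarity between the action name and each part name.
    sim = part_vecs @ action_vec / (
        np.linalg.norm(part_vecs, axis=1) * np.linalg.norm(action_vec) + 1e-8)
    weights = np.exp(sim) / np.exp(sim).sum()  # softmax relevance weights
    return weights @ part_feats                # (F,)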