Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework
In this paper, we argue that the future of Artificial Intelligence research
resides in two keywords: integration and embodiment. We support this claim by
analyzing the recent advances of the field. Regarding integration, we note that
the most impactful recent contributions have been made possible through the
integration of recent Machine Learning methods (based in particular on Deep
Learning and Recurrent Neural Networks) with more traditional ones (e.g.
Monte-Carlo tree search, goal babbling exploration or addressable memory
systems). Regarding embodiment, we note that the traditional benchmark tasks
(e.g. visual classification or board games) are becoming obsolete as
state-of-the-art learning algorithms approach or even surpass human performance
in most of them, which has recently encouraged the development of first-person 3D
game platforms embedding realistic physics. Building upon this analysis, we
first propose an embodied cognitive architecture integrating heterogeneous
sub-fields of Artificial Intelligence into a unified framework. We demonstrate
the utility of our approach by showing how major contributions of the field can
be expressed within the proposed framework. We then claim that benchmarking
environments need to reproduce ecologically-valid conditions for bootstrapping
the acquisition of increasingly complex cognitive skills through the concept of
a cognitive arms race between embodied agents.
Comment: Updated version of the paper accepted to the ICDL-Epirob 2017
conference (Lisbon, Portugal)
LPN: Language-guided Prototypical Network for few-shot classification
Few-shot classification aims to adapt to new tasks with limited labeled
examples. To fully use the accessible data, recent methods explore suitable
measures for the similarity between the query and support images and better
high-dimensional features with meta-training and pre-training strategies.
However, the potential of multi-modality information has barely been explored,
which may bring promising improvement for few-shot classification. In this
paper, we propose a Language-guided Prototypical Network (LPN) for few-shot
classification, which leverages the complementarity of vision and language
modalities via two parallel branches. Concretely, to introduce language
modality with limited samples in the visual task, we leverage a pre-trained
text encoder to extract class-level text features directly from class names
while processing images with a conventional image encoder. Then, a
language-guided decoder is introduced to obtain text features corresponding to
each image by aligning class-level features with visual features. In addition,
to take advantage of class-level features and prototypes, we build a refined
prototypical head that generates robust prototypes in the text branch for
follow-up measurement. Finally, we aggregate the visual and text logits to
calibrate the deviation of a single modality. Extensive experiments demonstrate
the competitiveness of LPN against state-of-the-art methods on benchmark
datasets.
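The last step of the abstract, combining a visual branch and a text branch at the logit level, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine scale, and the mixing weight `alpha` are hypothetical, the encoders are replaced by hand-written toy vectors, and the language-guided decoder and refined prototypical head are not modelled. The visual branch uses standard prototypical-network scoring (class-mean prototypes, negative squared Euclidean distance), and the text branch is approximated by cosine similarity to per-class text features.

```python
import numpy as np

def class_prototypes(support_feats, support_labels, n_classes):
    """Mean support embedding per class (standard prototypical-network prototypes)."""
    return np.stack([
        support_feats[support_labels == c].mean(axis=0) for c in range(n_classes)
    ])

def visual_logits(query_feats, protos):
    """Negative squared Euclidean distance from each query to each prototype."""
    diff = query_feats[:, None, :] - protos[None, :, :]
    return -np.sum(diff ** 2, axis=-1)

def text_logits(query_feats, class_text_feats, scale=10.0):
    """Scaled cosine similarity to class-level text features (assumed stand-in
    for the text branch; `scale` is a hypothetical temperature)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    t = class_text_feats / np.linalg.norm(class_text_feats, axis=1, keepdims=True)
    return scale * (q @ t.T)

def fuse_logits(vis, txt, alpha=0.5):
    """Weighted sum of the two branches; `alpha` is a hypothetical mixing weight."""
    return alpha * vis + (1.0 - alpha) * txt

# Toy 2-way 2-shot episode with hand-written 2-D "embeddings".
support = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.9, 0.1], [0.2, 0.8]])
class_text = np.array([[1.0, 0.0], [0.0, 1.0]])  # stand-in for text-encoder output

protos = class_prototypes(support, labels, n_classes=2)
fused = fuse_logits(visual_logits(queries, protos), text_logits(queries, class_text))
predictions = fused.argmax(axis=1)
print(predictions)
```

In this sketch the fusion is a fixed weighted sum; the abstract only states that the two sets of logits are aggregated, so the actual aggregation rule in LPN may differ.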