73,629 research outputs found
End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks
In this work we present a novel end-to-end framework for tracking and
classifying a robot's surroundings in complex, dynamic and only partially
observable real-world environments. The approach deploys a recurrent neural
network to filter an input stream of raw laser measurements in order to
directly infer object locations, along with their identity in both visible and
occluded areas. To achieve this we first train the network using unsupervised
Deep Tracking, a recently proposed theoretical framework for end-to-end space
occupancy prediction. We show that by learning to track on a large amount of
unsupervised data, the network creates a rich internal representation of its
environment which we in turn exploit through the principle of inductive
transfer of knowledge to perform the task of it's semantic classification. As a
result, we show that only a small amount of labelled data suffices to steer the
network towards mastering this additional task. Furthermore we propose a novel
recurrent neural network architecture specifically tailored to tracking and
semantic classification in real-world robotics applications. We demonstrate the
tracking and classification performance of the method on real-world data
collected at a busy road junction. Our evaluation shows that the proposed
end-to-end framework compares favourably to a state-of-the-art, model-free
tracking solution and that it outperforms a conventional one-shot training
scheme for semantic classification
Towards Language-Universal End-to-End Speech Recognition
Building speech recognizers in multiple languages typically involves
replicating a monolingual training recipe for each language, or utilizing a
multi-task learning approach where models for different languages have separate
output labels but share some internal parameters. In this work, we exploit
recent progress in end-to-end speech recognition to create a single
multilingual speech recognition system capable of recognizing any of the
languages seen in training. To do so, we propose the use of a universal
character set that is shared among all languages. We also create a
language-specific gating mechanism within the network that can modulate the
network's internal representations in a language-specific way. We evaluate our
proposed approach on the Microsoft Cortana task across three languages and show
that our system outperforms both the individual monolingual systems and systems
built with a multi-task learning approach. We also show that this model can be
used to initialize a monolingual speech recognizer, and can be used to create a
bilingual model for use in code-switching scenarios.Comment: submitted to ICASSP 201
- …