Latent Variable Algorithms for Multimodal Learning and Sensor Fusion
Multimodal learning has lacked principled ways of combining information
from different modalities and learning a low-dimensional manifold of
meaningful representations. We study multimodal learning and sensor fusion
from a latent variable perspective. We first present a regularized
recurrent attention filter for sensor fusion. This algorithm can
dynamically combine information from different types of sensors in a
sequential decision-making task. Each sensor is paired with a modular
neural network to maximize the utility of its own information. A gating
modular neural network dynamically generates a set of mixing weights for
the outputs of the sensor networks by balancing the utility of all
sensors' information. We design a co-learning mechanism that encourages
co-adaptation and independent learning of each sensor at the same time,
and propose a regularization-based co-learning method. In the second part,
we focus on recovering the manifold of the latent representation. We
propose a co-learning approach using a probabilistic graphical model that
imposes a structural prior on the generative model, the multimodal
variational RNN (MVRNN), and derive a variational lower bound for its
objective function. In the third part, we extend the siamese structure to
sensor fusion for robust acoustic event detection. We perform experiments
to investigate the extracted latent representations; further work is
planned in the following months. Our experiments show that the recurrent
attention filter can dynamically combine different sensor inputs according
to the information they carry. We believe the MVRNN can identify latent
representations that are useful for many downstream tasks such as speech
synthesis, activity recognition, and control and planning. Both algorithms
are general frameworks that can be applied to other tasks where different
types of sensors are jointly used for decision making.
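As a concrete illustration of the gating mechanism described above, here
is a minimal sketch of mixing per-sensor network outputs with softmax
weights produced by a gating function. It is not the thesis's actual
architecture; the sensor networks, gating vector, and dimensions are
illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sensor_net(x, W):
    """Stand-in for a per-sensor modular network: maps a raw sensor
    input to a shared d-dimensional feature space."""
    return np.tanh(W @ x)

d = 8                                               # shared feature dimension
inputs = [rng.normal(size=n) for n in (5, 7, 4)]    # three sensor readings
mats = [0.1 * rng.normal(size=(d, x.size)) for x in inputs]
features = [sensor_net(x, W) for x, W in zip(inputs, mats)]

# Gating function: score each sensor's features and convert the scores to
# mixing weights, so less informative sensors are down-weighted.
v = 0.1 * rng.normal(size=d)
alphas = softmax(np.array([v @ f for f in features]))

fused = sum(a * f for a, f in zip(alphas, features))
print("mixing weights:", np.round(alphas, 3))
print("fused feature shape:", fused.shape)
```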
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys
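Since the survey unifies learned methods around linear assignment, a
minimal worked example of that core step may help. The sketch below solves
a toy track-to-detection assignment with SciPy's Hungarian-algorithm
solver; the affinity values are invented for illustration, and in the
surveyed methods a learned model would supply this matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy affinity matrix: affinity[i, j] scores how well track i matches
# detection j; the values are invented for illustration.
affinity = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.7],
])

# The Hungarian algorithm minimizes cost, so negate to maximize affinity.
rows, cols = linear_sum_assignment(-affinity)
for t, d in zip(rows, cols):
    print(f"track {t} -> detection {d} (affinity {affinity[t, d]:.2f})")
```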
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments
Object detection is an essential task for autonomous robots operating in
dynamic and changing environments. A robot should be able to detect objects in
the presence of sensor noise that can be induced by changing lighting
conditions for cameras and false depth readings for range sensors, especially
RGB-D cameras. To tackle these challenges, we propose a novel adaptive fusion
approach for object detection that learns to weight the predictions of
different sensor modalities in an online manner. Our approach is based on a
mixture of convolutional neural network (CNN) experts and incorporates multiple
modalities including appearance, depth and motion. We test our method in
extensive robot experiments, in which we detect people in a combined indoor and
outdoor scenario from RGB-D data, and we demonstrate that our method can adapt
to harsh lighting changes and severe camera motion blur. Furthermore, we
present a new RGB-D dataset for people detection in mixed in- and outdoor
environments, recorded with a mobile robot. Code, pretrained models and dataset
are available at http://adaptivefusion.cs.uni-freiburg.de
Comment: Published at the 2016 IEEE/RSJ International Conference on
Intelligent Robots and Systems. Added a new baseline with respect to the
IROS version. Project page with code, pretrained models and our
InOutDoorPeople RGB-D dataset at http://adaptivefusion.cs.uni-freiburg.de
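The following is a minimal sketch of online reweighting of expert
predictions. It uses a generic exponentiated-gradient update in place of
the paper's learned gating network, and the expert posteriors and labels
are hypothetical.

```python
import numpy as np

# Hypothetical per-expert class posteriors for one detection window
# (rows: appearance, depth, motion experts; columns: background, person).
expert_probs = np.array([
    [0.30, 0.70],   # appearance expert (degraded by harsh lighting, say)
    [0.10, 0.90],   # depth expert
    [0.20, 0.80],   # motion expert
])

weights = np.ones(3) / 3.0   # start with uniform trust in each expert
eta = 0.5                    # step size for the online update

def fuse(probs, w):
    """Weighted average of the expert posteriors."""
    return w @ probs

# One exponentiated-gradient step: experts that assigned high probability
# to the observed (or pseudo-) label gain weight; this is a generic online
# reweighting rule, not the paper's learned gating network.
label = 1  # person present
weights *= expert_probs[:, label] ** eta
weights /= weights.sum()

print("updated weights:", weights)
print("fused posterior:", fuse(expert_probs, weights))
```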
Endo-VMFuseNet: Deep Visual-Magnetic Sensor Fusion Approach for Uncalibrated, Unsynchronized and Asymmetric Endoscopic Capsule Robot Localization Data
In the last decade, researchers and medical device companies have made major
advances towards transforming passive capsule endoscopes into active medical
robots. One of the major challenges is to endow capsule robots with accurate
perception of the environment inside the human body, which will provide
necessary information and enable improved medical procedures. We extend the
success of deep learning approaches from various research fields to the problem
of uncalibrated, asynchronous, and asymmetric sensor fusion for endoscopic
capsule robots. Experiments performed on real pig stomach datasets show
that our method achieves sub-millimeter precision for both translational
and rotational movements and offers various advantages over traditional
sensor fusion techniques.
Comment: Submitted to ICRA 2018
Deep Multimodal Representation Learning from Temporal Data
In recent years, Deep Learning has been successfully applied to multimodal
learning problems, with the aim of learning useful joint representations in
data fusion applications. When the available modalities consist of time series
data such as video, audio and sensor signals, it becomes imperative to consider
their temporal structure during the fusion process. In this paper, we propose
the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion
model for fusing multiple input modalities that are inherently temporal in
nature. Key features of our proposed model include: (i) simultaneous learning
of the joint representation and temporal dependencies between modalities, (ii)
use of multiple loss terms in the objective function, including a maximum
correlation loss term to enhance learning of cross-modal information, and (iii)
the use of an attention model to dynamically adjust the contribution of
different input modalities to the joint representation. We validate our model
via experimentation on two different tasks: video- and sensor-based activity
classification, and audio-visual speech recognition. We empirically analyze the
contributions of different components of the proposed CorrRNN model, and
demonstrate its robustness, effectiveness and state-of-the-art performance on
multiple datasets.
Comment: To appear in CVPR 2017
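The maximum correlation loss term in (ii) can be read as rewarding
correlated embeddings across modalities. The sketch below implements one
plausible version, a negative per-dimension Pearson correlation between
two batches of modality embeddings; it is an illustration, not necessarily
the paper's exact formulation.

```python
import numpy as np

def correlation_loss(h1, h2, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two
    batches of modality embeddings with shape (batch, dim); minimizing
    it drives the two modalities toward correlated representations."""
    h1c = h1 - h1.mean(axis=0)
    h2c = h2 - h2.mean(axis=0)
    num = (h1c * h2c).sum(axis=0)
    den = np.sqrt((h1c ** 2).sum(axis=0) * (h2c ** 2).sum(axis=0)) + eps
    return -np.mean(num / den)

rng = np.random.default_rng(1)
video_emb = rng.normal(size=(32, 16))               # synthetic embeddings
audio_emb = 0.8 * video_emb + 0.2 * rng.normal(size=(32, 16))
print("correlation loss:", correlation_loss(video_emb, audio_emb))
```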
EndoSensorFusion: Particle Filtering-Based Multi-sensory Data Fusion with Switching State-Space Model for Endoscopic Capsule Robots
A reliable, real time multi-sensor fusion functionality is crucial for
localization of actively controlled capsule endoscopy robots, which are an
emerging, minimally invasive diagnostic and therapeutic technology for the
gastrointestinal (GI) tract. In this study, we propose a novel multi-sensor
fusion approach based on a particle filter that incorporates an online
estimation of sensor reliability and a non-linear kinematic model learned by a
recurrent neural network. Our method sequentially estimates the true robot pose
from noisy pose observations delivered by multiple sensors. We experimentally
test the method using 5 degree-of-freedom (5-DoF) absolute pose measurement by
a magnetic localization system and a 6-DoF relative pose measurement by visual
odometry. In addition, the proposed method is capable of detecting and handling
sensor failures by ignoring corrupted data, providing the robustness expected
of a medical device. Detailed analyses and evaluations, performed via
ex-vivo experiments on a porcine stomach model, prove that our system
achieves high translational and rotational accuracies for different types
of endoscopic capsule robot trajectories.
Comment: Submitted to ICRA 2018. arXiv admin note: text overlap with
arXiv:1705.0619
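A toy sketch of the core mechanism follows: particle weights are scaled by
per-sensor reliabilities, with a crude residual gate standing in for the
paper's online reliability estimation. The 1-D pose, sensor readings, and
thresholds are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_particles = 500
particles = rng.normal(0.0, 1.0, size=n_particles)  # 1-D pose for brevity

def gaussian_like(z, x, sigma):
    """Unnormalized Gaussian measurement likelihood."""
    return np.exp(-0.5 * ((z - x) / sigma) ** 2)

# Two sensors observing the same pose; here the visual odometry reading
# is corrupted to exercise the failure-handling path.
true_pose = 0.7
readings = {"magnetic": true_pose + 0.1, "visual_odometry": true_pose - 5.0}
reliability = {name: 1.0 for name in readings}

# Crude failure gate: a reading far outside the particle cloud is ignored
# (the paper instead estimates sensor reliability online).
for name, z in readings.items():
    if abs(z - particles.mean()) > 3.0 * particles.std():
        reliability[name] = 0.0

# Weight particles by each sensor's likelihood raised to its reliability,
# then resample to obtain the fused posterior over the pose.
w = np.ones(n_particles)
for name, z in readings.items():
    w *= gaussian_like(z, particles, sigma=0.5) ** reliability[name]
w /= w.sum()
particles = rng.choice(particles, size=n_particles, p=w)
print("fused pose estimate:", particles.mean())
```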
A Survey of Deep Learning Techniques for Mobile Robot Applications
Advancements in deep learning over the years have attracted research into
how deep artificial neural networks can be used in robotic systems. This
survey presents a summary of the current research, with a specific focus
on the gains and obstacles for applying deep learning to mobile robotics.
Fusion of Deep Neural Networks for Activity Recognition: A Regular Vine Copula Based Approach
In this paper, we propose regular vine copula based fusion of multiple
deep neural network classifiers for the problem of multi-sensor human
activity recognition. We take cross-modal dependence into account by
employing regular vine copulas, which are extremely flexible and powerful
graphical models for characterizing complex dependence among multiple
modalities. Multiple deep neural networks are used to extract high-level
features from the sensing modalities, with each network processing the
data collected from a single sensor. The extracted high-level features are
then combined using a regular vine copula model. Numerical experiments
demonstrate the effectiveness of our approach.
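Regular vines factorize a high-dimensional copula into bivariate building
blocks, so a single bivariate Gaussian copula serves as a minimal stand-in
for the fusion idea. In the sketch below the network scores are synthetic,
and the Gaussian family is an assumption; the paper's vines can mix copula
families.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(3)

# Hypothetical high-level scores from two per-sensor deep networks on the
# same 200 activity windows; they are dependent because both networks
# observe the same underlying activity.
s1 = rng.normal(size=200)
s2 = 0.6 * s1 + 0.8 * rng.normal(size=200)

def to_normal_scores(s):
    """Probability-integral transform via the empirical CDF, then map
    the uniform margins to standard-normal scores."""
    u = rankdata(s) / (len(s) + 1)
    return norm.ppf(u)

z = np.column_stack([to_normal_scores(s1), to_normal_scores(s2)])
rho = np.corrcoef(z.T)[0, 1]
print("estimated copula correlation:", round(rho, 3))

# Bivariate Gaussian-copula log-density: the dependence term that a
# copula-based fusion rule adds on top of the per-sensor marginals.
def gauss_copula_logpdf(z1, z2, rho):
    return (-0.5 * np.log(1 - rho**2)
            - (rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2)
            / (2 * (1 - rho**2)))

print("log c(z):", gauss_copula_logpdf(z[0, 0], z[0, 1], rho))
```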
Life detection strategy based on infrared vision and ultra-wideband radar data fusion
Life detection methods based on a single type of information source
cannot meet the requirements of post-earthquake rescue due to their
limitations in different scenes and their poor robustness. This paper
proposes a deep-neural-network-based method for multi-sensor
decision-level fusion that combines a Convolutional Neural Network and a
Long Short-Term Memory network (CNN+LSTM). First, we compute the life
detection probability of each sensor with various methods in the same
scene simultaneously; these values are gathered into samples that serve
as inputs to the deep neural network. We then use a Convolutional Neural
Network (CNN) to extract spatial distribution characteristics from the
inputs, which are two-channel combinations of the raw and smoothed
probability values of each life detection sensor. The temporal structure
of the outputs of the last CNN layers is analyzed with Long Short-Term
Memory (LSTM) layers, and the results from the three LSTM branches are
concatenated. Finally, two further sets of LSTM layers, distinct from the
previous ones, integrate the three branches of features, and the binary
classification results are output by a fully connected network trained
with a Binary Cross-Entropy (BCE) loss function. The proposed algorithm
thus yields accurate life detection classifications.
Comment: 6 pages, 7 figures, conference
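A sketch of the overall architecture shape described above, per-sensor CNN
feature extraction, LSTM branches, branch concatenation, further LSTM
layers, and a BCE-trained binary head, is given below in PyTorch. All
layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class LifeDetectionFusion(nn.Module):
    """Sketch of the CNN+LSTM decision-level fusion: each sensor branch
    sees a 2-channel sequence (raw and smoothed detection probabilities),
    a 1-D CNN extracts local structure, an LSTM models the temporal
    dependence, and the branch outputs are concatenated for a final
    binary decision. Sizes are illustrative."""
    def __init__(self, n_branches=3, hidden=32):
        super().__init__()
        self.branches = nn.ModuleList()
        for _ in range(n_branches):
            self.branches.append(nn.ModuleDict({
                "cnn": nn.Sequential(
                    nn.Conv1d(2, 16, kernel_size=5, padding=2), nn.ReLU()),
                "lstm": nn.LSTM(16, hidden, batch_first=True),
            }))
        self.fuse_lstm = nn.LSTM(n_branches * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # binary logit for BCE training

    def forward(self, xs):                      # xs: list of (B, 2, T)
        feats = []
        for x, br in zip(xs, self.branches):
            h = br["cnn"](x).transpose(1, 2)    # (B, T, 16)
            out, _ = br["lstm"](h)              # (B, T, hidden)
            feats.append(out)
        fused, _ = self.fuse_lstm(torch.cat(feats, dim=-1))
        return self.head(fused[:, -1])          # logit per sample

model = LifeDetectionFusion()
xs = [torch.randn(4, 2, 64) for _ in range(3)]  # three sensor branches
logits = model(xs)
loss = nn.BCEWithLogitsLoss()(logits, torch.ones(4, 1))
print(logits.shape, float(loss))
```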
PerceptionNet: A Deep Convolutional Neural Network for Late Sensor Fusion
Human Activity Recognition (HAR) based on motion sensors has drawn a lot of
attention over the last few years, since perceiving the human status enables
context-aware applications to adapt their services on users' needs. However,
motion sensor fusion and feature extraction have not reached their full
potentials, remaining still an open issue. In this paper, we introduce
PerceptionNet, a deep Convolutional Neural Network (CNN) that applies a late 2D
convolution to multimodal time-series sensor data in order to
automatically extract efficient features for HAR. We evaluate our
approach on two publicly available HAR datasets and demonstrate that the
proposed model effectively fuses multimodal sensor data and improves the
performance of HAR. In particular, PerceptionNet surpasses the
performance of state-of-the-art HAR methods based on (i) hand-crafted
features, (ii) deep CNNs using early fusion approaches, and (iii) Long
Short-Term Memory (LSTM) networks, by an average accuracy of more than
3%.
Comment: This article has been accepted for publication in the
proceedings of the Intelligent Systems Conference (IntelliSys) 2019
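The late-fusion idea can be sketched as 1-D temporal convolutions applied
per sensor channel, followed by a late 2-D convolution that spans the
channel axis. The configuration below is illustrative, not PerceptionNet's
published architecture.

```python
import torch
import torch.nn as nn

class LatePerception(nn.Module):
    """Sketch of late sensor fusion: early layers convolve each sensor
    channel (e.g. accelerometer/gyroscope axes) only along time with
    1-D-shaped kernels; a late 2-D convolution then spans the channel
    axis, combining modalities after per-channel features have formed."""
    def __init__(self, n_channels=6, n_classes=6):
        super().__init__()
        self.temporal = nn.Sequential(          # per-channel, time only
            nn.Conv2d(1, 16, kernel_size=(1, 9), padding=(0, 4)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        self.late = nn.Sequential(              # late fusion across channels
            nn.Conv2d(16, 32, kernel_size=(n_channels, 9)), nn.ReLU(),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                       # x: (B, 1, C, T)
        h = self.temporal(x)                    # (B, 16, C, T/2)
        h = self.late(h)                        # (B, 32, 1, T')
        return self.head(h.mean(dim=(2, 3)))    # global pooling + classify

model = LatePerception()
x = torch.randn(8, 1, 6, 128)                   # batch of sensor windows
print(model(x).shape)                           # torch.Size([8, 6])
```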