Geometric Cross-Modal Comparison of Heterogeneous Sensor Data
In this work, we address the problem of cross-modal comparison of aerial data
streams. A variety of simulated automobile trajectories are sensed using two
different modalities: full-motion video, and radio-frequency (RF) signals
received by detectors at various locations. The information represented by the
two modalities is compared using self-similarity matrices (SSMs) corresponding
to time-ordered point clouds in feature spaces of each of these data sources;
we note that these feature spaces can be of entirely different scale and
dimensionality. Several metrics for comparing SSMs are explored, including a
cutting-edge time-warping technique that can simultaneously handle local time
warping and partial matches, while also controlling for the change in geometry
between feature spaces of the two modalities. We note that this technique is
quite general, and does not depend on the choice of modalities. In this
particular setting, we demonstrate that the cross-modal distance between SSMs
corresponding to the same trajectory type is smaller than the cross-modal
distance between SSMs corresponding to distinct trajectory types, and we
formalize this observation via precision-recall metrics in experiments.
Finally, we comment on promising implications of these ideas for future
integration into multiple-hypothesis tracking systems.
Comment: 10 pages, 13 figures, Proceedings of IEEE Aeroconf 201
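The SSM construction described in this abstract can be sketched as follows. This is an illustrative toy, not the authors' code: the function name and the synthetic trajectories are assumptions, and the SSM is built with plain Euclidean distances.

```python
import numpy as np

def self_similarity_matrix(points):
    """SSM of a time-ordered point cloud (T x d): entry (i, j) is the
    Euclidean distance between the feature vectors at times i and j."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy example: the same circular trajectory observed in two feature
# spaces of different scale and dimensionality.
t = np.linspace(0.0, 1.0, 50)
feats_video = np.stack([np.cos(2 * np.pi * t),
                        np.sin(2 * np.pi * t)], axis=1)        # 2-D features
feats_rf = np.stack([3 * np.cos(2 * np.pi * t),
                     3 * np.sin(2 * np.pi * t),
                     np.zeros_like(t)], axis=1)                # 3-D, rescaled
ssm_video = self_similarity_matrix(feats_video)
ssm_rf = self_similarity_matrix(feats_rf)
```

Because the RF features here are just a rescaled embedding of the same trajectory, the two SSMs agree up to a scale factor; comparing SSMs rather than raw features is what removes the dependence on each modality's dimensionality.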
Multimodal Signal Processing and Learning Aspects of Human-Robot Interaction for an Assistive Bathing Robot
We explore new aspects of assistive living on smart human-robot interaction
(HRI) that involve automatic recognition and online validation of speech and
gestures in a natural interface, providing social features for HRI. We
introduce a whole framework and resources of a real-life scenario for elderly
subjects supported by an assistive bathing robot, addressing health and hygiene
care issues. We contribute a new dataset and a suite of tools used for data
acquisition and a state-of-the-art pipeline for multimodal learning within the
framework of the I-Support bathing robot, with emphasis on audio and RGB-D
visual streams. We consider privacy issues by evaluating the depth visual
stream along with the RGB, using Kinect sensors. The audio-gestural recognition
task on this new dataset achieves recognition rates of up to 84.5%, while the
online validation of the I-Support system with elderly users reaches up to 84%
when the two modalities are fused. The results are promising enough to support
further research in the area of multimodal recognition for assistive social
HRI, considering the difficulties of the specific task. Upon acceptance of the
paper, part of the data will be made publicly available.
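A minimal sketch of fusing the two modalities at the score level, as the abstract describes for the combined result. The weighting scheme, command vocabulary, and numbers below are hypothetical, not the I-Support pipeline:

```python
import numpy as np

def fuse_scores(audio_scores, gesture_scores, w_audio=0.5):
    """Weighted late fusion of per-command scores from the audio and
    gesture recognizers; returns the index of the fused best command."""
    fused = (w_audio * np.asarray(audio_scores)
             + (1.0 - w_audio) * np.asarray(gesture_scores))
    return int(np.argmax(fused))

# Hypothetical scores for a 3-command vocabulary.
audio = [0.2, 0.5, 0.3]
gesture = [0.1, 0.7, 0.2]
best = fuse_scores(audio, gesture)  # both modalities favor command 1
```

Even this simple weighted sum illustrates why fusion helps: a command weakly recognized in one stream can be confirmed by the other.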
Optimized Gated Deep Learning Architectures for Sensor Fusion
Sensor fusion is a key technology that integrates various sensory inputs to
allow for robust decision making in many applications such as autonomous
driving and robot control. Deep neural networks have been adopted for sensor
fusion in a body of recent studies. Among these, the so-called netgated
architecture was proposed, which has demonstrated improved performances over
the conventional convolutional neural networks (CNN). In this paper, we address
several limitations of the baseline netgated architecture by proposing two
further optimized architectures: a coarser-grained gated architecture employing
(feature) group-level fusion weights, and a two-stage gated architecture
leveraging both group-level and feature-level fusion weights. Using driving
mode prediction and human activity recognition datasets, we demonstrate the
significant performance improvements brought by the proposed gated
architectures and also their robustness in the presence of sensor noise and
failures.
Comment: 10 pages, 5 figures. Submitted to ICLR 201
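A rough illustration of the group-level gating idea: each sensor's feature group is scaled by a normalized gate weight before fusion. This is a deliberate simplification under assumed names; in the architectures above, the gate weights are learned end-to-end with the network rather than supplied by hand.

```python
import numpy as np

def group_gated_fusion(feature_groups, gate_logits):
    """Scale each sensor's feature group by a softmax-normalized gate
    weight, then concatenate. A faulty sensor can be suppressed by
    driving its gate logit low."""
    logits = np.asarray(gate_logits, dtype=float)
    weights = np.exp(logits - logits.max())   # stable softmax
    weights /= weights.sum()
    return np.concatenate([w * g for w, g in zip(weights, feature_groups)])

# Three sensor groups; the third sensor is "failed" and gated out.
groups = [np.ones(4), np.ones(4), np.full(4, 100.0)]
fused = group_gated_fusion(groups, gate_logits=[2.0, 2.0, -10.0])
```

With the third gate logit at -10, the failed sensor's (wildly wrong) features contribute almost nothing to the fused vector, which is the robustness behavior the abstract reports under sensor noise and failure.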
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution monitoring,
crime control, and wildlife monitoring, to name just a few. Differently from prior
sensing paradigms, humans are now the primary actors of the sensing process,
since they become fundamental in retrieving reliable and up-to-date information
about the event being monitored. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.
Comment: To appear in ACM Transactions on Sensor Networks (TOSN
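One ingredient of QoI enforcement, down-weighting reports from unreliable contributors, can be sketched as follows. The reliability scores here are assumed to be given; real systems must estimate them (e.g., via truth-discovery or reputation mechanisms), and the function name and numbers are illustrative only.

```python
def reliability_weighted_estimate(reports, reliability):
    """Aggregate crowdsensed scalar reports, weighting each report by
    its contributor's reliability score in [0, 1]."""
    total = sum(reliability)
    return sum(r * w for r, w in zip(reports, reliability)) / total

# Hypothetical noise-level reports (dB); the second contributor is
# unreliable, so its outlying report is largely discounted.
estimate = reliability_weighted_estimate([60.0, 90.0, 62.0], [1.0, 0.1, 1.0])
```

A plain average of these reports would be pulled toward the outlier at 90 dB; the weighted estimate stays near the two trustworthy readings, which is the effect QoI mechanisms aim for when humans behave unreliably or maliciously.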
Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges
Collaborative perception is essential to address occlusion and sensor failure
issues in autonomous driving. In recent years, theoretical and experimental
work on collaborative perception has increased tremendously. So far, however,
few reviews have focused on systematic treatments of collaboration modules and
large-scale collaborative perception datasets. This
work reviews recent achievements in this field to bridge this gap and motivate
future research. We start with a brief overview of collaboration schemes. After
that, we systematically summarize the collaborative perception methods for
ideal scenarios and real-world issues. The former focuses on collaboration
modules and efficiency, and the latter is devoted to addressing the problems in
actual application. Furthermore, we present large-scale public datasets and
summarize quantitative results on these benchmarks. Finally, we highlight gaps
and overlooked challenges between current academic research and real-world
applications. The project page is
https://github.com/CatOneTwo/Collaborative-Perception-in-Autonomous-Driving
Comment: 18 pages, 6 figures. Accepted by IEEE Intelligent Transportation Systems Magazine.
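As a toy illustration of one common collaboration module, intermediate (feature-level) fusion across agents, element-wise max over shared feature maps is a frequently used simple baseline. This is a generic sketch under assumed names, not a specific method from the survey:

```python
import numpy as np

def max_fuse(feature_maps):
    """Element-wise max fusion of per-agent feature maps of equal shape:
    a region occluded for one agent can be filled in by another agent
    that observes it."""
    return np.max(np.stack(feature_maps, axis=0), axis=0)

# Two agents sharing 4x4 bird's-eye-view grids; agent B detects a
# target in a cell that agent A cannot see.
fm_a = np.zeros((4, 4))
fm_b = np.zeros((4, 4))
fm_b[2, 3] = 1.0
fused = max_fuse([fm_a, fm_b])
```

The fused map retains agent B's detection, which is the basic mechanism by which collaboration addresses occlusion and single-sensor failure.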
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin