Egocentric Activity Recognition with Multimodal Fisher Vector
With the increasing availability of wearable devices, research on egocentric
activity recognition has received much attention recently. In this paper, we
build a Multimodal Egocentric Activity dataset which includes egocentric videos
and sensor data of 20 fine-grained and diverse activity categories. We present
a novel strategy to extract temporal trajectory-like features from sensor data.
We propose to apply the Fisher Kernel framework to fuse video and temporally
enhanced sensor features. Experimental results show that with careful design of
the feature extraction and fusion algorithms, sensor data can enhance
information-rich video data. We make the Multimodal Egocentric Activity dataset
publicly available to facilitate future research.
Comment: 5 pages, 4 figures, ICASSP 2016 accepted
A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System
This paper presents a novel multimodal human activity recognition system. It
uses a two-stream decision level fusion of vision and inertial sensors. In the
first stream, raw RGB frames are passed to a part affinity field-based pose
estimation network to detect the keypoints of the user. These keypoints are
then pre-processed and fed in a sliding-window fashion to a specially
designed convolutional neural network for spatial feature extraction,
followed by regularized LSTMs that compute the temporal features. The outputs
of the LSTM networks are then passed to fully connected layers for
classification. In the second stream, data obtained from inertial sensors are
pre-processed and fed to regularized LSTMs for feature extraction,
followed by fully connected layers for classification. The softmax scores of
the two streams are then combined by decision-level fusion, which gives the
final prediction. Extensive experiments are conducted to
evaluate the performance. Four standard multimodal benchmark datasets (UP-Fall
Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD) are used for experimentation.
The accuracies obtained by the proposed system are 96.9 %, 97.6 %, 98.7 %, and
95.9 %, respectively, on the UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and
C-MHAD datasets. These results are far superior to those of current
state-of-the-art methods.
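The decision-level fusion step described in this abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so the weighted average below is an assumption, and the function name `fuse_softmax_scores` and the parameter `vision_weight` are hypothetical names chosen for illustration only.

```python
def fuse_softmax_scores(vision_scores, inertial_scores, vision_weight=0.5):
    """Combine per-class softmax scores from two streams.

    Assumption: fusion is a weighted average of the two score vectors.
    The abstract only says the scores are fused at the decision level.
    """
    if len(vision_scores) != len(inertial_scores):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= vision_weight <= 1.0:
        raise ValueError("vision_weight must lie in [0, 1]")

    inertial_weight = 1.0 - vision_weight
    fused = [
        vision_weight * v + inertial_weight * s
        for v, s in zip(vision_scores, inertial_scores)
    ]
    # The final prediction is the class with the highest fused score.
    predicted_class = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted_class


if __name__ == "__main__":
    # Made-up scores for three classes, for illustration only.
    fused, predicted = fuse_softmax_scores([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
    print(fused, predicted)
```

With equal weights, a class that one stream favors strongly can win even when the other stream ranks it second, which is the usual motivation for fusing scores rather than hard labels.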
InMyFace: Inertial and Mechanomyography-Based Sensor Fusion for Wearable Facial Activity Recognition
Recognizing facial activity is a well-understood (but non-trivial) computer
vision problem. However, reliable solutions require a camera with a good view
of the face, which is often unavailable in wearable settings. Furthermore, in
wearable applications, where systems accompany users throughout their daily
activities, a permanently running camera can be problematic for privacy (and
legal) reasons. This work presents an alternative solution based on the fusion
of wearable inertial sensors, planar pressure sensors, and acoustic
mechanomyography (muscle sounds). The sensors were placed unobtrusively in a
sports cap to monitor facial muscle activities related to facial expressions.
We present our integrated wearable sensor system, describe data fusion and
analysis methods, and evaluate the system in an experiment with thirteen
subjects from different cultural backgrounds (eight countries) and both sexes
(six women and seven men). In a one-model-per-user scheme and using a late
fusion approach, the system yielded an average F1 score of 85.00% for the case
where all sensing modalities are combined. With a cross-user validation and a
one-model-for-all-user scheme, an F1 score of 79.00% was obtained for thirteen
participants (six females and seven males). Moreover, in a hybrid fusion
(cross-user) approach and six classes, an average F1 score of 82.00% was
obtained for eight users. The results are competitive with state-of-the-art
non-camera-based solutions for a cross-user study. In addition, our unique set
of participants demonstrates the inclusiveness and generalizability of the
approach.
Comment: Submitted to Information Fusion, Elsevier
Robust Deep Multi-Modal Sensor Fusion using Fusion Weight Regularization and Target Learning
Sensor fusion has wide applications in many domains including health care and
autonomous systems. While the advent of deep learning has enabled promising
multi-modal fusion of high-level features and end-to-end sensor fusion
solutions, existing deep learning based sensor fusion techniques including deep
gating architectures are not always resilient, leading to the issue of fusion
weight inconsistency. We propose deep multi-modal sensor fusion architectures
with enhanced robustness, particularly in the presence of sensor failures. At
the core of our gating architectures are fusion weight regularization and
fusion target learning operating on auxiliary unimodal sensing networks
appended to the main fusion model. The proposed regularized gating
architectures outperform the existing deep learning architectures with and
without gating under both clean and corrupted sensory inputs resulting from
sensor failures. The demonstrated improvements are particularly pronounced when
one or more sensory modalities are corrupted.
Comment: 8 pages
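The notion of fusion weights in a gating architecture can be sketched as follows. This is a generic softmax-gated weighted sum, not the authors' architecture: it omits the fusion weight regularization and fusion target learning the abstract describes, and the names `softmax` and `gated_fusion`, as well as the idea of supplying gate logits directly, are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_logits):
    """Fuse per-modality feature vectors with softmax-normalized gate weights.

    features: one feature vector per sensor modality, all of equal length.
    gate_logits: one unnormalized gate score per modality (in a real model
    these would come from a learned gating network; here they are given).
    """
    if len(features) != len(gate_logits):
        raise ValueError("need exactly one gate logit per modality")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")

    weights = softmax(gate_logits)
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]
    return weights, fused


if __name__ == "__main__":
    # Two modalities; the second is given a much lower gate score,
    # as a gate might do for a corrupted sensor.
    weights, fused = gated_fusion([[1.0, 0.0], [0.0, 1.0]], [2.0, -2.0])
    print(weights, fused)
```

If the gate assigns a high weight to a corrupted modality, the fused feature inherits that corruption, which is one way to read the fusion weight inconsistency the abstract refers to.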