ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels, allowing it to produce
meaningful predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
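
The random dropping of whole modality channels described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `moddrop`, the per-modality drop probability `drop_prob`, the choice to zero-fill a dropped channel, and the rule of always keeping at least one modality are all assumptions made for illustration.

```python
import random


def moddrop(features, drop_prob=0.5, rng=None):
    """Randomly zero out whole modalities in one training sample.

    features:  dict mapping a modality name to a list of floats.
    drop_prob: hypothetical probability of dropping each modality.
    rng:       optional random.Random instance for reproducibility.

    Returns a new dict with the same keys; dropped modalities are
    replaced by zero vectors of the same length (an assumption).
    """
    rng = rng or random.Random()
    names = list(features)
    # Decide independently for each modality whether it survives.
    kept = [name for name in names if rng.random() >= drop_prob]
    if not kept:
        # Assumption: never drop everything, so the sample stays usable.
        kept = [rng.choice(names)]
    return {
        name: list(values) if name in kept else [0.0] * len(values)
        for name, values in features.items()
    }


# Hypothetical sample with three modalities.
sample = {
    "video": [0.2, 0.7],
    "depth": [0.5, 0.1, 0.9],
    "audio": [0.3],
}
augmented = moddrop(sample, drop_prob=0.5, rng=random.Random(0))
```

Applying such a function only during training, and passing the untouched sample at test time, is one way a fused classifier could be exposed to missing channels while it learns.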
PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI
In this paper we present a novel method for the correction of motion
artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of
the whole uterus. In contrast to current slice-to-volume registration (SVR)
methods, which require an inflexible anatomical enclosure of a single investigated
organ, the proposed patch-to-volume reconstruction (PVR) approach is able to
reconstruct a large field of view of non-rigidly deforming structures. It
relaxes rigid motion assumptions by introducing a specific amount of redundant
information that is exploited with parallelized patch-wise optimization,
super-resolution, and automatic outlier rejection. We further describe and
provide an efficient parallel implementation of PVR allowing its execution
within reasonable time on commercially available graphics processing units
(GPU), enabling its use in clinical practice. We evaluate PVR's
computational overhead compared to standard methods and observe an improvement
in reconstruction accuracy of approximately 30% over conventional SVR in the
presence of affine motion artifacts in synthetic experiments. Furthermore, we have
evaluated our method qualitatively and quantitatively on real fetal MRI data
subject to maternal breathing and sudden fetal movements. We evaluate
peak-signal-to-noise ratio (PSNR), structural similarity index (SSIM), and
cross correlation (CC) with respect to the originally acquired data and provide
a method for visual inspection of reconstruction uncertainty. With these
experiments we demonstrate successful application of PVR motion compensation to
the whole uterus, the human fetus, and the human placenta.
Comment: 10 pages, 13 figures, submitted to IEEE Transactions on Medical
Imaging. v2: added funders acknowledgements to preprint
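
Two of the image-quality measures named in the abstract above, PSNR and cross correlation, have standard textbook definitions that can be sketched briefly. The code below is not taken from the PVR implementation: it assumes images are given as flat lists of intensities of equal length, that `max_value` is the largest possible intensity, and that cross correlation means the zero-mean normalized form. SSIM is omitted because it needs windowed local statistics.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cross_correlation(reference, estimate):
    """Zero-mean normalized cross correlation, in the range [-1, 1]."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    mean_r = sum(reference) / n
    mean_e = sum(estimate) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    denom = math.sqrt(var_r * var_e)
    if denom == 0:
        raise ValueError("cross correlation is undefined for constant input")
    return cov / denom


# Hypothetical intensities of an acquired slice and its reconstruction.
acquired = [0.0, 0.5, 1.0, 0.5]
reconstructed = [0.1, 0.5, 0.9, 0.5]
print(psnr(acquired, reconstructed), cross_correlation(acquired, reconstructed))
```

Higher values of both measures indicate closer agreement with the reference data.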
DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation
There is an undeniable communication barrier between deaf people and people
with normal hearing ability. Although innovations in sign language translation
technology aim to tear down this communication barrier, the majority of
existing sign language translation systems are either intrusive or constrained
by resolution or ambient lighting conditions. Moreover, these existing systems
can only perform single-sign ASL translation rather than sentence-level
translation, making them much less useful in daily-life communication
scenarios. In this work, we fill this critical gap by presenting DeepASL, a
transformative deep learning-based sign language translation technology that
enables ubiquitous and non-intrusive American Sign Language (ASL) translation
at both word and sentence levels. DeepASL uses infrared light as its sensing
mechanism to non-intrusively capture the ASL signs. It incorporates a novel
hierarchical bidirectional deep recurrent neural network (HB-RNN) and a
probabilistic framework based on Connectionist Temporal Classification (CTC)
for word-level and sentence-level ASL translation respectively. To evaluate its
performance, we have collected 7,306 samples from 11 participants, covering 56
commonly used ASL words and 100 ASL sentences. DeepASL achieves an average
94.5% word-level translation accuracy and an average 8.2% word error rate on
translating unseen ASL sentences. Given its promising performance, we believe
DeepASL represents a significant step towards breaking the communication
barrier between deaf people and the hearing majority, and thus has significant
potential to fundamentally change deaf people's lives.
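
The word error rate quoted in the abstract above is conventionally the word-level edit distance between the reference sentence and the predicted one, divided by the number of reference words. The sketch below shows that conventional definition only; the abstract does not specify how DeepASL computes it, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    reference, hypothesis: lists of words.
    Counts substitutions, deletions and insertions with equal cost.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example: one word of a five-word sentence is missed.
reference = "i want to drink water".split()
hypothesis = "i want drink water".split()
print(word_error_rate(reference, hypothesis))  # 0.2
```

Because insertions count as errors, this ratio can exceed 1 when the hypothesis is much longer than the reference.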