477 research outputs found
Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition
In this paper, a modification to the training process of the popular SPLICE
algorithm has been proposed for noise robust speech recognition. The
modification is based on feature correlations, and enables this stereo-based
algorithm to improve the performance in all noise conditions, especially in
unseen cases. Further, the modified framework is extended to work for
non-stereo datasets where clean and noisy training utterances, but not stereo
counterparts, are required. Finally, an MLLR-based computationally efficient
run-time noise adaptation method in SPLICE framework has been proposed. The
modified SPLICE shows 8.6% absolute improvement over SPLICE in Test C of
Aurora-2 database, and 2.93% overall. Non-stereo method shows 10.37% and 6.93%
absolute improvements over Aurora-2 and Aurora-4 baseline models respectively.
Run-time adaptation shows 9.89% absolute improvement in modified framework as
compared to SPLICE for Test C, and 4.96% overall w.r.t. standard MLLR
adaptation on HMMs.Comment: Submitted to Automatic Speech Recognition and Understanding (ASRU)
2013 Worksho
ENHANCEMENTS TO THE MODIFIED COMPOSITE PATTERN METHOD OF STRUCTURED LIGHT 3D CAPTURE
The use of structured light illumination techniques for three-dimensional data acquisition is, in many cases, limited to stationary subjects due to the multiple pattern projections needed for depth analysis. Traditional Composite Pattern (CP) multiplexing utilizes sinusoidal modulation of individual projection patterns to allow numerous patterns to be combined into a single image. However, due to demodulation artifacts, it is often difficult to accurately recover the subject surface contour information. On the other hand, if one were to project an image consisting of many thin, identical stripes onto the surface, one could, by isolating each stripe center, recreate a very accurate representation of surface contour. But in this case, recovery of depth information via triangulation would be quite difficult. The method described herein, Modified Composite Pattern (MCP), is a conjunction of these two concepts. Combining a traditional Composite Pattern multiplexed projection image with a pattern of thin stripes allows for accurate surface representation combined with non-ambiguous identification of projection pattern elements. In this way, it is possible to recover surface depth characteristics using only a single structured light projection.
The technique described utilizes a binary structured light projection sequence (consisting of four unique images) modulated according to Composite Pattern methodology. A stripe pattern overlay is then applied to the pattern. Upon projection and imaging of the subject surface, the stripe pattern is isolated, and the composite pattern information demodulated and recovered, allowing for 3D surface representation.
In this research, the MCP technique is considered specifically in the context of a Hidden Markov Process Model. Updated processing methodologies explained herein make use of the Viterbi algorithm for the purpose of optimal analysis of MCP encoded images. Additionally, we techniques are introduced which, when implemented, allow fully automated processing of the Modified Composite Pattern image
Tracking interacting targets in multi-modal sensors
PhDObject tracking is one of the fundamental tasks in various applications such as surveillance,
sports, video conferencing and activity recognition. Factors such as occlusions,
illumination changes and limited field of observance of the sensor make tracking a challenging
task. To overcome these challenges the focus of this thesis is on using multiple
modalities such as audio and video for multi-target, multi-modal tracking. Particularly,
this thesis presents contributions to four related research topics, namely, pre-processing of
input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking,
and interaction recognition.
To improve the performance of detection algorithms, especially in the presence
of noise, this thesis investigate filtering of the input data through spatio-temporal feature
analysis as well as through frequency band analysis. The pre-processed data from multiple
modalities is then fused within Particle filtering (PF). To further minimise the discrepancy
between the real and the estimated positions, we propose a strategy that associates the
hypotheses and the measurements with a real target, using a Weighted Probabilistic Data
Association (WPDA). Since the filtering involved in the detection process reduces the
available information and is inapplicable on low signal-to-noise ratio data, we investigate
simultaneous detection and tracking approaches and propose a multi-target track-beforedetect
Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses
the detection step and performs tracking in the raw signal. Finally, we apply the proposed
multi-modal tracking to recognise interactions between targets in regions within, as well
as outside the cameras’ fields of view.
The efficiency of the proposed approaches are demonstrated on large uni-modal,
multi-modal and multi-sensor scenarios from real world detections, tracking and event
recognition datasets and through participation in evaluation campaigns
- …