
    Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition

    In this paper, a modification to the training process of the popular SPLICE algorithm is proposed for noise-robust speech recognition. The modification is based on feature correlations, and enables this stereo-based algorithm to improve performance in all noise conditions, especially in unseen cases. Further, the modified framework is extended to work for non-stereo datasets, where clean and noisy training utterances are required but their stereo counterparts are not. Finally, a computationally efficient MLLR-based run-time noise adaptation method in the SPLICE framework is proposed. The modified SPLICE shows 8.6% absolute improvement over SPLICE in Test C of the Aurora-2 database, and 2.93% overall. The non-stereo method shows 10.37% and 6.93% absolute improvements over the Aurora-2 and Aurora-4 baseline models respectively. Run-time adaptation shows 9.89% absolute improvement in the modified framework as compared to SPLICE for Test C, and 4.96% overall w.r.t. standard MLLR adaptation on HMMs. Comment: Submitted to Automatic Speech Recognition and Understanding (ASRU) 2013 Workshop

    ENHANCEMENTS TO THE MODIFIED COMPOSITE PATTERN METHOD OF STRUCTURED LIGHT 3D CAPTURE

    The use of structured light illumination techniques for three-dimensional data acquisition is, in many cases, limited to stationary subjects due to the multiple pattern projections needed for depth analysis. Traditional Composite Pattern (CP) multiplexing utilizes sinusoidal modulation of individual projection patterns to allow numerous patterns to be combined into a single image. However, due to demodulation artifacts, it is often difficult to accurately recover the subject surface contour information. On the other hand, if one were to project an image consisting of many thin, identical stripes onto the surface, one could, by isolating each stripe center, recreate a very accurate representation of surface contour. But in this case, recovery of depth information via triangulation would be quite difficult. The method described herein, Modified Composite Pattern (MCP), is a conjunction of these two concepts. Combining a traditional Composite Pattern multiplexed projection image with a pattern of thin stripes allows for accurate surface representation combined with non-ambiguous identification of projection pattern elements. In this way, it is possible to recover surface depth characteristics using only a single structured light projection. The technique described utilizes a binary structured light projection sequence (consisting of four unique images) modulated according to Composite Pattern methodology. A stripe pattern overlay is then applied to the pattern. Upon projection and imaging of the subject surface, the stripe pattern is isolated, and the composite pattern information demodulated and recovered, allowing for 3D surface representation. In this research, the MCP technique is considered specifically in the context of a Hidden Markov Process Model. Updated processing methodologies explained herein make use of the Viterbi algorithm for the purpose of optimal analysis of MCP encoded images. 
Additionally, techniques are introduced which, when implemented, allow fully automated processing of the Modified Composite Pattern image
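
The abstract above names the Viterbi algorithm as the decoder for its Hidden Markov Process Model but gives no details of that model. As a rough illustration only, the following is a generic textbook Viterbi decoder in the log domain, not the thesis' implementation. The function name `viterbi`, the array layout, and the idea that hidden states could stand for pattern-element labels along an image line are all assumptions made for this sketch.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely hidden-state path of an HMM (log domain).

    log_init:  (S,)   log P(state at t=0)
    log_trans: (S, S) log P(state j at t | state i at t-1), indexed [i, j]
    log_emit:  (T, S) log P(observation at t | state)
    All names and shapes are illustrative assumptions.
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]          # best log score ending in each state
    back = np.zeros((T, S), dtype=int)      # best predecessor of each state
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: come from i, land in j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Working in log probabilities avoids numerical underflow on long sequences, which is the usual reason for this formulation.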

    Tracking interacting targets in multi-modal sensors

    PhD thesis. Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and the limited field of view of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities, such as audio and video, for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics, namely pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within Particle filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target, using a Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to low signal-to-noise ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking in the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras’ fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets and through participation in evaluation campaigns
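
The abstract above builds on particle filtering without describing its mechanics. As a minimal sketch, assuming a single target, a 1-D random-walk motion model and one Gaussian measurement per step, a basic bootstrap particle filter update could look like the following. This is not the thesis' WPDA or MT-TBD-PF method. The function name `particle_filter_step` and the parameters `motion_std` and `meas_std` are hypothetical.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement, rng,
                         motion_std=1.0, meas_std=1.0):
    """One predict/update/resample cycle of a bootstrap particle filter.

    Assumed model (illustrative only): 1-D random-walk state,
    Gaussian measurement noise, a single target.
    """
    # Predict: diffuse every particle with the random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: reweight by the Gaussian likelihood of the measurement.
    lik = np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    weights = weights * lik + 1e-300        # guard against all-zero weights
    weights = weights / weights.sum()
    # Resample (multinomial): draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```

In a multi-modal setting, one common option is to multiply the likelihoods from each sensor in the update step, which assumes the measurements are conditionally independent given the state; whether the thesis fuses audio and video this way is not stated in the abstract.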