21 research outputs found

    Implicitly Supervised Language Model Adaptation for Meeting Transcription

    We describe the use of meeting metadata, acquired using a computerized meeting organization and note-taking system, to improve automatic transcription of meetings. By applying a two-step language model adaptation process based on notes and agenda items, we were able to reduce perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house. This approach can be used to leverage other types of metadata.
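The adaptation idea above can be sketched as linear interpolation between a background language model and a small model estimated from the meeting notes. The abstract does not give the exact two-step procedure, so the unigram models, the interpolation weight, and the toy data below are purely illustrative:

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def interpolate(p_bg, p_adapt, lam):
    """Linearly interpolate two distributions over the same vocabulary."""
    return {w: (1 - lam) * p_bg[w] + lam * p_adapt[w] for w in p_bg}

def perplexity(lm, tokens):
    """2 ** (mean negative log2 probability) of the test tokens."""
    return 2 ** (-sum(math.log2(lm[w]) for w in tokens) / len(tokens))

# Toy corpora standing in for a large background corpus and meeting notes.
background = "the budget review is next week please send the slides".split()
notes = "agenda budget review action items for the demo".split()
test_set = "budget review agenda items".split()

vocab = set(background) | set(notes) | set(test_set)
p_bg = unigram_lm(background, vocab)
p_meet = unigram_lm(notes, vocab)
p_adapted = interpolate(p_bg, p_meet, lam=0.3)

ppl_before = perplexity(p_bg, test_set)
ppl_after = perplexity(p_adapted, test_set)
assert ppl_after < ppl_before  # adaptation lowers perplexity on meeting text
```

Because the notes share vocabulary with what is actually said in the meeting, the interpolated model assigns higher probability to in-meeting words, which is what drives the perplexity reduction.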

    Audio-Motor Integration for Robot Audition

    In the context of robotics, audio signal processing in the wild amounts to dealing with sounds recorded by a system that moves and whose actuators produce noise. This creates additional challenges in sound source localization, signal enhancement and recognition. But the specificity of such platforms also brings interesting opportunities: can information about the robot actuators' states be meaningfully integrated in the audio processing pipeline to improve performance and efficiency? While robot audition has grown into an established field, methods that explicitly use motor-state information as a complementary modality to audio are scarcer. This chapter proposes a unified view of this endeavour, referred to as audio-motor integration. A literature review and two learning-based methods for audio-motor integration in robot audition are presented, with application to single-microphone sound source localization and ego-noise reduction on real data.

    Interactive ASR error correction for touchscreen devices

    We will demonstrate a novel graphical interface for correcting search errors in the output of a speech recognizer. This interface allows the user to visualize the word lattice by “pulling apart” regions of the hypothesis to reveal a cloud of words similar to the “tag clouds” popular in many Web applications. This interface is potentially useful for dictation on portable touchscreen devices such as the Nokia N800 and other mobile Internet devices.
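The “pulling apart” interaction can be sketched on a confusion-network view of the lattice, where each region of the hypothesis is a set of competing words with posterior probabilities and the cloud renders each alternative at a size proportional to its posterior. The data structure and sizing rule below are hypothetical illustrations, not the demo's actual code:

```python
# Hypothetical sketch: a confusion-network slot maps candidate words to
# posterior probabilities; tapping that region of the hypothesis expands
# it into a tag-cloud of alternatives sized by posterior.

def expand_region(slot, min_size=10, max_size=32):
    """Return (word, font_size) pairs for a tag-cloud of alternatives,
    largest (most probable) first."""
    top = max(slot.values())
    scale = max_size - min_size
    return sorted(
        ((w, int(min_size + scale * p / top)) for w, p in slot.items()),
        key=lambda wp: -wp[1])

# Illustrative slot for one tapped region of the hypothesis.
slot = {"recognize": 0.6, "wreck a nice": 0.25, "recognise": 0.15}
cloud = expand_region(slot)
assert cloud[0] == ("recognize", 32)  # best hypothesis rendered largest
```

Sizing by posterior lets the user's eye land on plausible corrections first, which matters on a small touchscreen where only a handful of alternatives fit.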

    Mixture Pruning and Roughening for Scalable Acoustic Models

    In an automatic speech recognition system using a tied-mixture acoustic model, the main cost in CPU time and memory lies not in the evaluation and storage of Gaussians themselves but rather in evaluating the mixture likelihoods for each state output distribution. Using a simple entropy-based technique for pruning the mixture weight distributions, we can achieve a significant speedup in recognition for a 5000-word vocabulary with a negligible increase in word error rate. This allows us to achieve real-time connected-word dictation on an ARM-based mobile device.
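The abstract does not specify the exact entropy-based rule, but one plausible sketch derives a per-state pruning threshold from the entropy of the mixture-weight distribution, so that peaked distributions are pruned aggressively while flat ones are left largely intact. The rule and the `beta` parameter below are illustrative assumptions:

```python
import math

def entropy(weights):
    """Shannon entropy (bits) of a mixture-weight distribution."""
    return -sum(w * math.log2(w) for w in weights if w > 0)

def prune_mixture(weights, beta=0.5):
    """Illustrative entropy-based pruning: keep components whose weight
    exceeds beta times the uniform level implied by the distribution's
    entropy (1 / 2**H), then renormalize the survivors."""
    threshold = beta / 2 ** entropy(weights)
    kept = [(i, w) for i, w in enumerate(weights) if w >= threshold]
    total = sum(w for _, w in kept)
    return [(i, w / total) for i, w in kept]

# A peaked distribution prunes aggressively; a flat one keeps everything.
peaked = [0.7, 0.2, 0.05, 0.03, 0.02]
flat = [0.2] * 5
assert len(prune_mixture(peaked)) < len(peaked)
assert len(prune_mixture(flat)) == len(flat)
```

Tying the threshold to 2**H (the weight distribution's perplexity) rather than a fixed cutoff means each state keeps roughly as many components as its weights can actually distinguish, which is one way pruning can speed up likelihood evaluation without a large error-rate penalty.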

    A Constrained Baum-Welch Algorithm for Improved Phoneme Segmentation and Efficient Training

    No full text
    We describe an extension to the Baum-Welch algorithm for training Hidden Markov Models that uses explicit phoneme segmentation to constrain the forward and backward lattice. The HMMs trained with this algorithm can be shown to improve the accuracy of automatic phoneme segmentation. In addition, this algorithm is significantly more computationally efficient than the full Baum-Welch algorithm, while producing models that achieve equivalent accuracy on a standard phoneme recognition task.
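A minimal sketch of the constraint on the forward pass, assuming hypothetical per-state segment bounds: a lattice cell is only filled when its frame index lies inside that state's phoneme segment, which both enforces the segmentation and skips most of the computation (the model and data below are toy illustrations, not the paper's setup):

```python
import math

def constrained_forward(log_b, segments, log_a, log_pi):
    """Forward recursion in log space. log_b[t][s]: log output probability
    of state s at frame t; segments[s] = (start, end): frames (end
    exclusive) where state s is allowed; log_a: log transition matrix;
    log_pi: log initial probabilities."""
    T, S = len(log_b), len(log_pi)
    NEG = float("-inf")
    alpha = [[NEG] * S for _ in range(T)]
    for s in range(S):
        if segments[s][0] <= 0 < segments[s][1]:
            alpha[0][s] = log_pi[s] + log_b[0][s]
    for t in range(1, T):
        for s in range(S):
            if not (segments[s][0] <= t < segments[s][1]):
                continue  # frame outside this state's segment: cell pruned
            terms = [alpha[t - 1][p] + log_a[p][s] for p in range(S)
                     if alpha[t - 1][p] > NEG]
            if terms:  # log-sum-exp over reachable predecessors
                m = max(terms)
                alpha[t][s] = (m + math.log(sum(math.exp(x - m)
                                                for x in terms))
                               + log_b[t][s])
    return alpha

# Toy example: 2 states, 4 frames, uniform probabilities everywhere.
lp = math.log(0.5)
log_b = [[lp, lp] for _ in range(4)]
log_a = [[lp, lp], [lp, lp]]
log_pi = [lp, lp]
segments = [(0, 2), (1, 4)]  # state 0 owns frames 0-1, state 1 frames 1-3
alpha = constrained_forward(log_b, segments, log_a, log_pi)
assert alpha[3][0] == float("-inf")  # state 0 pruned outside its segment
assert alpha[3][1] > float("-inf")   # state 1 still reachable
```

The backward pass would be masked the same way; since each state only contributes inside its segment, the per-frame work drops from all S states to the few states whose segments cover that frame.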