79 research outputs found

    Weighted finite-state transducers in speech recognition: a compaction algorithm for non-determinizable transducers

    Thesis digitized by the Direction des bibliothèques de l'Université de Montréal

    Speech Recognition by Composition of Weighted Finite Automata

    We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition.
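
    The abstract's central claim is that one composition operation suffices to combine all of these information sources. As a rough, self-contained illustration (not the authors' optimized algorithm), the Python sketch below composes two toy weighted transducers over the tropical semiring, where arc weights are negative log probabilities and add along a path; epsilon transitions and lazy expansion are omitted, and the labels and weights are invented for the example.

    def compose(t1, t2):
        """Compose transducers given as lists of arcs (src, ilabel, olabel, weight, dst).

        States of the result are pairs (q1, q2); an arc of t1 whose output
        label matches an arc of t2's input label yields a combined arc
        whose weight is the sum of the two weights.
        """
        result = []
        for (p1, a, b, w1, q1) in t1:
            for (p2, b2, c, w2, q2) in t2:
                if b == b2:  # output of t1 must match input of t2
                    result.append(((p1, p2), a, c, w1 + w2, (q1, q2)))
        return result

    # Toy example: a lexicon-like arc mapping a phone to a word, composed
    # with a unigram language-model arc over the same word.
    L = [(0, "d", "data", 0.0, 1)]    # lexicon: phone -> word
    G = [(0, "data", "data", 2.3, 1)] # language model: -log P(word)
    for arc in compose(L, G):
        print(arc)

    The same function could combine a lexicon with a language model offline, or an observation lattice with a lexicon during decoding, which is the uniformity the abstract emphasizes.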

    Visual units and confusion modelling for automatic lip-reading

    Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy.
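
    A loose sketch of the kind of confusion model described above, under the assumption (mine, for illustration) that the learned model can be represented as a single-state weighted transducer whose arcs map each visual unit to the phonemes it may stand for; the unit name and probabilities below are invented placeholders, not learned values.

    import math

    # Hypothetical confusion model: /p/, /b/ and /m/ look alike on the
    # lips, so one recognized bilabial unit may stand for any of the
    # three phonemes. Probabilities are placeholders for illustration.
    confusions = {
        "bilabial": {"p": 0.4, "b": 0.35, "m": 0.25},
    }

    def confusion_transducer(confusions):
        # One-state WFST as arcs (src, ilabel, olabel, weight, dst),
        # with weight = -log p so weights add in the tropical semiring.
        arcs = []
        for unit, phones in confusions.items():
            for phone, p in phones.items():
                arcs.append((0, unit, phone, -math.log(p), 0))
        return arcs

    for arc in confusion_transducer(confusions):
        print(arc)

    In a decoding cascade of the kind the abstract describes, such a transducer would sit between the visual recognizer's output and the pronunciation lexicon, combined by exactly the composition operation sketched under the previous abstract.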

    Confusion modelling for lip-reading

    Lip-reading is mostly used as a means of communication by people with hearing difficulties. Recent work has explored the automation of this process, with the aim of building a speech recognition system entirely driven by lip movements. However, this work has so far produced poor results because of factors such as high variability of speaker features, difficulties in mapping from visual features to speech sounds, and high co-articulation of visual features. The motivation for the work in this thesis is inspired by previous work in dysarthric speech recognition [Morales, 2009]. Dysarthric speakers have poor control over their articulators, often leading to a reduced phonemic repertoire. The premise of this thesis is that recognition of the visual speech signal is a similar problem to recognition of dysarthric speech, in that some information about the speech signal has been lost in both cases, and this brings about a systematic pattern of errors in the decoded output. This work attempts to exploit the systematic nature of these errors by modelling them in the framework of a weighted finite-state transducer cascade. Results indicate that the technique can achieve slightly lower error rates than the conventional approach. In addition, it explores some more general questions for automated lip-reading.
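
    To make the "systematic pattern of errors" concrete, here is a small hypothetical estimation step, not taken from the thesis: given (decoded, reference) unit pairs from an alignment of recognizer output against ground truth (the alignment itself, e.g. by dynamic programming, is assumed already done), relative counts give the substitution and deletion probabilities that would weight a confusion transducer. The data is a made-up toy sample.

    from collections import Counter, defaultdict

    # Toy aligned pairs; None on the decoded side marks a deletion of
    # the reference phoneme. Invented data for illustration only.
    aligned_pairs = [
        ("b", "p"), ("p", "p"), ("b", "b"), ("b", "p"),
        (None, "p"),
    ]

    counts = defaultdict(Counter)
    for decoded, reference in aligned_pairs:
        counts[reference][decoded] += 1

    # Relative frequencies estimate P(decoded | reference), the
    # systematic error pattern the thesis proposes to model.
    for reference, decs in counts.items():
        total = sum(decs.values())
        for decoded, n in decs.items():
            print(f"P({decoded} | {reference}) = {n / total:.2f}")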