
    Speech Recognition by Composition of Weighted Finite Automata

We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition.
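A single composition operation doing double duty, offline and during decoding, is the heart of this framework. As a rough illustration only, here is a minimal sketch of weighted transducer composition in the tropical (min, +) semiring; epsilon handling is omitted (production systems such as OpenFst handle it with composition filters), and the toy transducers and labels are invented for the example.

```python
# Minimal sketch of weighted transducer composition in the tropical
# (min, +) semiring: path weights add, and output labels of T1 must
# match input labels of T2. Epsilon arcs are deliberately not handled.
from collections import defaultdict

def compose(t1, t2):
    """Compose transducers given as (start, finals, arcs), where finals
    maps state -> final weight and arcs maps state -> list of
    (in_label, out_label, weight, next_state)."""
    start = (t1[0], t2[0])
    finals, arcs = {}, defaultdict(list)
    stack, seen = [start], {start}
    while stack:
        q = stack.pop()
        q1, q2 = q
        if q1 in t1[1] and q2 in t2[1]:
            finals[q] = t1[1][q1] + t2[1][q2]  # final weights add
        for i1, o1, w1, d1 in t1[2].get(q1, []):
            for i2, o2, w2, d2 in t2[2].get(q2, []):
                if o1 == i2:  # output of T1 feeds input of T2
                    dst = (d1, d2)
                    arcs[q].append((i1, o2, w1 + w2, dst))
                    if dst not in seen:
                        seen.add(dst)
                        stack.append(dst)
    return start, finals, dict(arcs)

# Toy example: T1 maps "a" to "A", T2 maps "A" to "1".
T1 = (0, {1: 0.0}, {0: [("a", "A", 0.5, 1)]})
T2 = (0, {1: 0.0}, {0: [("A", "1", 0.3, 1)]})
print(compose(T1, T2))  # arc ("a", "1", 0.8, (1, 1)) from state (0, 0)
```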

    Language Model Combination and Adaptation Using Weighted Finite State Transducers

In speech recognition systems, language models (LMs) are often constructed by training and combining multiple n-gram models. They can be used either to represent different genres or tasks found in diverse text sources, or to capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaptation may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite-state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimal changes to decoding tools are needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history-dependently adapted multi-level LM modelling both syllable and word sequences.
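To make the weight bookkeeping concrete, the sketch below linearly interpolates two made-up bigram tables and converts the result into the negative-log (tropical) weights a WFST decoder carries on its arcs. This is a generic illustration of LM interpolation, not the specific combination schemes evaluated in the paper.

```python
# Linear interpolation of two n-gram probability tables, expressed as
# the tropical weights -log(lam*p1 + (1-lam)*p2) a WFST would carry.
# The bigram probabilities below are invented for illustration.
import math

lm1 = {("the", "cat"): 0.20, ("the", "dog"): 0.10}
lm2 = {("the", "cat"): 0.05, ("the", "dog"): 0.30}

def interpolate(lm1, lm2, lam):
    """Return -log interpolated probability for every n-gram seen."""
    combined = {}
    for ngram in set(lm1) | set(lm2):
        p = lam * lm1.get(ngram, 0.0) + (1 - lam) * lm2.get(ngram, 0.0)
        combined[ngram] = -math.log(p)
    return combined

for ngram, w in sorted(interpolate(lm1, lm2, 0.7).items()):
    print(ngram, round(w, 3))
```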

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability.
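The paper's exact discriminability metric is not reproduced here; as a stand-in, the sketch below computes a simple separation score (mean between-category distance over mean within-category distance) on made-up 2-D "acoustic" token clouds, where higher values mean word forms are easier to tell apart.

```python
# Stand-in discriminability score over word tokens in an acoustic
# feature space: between-category vs. within-category mean distances.
# The two token clouds below are invented 2-D points, not real data.
import numpy as np

tokens = {
    "mama": np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]),
    "wanwan": np.array([[3.0, 3.2], [2.8, 3.1], [3.1, 2.9]]),
}

def separation(tokens):
    words = list(tokens)
    within, between = [], []
    for w in words:
        x = tokens[w]
        # mean pairwise distance inside one word category
        d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
        within.append(d[np.triu_indices(len(x), 1)].mean())
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            # mean distance between tokens of two different words
            d = np.linalg.norm(tokens[a][:, None] - tokens[b][None, :], axis=-1)
            between.append(d.mean())
    return np.mean(between) / np.mean(within)  # > 1: categories separate

print(round(separation(tokens), 2))
```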

Ant Colony Algorithm Applied to Automatic Speech Recognition Graph Decoding

In this article we propose an original approach that allows the decoding of automatic speech recognition (ASR) graphs by using a constructive algorithm based on ant colonies. In classical approaches, when a graph is decoded with higher-order language models, the algorithm must expand the graph in order to develop each new observed n-gram. This expansion process increases computation time and memory consumption. We propose to use an ant colony algorithm to explore ASR graphs with a new language model, without the need to expand them. We first present results on the TED English corpus, where 2-gram graphs are decoded with a 4-gram language model. We then show that our approach outperforms a conventional Viterbi algorithm when computation time is constrained, and allows a highly threaded decoding process with a single graph and strict control of computation time and memory consumption.
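As an illustration of this kind of constructive search, here is a minimal ant-colony shortest-path sketch over a small made-up graph. In the actual decoder the edge scores would combine acoustic and language model costs; the graph, costs and parameters below are arbitrary.

```python
# Minimal ant-colony search for a low-cost path in a weighted graph.
# Ants build paths stochastically, biased by pheromone and edge cost;
# good paths receive more pheromone each iteration.
import math, random

graph = {  # node -> list of (next_node, cost); lower cost is better
    "s": [("a", 1.0), ("b", 2.5)],
    "a": [("t", 3.0), ("b", 0.5)],
    "b": [("t", 1.0)],
    "t": [],
}

def ant_colony(graph, start, goal, n_ants=20, n_iters=30,
               evaporation=0.5, alpha=1.0, beta=2.0, seed=0):
    rng = random.Random(seed)
    tau = {(u, v): 1.0 for u in graph for v, _ in graph[u]}  # pheromone
    best_path, best_cost = None, math.inf
    for _ in range(n_iters):
        paths = []
        for _ in range(n_ants):
            node, path, cost = start, [start], 0.0
            while node != goal and graph[node]:
                arcs = graph[node]
                # choice probability ~ pheromone^alpha * (1/cost)^beta
                w = [tau[(node, v)] ** alpha * (1.0 / c) ** beta
                     for v, c in arcs]
                v, c = rng.choices(arcs, weights=w)[0]
                path.append(v); cost += c; node = v
            if node == goal:
                paths.append((path, cost))
                if cost < best_cost:
                    best_path, best_cost = path, cost
        for k in tau:                      # evaporate old pheromone
            tau[k] *= 1.0 - evaporation
        for path, cost in paths:           # deposit on traversed edges
            for u, v in zip(path, path[1:]):
                tau[(u, v)] += 1.0 / cost
    return best_path, best_cost

print(ant_colony(graph, "s", "t"))  # expects the s-a-b-t path, cost 2.5
```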

    A Generalized Dynamic Composition Algorithm of Weighted Finite State Transducers for Large Vocabulary Speech Recognition

We propose a generalized dynamic composition algorithm for weighted finite-state transducers (WFSTs) which avoids the creation of non-coaccessible paths, performs weight look-ahead and does not impose any constraints on the topology of the WFSTs. Experimental results on the Wall Street Journal (WSJ1) 20k-word trigram task show that at 17% WER (moderately wide beam width), the decoding time of the proposed approach is about 48% and 65% of that of the other two dynamic composition approaches. In comparison with static composition, at the same level of 17% WER, we observe a reduction of about 60% in memory requirements, with an increase of about 60% in decoding time due to the extra overhead of dynamic composition.
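To make "dynamic composition" concrete, the sketch below expands pair states only when their arcs are requested, memoizing the result, so the composed machine is never built in full. The weight look-ahead and non-coaccessible-path filtering that the paper adds are omitted, and the tuple encoding of transducers is the same toy format as in the composition sketch earlier in this list.

```python
# Lazy (on-the-fly) composition: arcs of a pair state are computed
# only when a decoder asks for them. No look-ahead or coaccessibility
# filtering; transducers use the toy (start, finals, arcs) encoding.
class LazyCompose:
    def __init__(self, t1, t2):
        self.t1, self.t2 = t1, t2
        self.start = (t1[0], t2[0])
        self._cache = {}

    def arcs(self, state):
        """Expand and memoize the arcs of one pair state on demand."""
        if state not in self._cache:
            q1, q2 = state
            out = []
            for i1, o1, w1, d1 in self.t1[2].get(q1, []):
                for i2, o2, w2, d2 in self.t2[2].get(q2, []):
                    if o1 == i2:  # match T1 output against T2 input
                        out.append((i1, o2, w1 + w2, (d1, d2)))
            self._cache[state] = out
        return self._cache[state]

T1 = (0, {1: 0.0}, {0: [("a", "A", 0.5, 1)]})
T2 = (0, {1: 0.0}, {0: [("A", "1", 0.3, 1)]})
lc = LazyCompose(T1, T2)
print(lc.arcs(lc.start))  # only the queried state is expanded
```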

    An algorithm for fast composition of weighted finite-state transducers

In automatic speech recognition based on weighted finite-state transducers, a static decoding graph HC ∘ L ∘ G is typically constructed. In this work, we first show how the size of the decoding graph can be reduced, and the need to determinize it eliminated, by removing the ambiguity associated with transitions to the backoff state or states in G. We then show how the static construction can be avoided entirely by performing fast on-the-fly composition of HC and L ∘ G. We demonstrate that speech recognition based on this on-the-fly composition requires approximately 80% more run-time than recognition based on the statically expanded network R, which makes it competitive with other dynamic expansion algorithms that have appeared in the literature. Moreover, the dynamic algorithm requires approximately seven times less main memory than recognition based on the static decoding graph.
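The backoff ambiguity mentioned above arises because an n-gram history state carries both word arcs and a backoff arc to a shorter history, so an epsilon backoff arc can let the decoder reach the same word along two paths. The sketch below walks that structure for a single lookup; the tables and costs are invented for illustration.

```python
# N-gram lookup following explicit backoff arcs: from a history state,
# either match a word arc or pay the backoff cost and retry from the
# shorter history. All costs here are made-up -log probabilities.
word_arcs = {  # (history, word) -> (cost, next_history)
    (("the",), "cat"): (1.2, ("cat",)),
    ((), "cat"): (2.3, ("cat",)),
    ((), "dog"): (2.0, ("dog",)),
}
backoff = {("the",): (0.7, ())}  # history -> (backoff cost, shorter history)

def lm_cost(history, word):
    """Follow backoff arcs until the word arc can be matched."""
    cost = 0.0
    while (history, word) not in word_arcs:
        bo_cost, history = backoff[history]  # raises KeyError for OOV words
        cost += bo_cost
    arc_cost, next_history = word_arcs[(history, word)]
    return cost + arc_cost, next_history

print(lm_cost(("the",), "dog"))  # backs off: (0.7 + 2.0, ("dog",))
```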

    A library for automatic natural language generation of Spanish texts

In this article we present a novel system for natural language generation (NLG) of Spanish sentences from a minimal set of meaningful words (such as nouns, verbs and adjectives) which, unlike other state-of-the-art solutions, performs the NLG task in a fully automatic way, exploiting both knowledge-based and statistical approaches. Relying on its linguistic knowledge of vocabulary and grammar, the system is able to generate complete, coherent and correctly spelled sentences from the main word sets provided by the user. The system, which was designed to be integrable, portable and efficient, can easily be adapted to other languages by design and can feasibly be integrated in a wide range of digital devices. During its development we also created a supplementary lexicon for Spanish, aLexiS, with wide coverage and high precision, as well as syntactic trees derived from a freely available definite-clause grammar. The resulting NLG library has been evaluated both automatically and manually (annotation). The system can potentially be used in different application domains such as augmentative communication and the automatic generation of administrative reports or news.
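As a loose illustration of realizing a sentence from a minimal word set, the toy sketch below orders a bag of Spanish content words using hard-coded lexicon, article and inflection tables. These stand-ins are invented for the example and bear no relation to the actual aLexiS resource or the paper's definite-clause grammar.

```python
# Toy surface realization: pick out the noun, verb and adjectives from
# a word set, add an article agreeing in gender, and inflect the verb.
# Lexicon and inflection tables are minimal stand-ins for illustration.
lexicon = {  # lemma -> (category, gender)
    "gato": ("noun", "m"),
    "dormir": ("verb", None),
    "negro": ("adj", "m"),
}
articles = {"m": "el", "f": "la"}
verb_3sg = {"dormir": "duerme"}  # toy third-person present inflection

def generate(words):
    """Realize a noun/verb/adjective word set as a simple clause."""
    noun = next(w for w in words if lexicon[w][0] == "noun")
    verb = next(w for w in words if lexicon[w][0] == "verb")
    adjs = [w for w in words if lexicon[w][0] == "adj"]
    np = [articles[lexicon[noun][1]], noun] + adjs  # det + noun + adjs
    return " ".join(np + [verb_3sg[verb]]).capitalize() + "."

print(generate({"gato", "dormir", "negro"}))  # "El gato negro duerme."
```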