2 research outputs found

    Nozomi - A Fast, Memory-Efficient Stack Decoder for LVCSR

    This paper describes some of the implementation details of the "Nozomi" stack decoder for LVCSR. The decoder was tested on a Japanese Newspaper Dictation Task using a 5000-word vocabulary. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group [7], it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in nearly real time on a 300 MHz Pentium II. Using a disk-based LM, the memory usage could be reduced to 4 MB in total.

    1. INTRODUCTION
    LVCSR is currently limited to workstations and fast high-end laptops with a lot of memory. To make LVCSR work on PDAs, cellular phones, user interfaces, wrist watches, etc., it is necessary to find time- and memory-efficient algorithms. The goal for implementation of any search engine must be to minimize time and memory requ..

    Evaluation of a stack decoder on a Japanese Newspaper Dictation Task

    This paper describes some of the implementation details of the "Nozomi" stack decoder for LVCSR. The decoder was tested on a Japanese Newspaper Dictation Task using a 5000-word vocabulary. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group [9], it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in nearly real time on a 300 MHz Pentium II. Using a disk-based LM, the memory usage could be reduced to 4 MB in total.

    Key words: speech recognition, Japanese newspaper dictation, one-pass stack decoder

    1 INTRODUCTION
    LVCSR is currently limited to workstations and fast high-end laptops with a lot of memory. To make LVCSR work on PDAs, cellular phones, user interfaces, wrist watches, etc., it is necessary to find time- and memory-efficient algorithms..