Very Fast Keyword Spotting System with Real Time Factor below 0.01
In this paper we present the architecture of a keyword spotting (KWS) system
that is based on modern neural networks, yields good performance on various
types of speech data and can run very fast. We focus mainly on the last aspect,
speed, and propose optimizations for all the steps required in a KWS design:
signal processing and likelihood computation, Viterbi decoding, spot candidate
detection and confidence calculation. We present time- and memory-efficient
modelling with bidirectional feedforward sequential memory networks (an
alternative to recurrent networks), using either standard triphones or so-called
quasi-monophones, and an entirely forward decoding of speech frames (with
minimal need for look-back). Several variants of the proposed scheme are
evaluated on three large Czech datasets (broadcast, internet and telephone
speech; 17 hours in total), and their performance is compared using Detection
Error Tradeoff (DET) diagrams and real-time (RT) factors. We demonstrate that
the complete system can run in a single pass with an RT factor close to 0.001
if all optimizations (including a GPU for likelihood computation) are applied.

Comment: 11 pages, 3 figures