This paper presents a context-dependent hybrid connectionist speech recognition system that uses a set of generalized hierarchical mixtures of experts (HME) to estimate context-dependent posterior acoustic class probabilities. The connectionist part of the system is organized in a modular fashion, which allows distributed training of the system on regular workstations. Context classes are based on polyphonic contexts, clustered using decision trees that we adopt from our continuous density HMM recognizer JANUS. The system is evaluated on ESST, an English speaker-independent spontaneous speech database. Context-dependent modeling is shown to yield significant improvements over simple context-independent modeling while requiring only a small additional overhead in training and decoding time.

1. INTRODUCTION

It was recently shown by a variety of researchers (e.g. [1, 2, 4]) that hybrid HMM systems which rely on connectionist discriminative acoustic modeling can be competitive wi..