In this paper, three different techniques for building semi-continuous HMM-based speech recognisers are compared:
the classical one, using Euclidean-generated codebooks and independently trained acoustic models; jointly re-estimating the codebooks and models obtained with the classical method; and jointly creating codebooks and models by growing their size from a single centroid to the desired number. The way this growth may be carried out is carefully addressed, focusing on the selection of the splitting direction and the way splitting is implemented. Results on a large-vocabulary task show the efficiency of the approach, with noticeable improvements in both accuracy and CPU consumption. Moreover, this scheme enables the use of concatenated features, avoiding the independence assumption usually needed in semi-continuous HMM modelling and leading to further improvements in accuracy and CPU usage.
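The growth procedure referred to above can be pictured with a standard split-and-refine (LBG-style) codebook construction. The sketch below is not the paper's exact algorithm: the splitting criterion (largest within-cluster distortion), the splitting direction (coordinate axis of largest variance), the perturbation factor `eps`, and the function names are illustrative assumptions.

```python
# Minimal sketch of codebook growth by centroid splitting, under the
# assumptions stated above; not the paper's exact algorithm.
import numpy as np

def kmeans_refine(data, centroids, n_iter=10):
    """Refine centroids with a few standard k-means iterations."""
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned vectors.
        for k in range(len(centroids)):
            members = data[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids, labels

def grow_codebook(data, target_size, eps=0.05):
    """Grow a codebook from a single centroid up to target_size by splitting."""
    centroids = data.mean(axis=0, keepdims=True)   # start with one centroid
    labels = np.zeros(len(data), dtype=int)
    while len(centroids) < target_size:
        # Pick the centroid with the largest within-cluster distortion to split
        # (one possible selection criterion, assumed here for illustration).
        distortions = [((data[labels == k] - centroids[k]) ** 2).sum()
                       for k in range(len(centroids))]
        k = int(np.argmax(distortions))
        members = data[labels == k]
        # Split along the coordinate axis of largest variance within the cluster
        # (one possible choice of splitting direction).
        axis = int(members.var(axis=0).argmax())
        offset = np.zeros_like(centroids[k])
        offset[axis] = eps * members[:, axis].std()
        new = np.vstack([centroids[:k], centroids[k] - offset,
                         centroids[k] + offset, centroids[k + 1:]])
        centroids, labels = kmeans_refine(data, new)
    return centroids

# Example: grow a 16-vector codebook from 2-D toy data.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 2))
codebook = grow_codebook(features, target_size=16)
```

In a semi-continuous HMM system, the refinement step would normally be interleaved with re-estimation of the models that share the codebook, which is what joint growth of codebooks and models refers to here.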