We recently proposed a new algorithm, Linear Spline Interpolation (LSI), for adapting acoustic models to noisy environments. In this method, the nonlinear relationship between clean and noisy speech features is modeled using linear spline regression. The linear spline parameters that minimize the error between the predicted and the actual noisy features are learned from training data. A variance associated with each spline segment captures the uncertainty in the assumed model. In this work, we extend the LSI algorithm in two ways. First, the adaptation scheme is extended to compensate for the presence of linear channel distortion. Second, we show how the noise and channel parameters can be updated during decoding in an unsupervised manner within the LSI framework. Using LSI, we obtain an average relative improvement in word error rate of 10.8% over VTS adaptation on the Aurora 2 task, with improvements of 15-18% at SNRs between 10 and 15 dB.

Index Terms: robust speech recognition, model adaptation
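To make the idea of linear spline regression with a per-segment variance concrete, the following is a minimal sketch and not the implementation used in this work. It rests on several assumptions that the abstract does not state: the features are treated as scalar log-spectral values, the synthetic data follow the mismatch function y = log(exp(x) + exp(n)), the spline is fitted to y - n as a function of u = x - n, and the knots are evenly spaced. The function names (`spline_basis`, `fit_linear_spline`, `predict`) are hypothetical.

```python
import numpy as np

def spline_basis(u, knots):
    """Continuous linear-spline basis: 1, u, and one hinge max(u - k, 0) per knot."""
    u = np.asarray(u, dtype=float)
    cols = [np.ones_like(u), u] + [np.maximum(u - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_linear_spline(u, r, knots):
    """Least-squares fit of r as a linear spline in u, plus one residual variance per segment."""
    knots = np.asarray(knots, dtype=float)
    basis = spline_basis(u, knots)
    coef, *_ = np.linalg.lstsq(basis, r, rcond=None)
    resid = r - basis @ coef
    # Segment index of each sample: 0 is left of the first knot, len(knots) is right of the last.
    seg = np.searchsorted(knots, u, side="right")
    var = np.array([
        resid[seg == k].var() if np.any(seg == k) else np.nan
        for k in range(len(knots) + 1)
    ])
    return coef, var

def predict(u, knots, coef):
    """Evaluate the fitted spline at new values of u."""
    return spline_basis(u, knots) @ coef

# Synthetic training data (assumption: scalar log-spectral features).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 20.0, 5000)   # clean feature
n = rng.uniform(5.0, 15.0, 5000)   # noise feature
y = np.log(np.exp(x) + np.exp(n)) + rng.normal(0.0, 0.1, x.size)  # noisy feature

u = x - n                          # SNR-like variable
r = y - n                          # quantity modeled by the spline
knots = np.linspace(-10.0, 10.0, 9)
coef, var = fit_linear_spline(u, r, knots)
```

In this sketch the hinge basis keeps the fitted function continuous at the knots, and `var` plays the role of the per-segment uncertainty described above. Whether the original method enforces continuity or places knots this way is not stated here, so both choices should be read as illustrative.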