
    Optimization of discriminative models for speech and handwriting recognition

Conventional speech recognition systems are based on Gaussian hidden Markov models. These systems are typically first trained generatively, i.e., a model of the acoustic signal is learned. In a subsequent discriminative training step, the models are fine-tuned to directly optimize the classifier. More recently, it has been found that neural network-based speech recognition systems outperform Gaussian mixture systems. Neural networks as considered in this work are discriminative models, i.e., they do not require a generative training step. Learning their parameters from data is a high-dimensional optimization problem. This optimization problem is the central topic of this thesis. Further contributions cover different aspects of modeling and training, such as generalization ability, model structure, and training criteria. The generality of our methods is confirmed by transferring them from speech to handwriting recognition.

In the first part of this thesis, we study a sub-class of neural networks, known as log-linear models. Because of their shallow structure, their training is a convex optimization problem. Our experiments show that this conceptually simple approach already reaches performance comparable to that of a discriminatively trained Gaussian mixture system. Furthermore, a theoretical convergence analysis of log-linear training is presented.

The second part of the thesis deals with deep neural networks. First, the feasibility of a recently proposed second-order batch optimization algorithm for large-scale tasks is investigated. Motivated by these results, a novel stochastic second-order optimization algorithm for neural network training is developed. This algorithm is capable of optimizing bottleneck networks from scratch. This allows for reducing the size of the models considerably, thereby accelerating both the training and evaluation of the networks. Furthermore, the bottleneck structure acts as a regularization method, thus improving the accuracy of the models.
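The convexity claim for the log-linear models of the first part can be illustrated with a minimal softmax-regression sketch. This is a generic illustration of convex cross-entropy training by gradient descent, not the thesis's implementation; the data, dimensions, and learning rate are invented for the example.

```python
import numpy as np

# A log-linear model assigns class posteriors p(c|x) proportional to
# exp(w_c . x). The cross-entropy loss of this model is convex in W,
# so plain gradient descent converges toward the global optimum.

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_log_linear(X, y, num_classes, lr=0.5, steps=500):
    n, d = X.shape
    W = np.zeros((d, num_classes))
    Y = np.eye(num_classes)[y]            # one-hot targets
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n       # gradient of mean cross-entropy
    return W

# Toy two-class problem (invented data, purely illustrative)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)),
               rng.normal(1.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
W = train_log_linear(X, y, num_classes=2)
accuracy = (softmax(X @ W).argmax(axis=1) == y).mean()
```

In an actual recognizer the inputs would be acoustic feature vectors and the classes context-dependent phone states, but the optimization problem has the same convex form.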
Another contribution of this thesis is an investigation of sequence-discriminative training of neural networks, which in particular confirms the benefit of the bottleneck structure in combination with this method. Finally, we describe the neural network training tool, which has been implemented within the scope of this work as part of the publicly available RWTH Aachen speech recognition toolkit.
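To see why the bottleneck structure mentioned above reduces model size: a hidden layer that is much narrower than its neighbours replaces one large weight matrix by two small ones. A rough parameter-count sketch, with layer sizes invented for illustration (they are not the configurations used in the thesis):

```python
# Parameter count of a fully connected network (weights plus biases).
def num_params(layer_sizes):
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Plain deep network vs. the same network with a narrow bottleneck
# before the (large) output layer. All sizes are hypothetical.
plain = num_params([440, 2000, 2000, 2000, 4500])
bottleneck = num_params([440, 2000, 2000, 60, 4500])  # 60-unit bottleneck
```

Here the bottleneck variant has roughly a third of the parameters of the plain network, which is the size reduction that accelerates both training and evaluation.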