An analysis-by-synthesis approach to vocal tract modeling for robust speech recognition
In this thesis, we present a novel approach to speech recognition that incorporates knowledge of the speech production process. The major contribution is a speech recognition system motivated by the physical generative process of speech, rather than by the purely statistical approach that underlies virtually all current recognizers. We follow an analysis-by-synthesis approach. We begin by attributing a physical meaning to the inner states of the recognition system: they correspond to the configurations the human vocal tract takes over time. We use a geometric model of the vocal tract, adapt it to our speakers, and derive realistic vocal tract shapes from electromagnetic articulograph (EMA) measurements in the MOCHA database. We then synthesize speech from the vocal tract configurations using a physiologically motivated articulatory synthesis model of speech generation. Finally, the observation probability of the Hidden Markov Model (HMM) used for phone classification is a function of the distortion between the speech synthesized from the vocal tract configurations and the real speech. The output of each state in the HMM is based on a mixture of density functions
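
The idea of scoring an HMM state by the distortion between synthesized and real speech can be illustrated with a minimal sketch. The abstract does not specify the distortion measure or how distortion is mapped to a probability, so everything below is an assumption made for illustration: the mean squared Euclidean distance between feature frames, the exponential mapping exp(-distortion / scale), and the hypothetical names `distortion`, `log_observation_score`, `state_posteriors`, and `scale`. It is not the thesis's actual model.

```python
import numpy as np


def distortion(synth, real):
    """Mean squared Euclidean distance between two feature sequences.

    Both inputs are arrays of shape (frames, dims). This measure is an
    assumption; the abstract does not name the distortion actually used.
    """
    synth = np.asarray(synth, dtype=float)
    real = np.asarray(real, dtype=float)
    if synth.shape != real.shape:
        raise ValueError("feature sequences must have the same shape")
    return float(np.mean(np.sum((synth - real) ** 2, axis=-1)))


def log_observation_score(synth, real, scale=1.0):
    """Unnormalized log-score of a state: lower distortion gives a higher score.

    The mapping -distortion / scale is an illustrative assumption.
    """
    return -distortion(synth, real) / scale


def state_posteriors(real, synth_by_state, scale=1.0):
    """Normalize the per-state scores into weights that sum to one."""
    scores = np.array(
        [log_observation_score(s, real, scale) for s in synth_by_state]
    )
    scores -= scores.max()  # subtract the maximum for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


if __name__ == "__main__":
    real = np.array([[1.0, 2.0], [1.5, 2.5]])
    candidates = [real + 0.1, real + 2.0]  # hypothetical synthesized features
    print(state_posteriors(real, candidates))
```

In this sketch, each candidate state contributes one synthesized feature sequence, and the state whose synthesis is closest to the observed speech receives the largest weight.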