Characterization of Speakers for Improved Automatic Speech Recognition
Automatic speech recognition technology is becoming increasingly widespread in many
applications. For dictation tasks, where a single talker is to use the system for long
periods of time, the high recognition accuracies obtained are in part due to the user
performing a lengthy enrolment procedure to "tune" the parameters of the recogniser
to their particular voice characteristics and speaking style. Interactive speech systems,
where the speaker is using the system for only a short period of time (for example to
obtain information) do not have the luxury of long enrolments and have to adapt rapidly
to new speakers and speaking styles.
This thesis discusses the variations between speakers and speaking styles which result
in decreased recognition performance when there is a mismatch between the talker
and the system's models. An unsupervised method to rapidly identify and normalise
differences in vocal tract length is presented and shown to give improvements in recognition
accuracy for little computational overhead.
Two unsupervised methods of identifying speakers with similar speaking styles are
also presented. The first, a data-driven technique, is shown to accurately classify British
and American accented speech, and is also used to improve recognition accuracy by
clustering groups of similar talkers. The second uses the phonotactic information available
within pronunciation dictionaries to model British and American accented speech.
This model is then used to rapidly and accurately classify speakers.
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and applications that are able to operate in real-world environments, such as mobile communication services and smart homes.
The use of speaker correlation information for automatic speech recognition
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 171-179). By Timothy J. Hazen.