Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech
Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph in which the recognizer searches for the most likely recognition hypotheses. To increase the inclusion rate of actual segments in the graph, it is important to recover segments missed by the segment-based segmentation algorithm. One aspect of this research focuses on determining the segments that are missing because segment boundaries were not detected. Acoustic discontinuities, together with manner-distinctive features, are used to recover the missing segments. Another improvement to our segment-based framework tackles the limited amount of training speech data, which prevents the use of more complex covariance matrices for the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, because the segment-based approach allows the integration of phonetic knowledge, we incorporate into the scoring of the segment graphs the probability that each segment is a sound unit with a specific manner of articulation. Our experiments show that, with the proposed improvements, our segment-based framework increases the phoneme recognition accuracy by approximately 25% of that obtained from the baseline segment-based speech recognition.
Contributions of cochlea-scaled entropy and consonant-vowel boundaries to prediction of speech intelligibility in noise
Perception of allophonic cues to English word boundaries by Polish learners: Approximant devoicing in English
The study investigates the perception of devoicing of English /w, r, j, l/ after /p, t, k/ as a
word-boundary cue by Polish listeners. Polish does not devoice sonorants following
voiceless stops in word-initial positions. As a result, Polish learners are not made sensitive
to sonorant devoicing as a segmentation cue. Higher-proficiency and lower-proficiency
Polish learners of English participated in a task in which they recognised phrases such as
buy train vs. bite rain or pie plot vs. pipe lot. The analysis of accuracy scores revealed that
successful segmentation was only above chance level, indicating that the sonorant
voicing/devoicing cue was largely unattended to in identifying the boundary location.
Moreover, higher proficiency did not lead to more successful segmentation. The analysis
of reaction times showed an unclear pattern in which higher-proficiency listeners
segmented the test phrases faster but not more accurately than lower-proficiency listeners.
Finally, #CS sequences were recognised more accurately than C#S sequences, which was
taken to suggest that the listeners may have had some limited knowledge that devoiced
sonorants appear only in word-initial positions, but they treated voiced sonorants as equal
candidates for word-final and word-initial positions.
An Improved GA Based Modified Dynamic Neural Network for Cantonese-Digit Speech Recognition
Author name used in this publication: F. H. F. Leung. 2007-2008 > Academic research: refereed > Chapter in an edited book (author)
Automatic prosodic analysis for computer aided pronunciation teaching
Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech..
Acoustic-phonetic constraints in continuous speech recognition: a case study using the digit vocabulary.
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Includes bibliographical references (leaves 155-159). This electronic version was scanned from a copy of the thesis on file at the Speech Communication Group. The certified thesis is available in the Institute Archives and Special Collections. Vinton-Hayes Fellowship.
DARPA, monitored through the Office of Naval Research.
System Development Foundation.
Acoustic-Phonetic Features for the Automatic Classification of Stop Consonants
In this paper, the acoustic-phonetic characteristics of American English stop consonants are investigated. Features studied in the literature are evaluated for their information content and new features are proposed. A statistically guided, knowledge-based, acoustic-phonetic system for the automatic classification of stops in speaker-independent continuous speech is proposed. The system uses a new auditory-based front-end processing and incorporates new algorithms for the extraction and manipulation of the acoustic-phonetic features that proved to be rich in their information content. Recognition experiments are performed using hard decision algorithms on stops extracted from the TIMIT database continuous speech of 60 speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for place of articulation detection and 86% for the overall classification of stops.
Investigating potential acoustic correlates of sonority: Intensity vs. periodic energy
This empirical study examines possible acoustic correlates of sonority. The results indicate that periodic energy (in particular its sum) is a more reliable cue to sonority than intensity.