
    Comprehensibility and Prosody Ratings for Pronunciation Software Development

    In a project developing pronunciation practice and feedback software for Mandarin-speaking learners of English, a key issue is deciding which features of pronunciation the feedback should focus on. We used naïve and experienced native speaker ratings of comprehensibility and nativeness to establish the key features affecting the comprehensibility of utterances produced by a group of Chinese learners of English. Native speaker raters assessed the comprehensibility of recorded utterances, pinpointed areas of difficulty, and then rated the same utterances for nativeness after segmental information had been filtered out. The results show that prosodic information is important for comprehensibility, and that there are no significant differences between naïve and experienced raters on either comprehensibility or nativeness judgements. This suggests that naïve judgements are a useful and accessible source of data for identifying the parameters to be used in setting up automated feedback.

    Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies

    An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a binary top-down form of word clustering which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags: unique n-bit numbers, the most significant bit-patterns of which incorporate class information. Access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of English, from the phonemic to the semantic level. The system has been compared, directly and indirectly, with other recent word classification systems. Class-based interpolated language models have been constructed to exploit the extra information supplied by the classifications, and some experiments have shown that the new models improve model performance.
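
As a rough illustration of the two ideas named in this abstract, the sketch below computes an average class mutual information score from bigram counts and reads a class label off the most significant bits of a structural tag. It is not the authors' implementation: the function names, the dictionary-based inputs, and the use of the standard mutual information formula over adjacent class pairs are all assumptions made for this example.

```python
import math
from collections import Counter


def average_class_mutual_information(bigram_counts, word_to_class):
    """Mutual information (in bits) between the classes of adjacent words.

    bigram_counts: dict mapping (word1, word2) -> count   (assumed input shape)
    word_to_class: dict mapping word -> class label        (assumed input shape)
    """
    # Collapse word bigram counts into class bigram counts.
    joint = Counter()
    for (w1, w2), n in bigram_counts.items():
        joint[(word_to_class[w1], word_to_class[w2])] += n

    total = sum(joint.values())

    # Marginal counts for the first and second position of each class pair.
    left, right = Counter(), Counter()
    for (c1, c2), n in joint.items():
        left[c1] += n
        right[c2] += n

    # Sum p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) ) over observed pairs.
    mi = 0.0
    for (c1, c2), n in joint.items():
        p12 = n / total
        mi += p12 * math.log2(p12 / ((left[c1] / total) * (right[c2] / total)))
    return mi


def class_at_level(tag, level, n_bits):
    """Return the `level` most significant bits of an n-bit structural tag."""
    return tag >> (n_bits - level)
```

Under these assumptions, a top-down clustering step would try candidate binary splits and keep the one with the highest score, while `class_at_level(tag, k, n_bits)` gives the class of a word at depth `k` of the hierarchy without any lookup beyond the tag itself.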

    Improving The Students' Speaking Accuracy Through “Lse 9.0 Software Version”

    The objective of the research was to determine the improvement in students' speaking accuracy through the Learn to Speak English 9.0 Software Version at SMK Negeri 1 Pattallassang Gowa. The study was classroom action research consisting of two cycles. The subjects were the 32 first-year electric students of SMK Negeri 1 Pattallassang Gowa in the 2011/2012 academic year. The researcher obtained the data using a speaking test administered as a diagnostic test, in cycle I, and in cycle II. The students' speaking test scores in cycle I and cycle II differed significantly, with greater gains by the end of cycle II. The findings indicated that using the Learn to Speak English 9.0 Software as a teaching medium could improve the students' speaking accuracy: the mean score was 5.21 in the diagnostic test, 5.95 in cycle I, and 7.10 in cycle II. The students' speaking accuracy thus rose from a poor level in the diagnostic test (mean score 5.21) to a fairly good level in cycle II (mean score 7.10).

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Implant technology and TFS processing in relation to speech discrimination and music perception and appreciation

    Direct stimulation of the auditory nerve via a Cochlear Implant (CI) enables profoundly deaf subjects to perceive sounds. Many CI users find language comprehension satisfactory in quiet and accessible in the presence of noise. However, music contains different dimensions which need to be approached in different ways. Whilst both language and music take advantage of the modulation of acoustic parameters to convey information, music is an acoustically more complex stimulus than language, demanding more complex resolution mechanisms. One of the most important aspects contributing to speech perception skills, especially when listening in a fluctuating background, is Temporal Fine Structure (TFS) processing. TFS cues are predominant in conveying Low Frequency (LF) signals. Harmonic Intonation (HI) and Disharmonic Intonation (DI) are tests of pitch perception in the LF domain which are thought to depend on the availability of TFS cues and which are included in the protocol for this group of adult CI recipients. One of the primary aims of this thesis was the production of a new assessment tool, the Italian STARR test, which measures speech perception using a roving-level adaptive method in which the presentation level of both speech and noise signals varies between sentence presentations. The STARR test attempts to better represent real-world listening conditions, where background noise is usually present and speech intensity varies according to vocal capacity as well as the distance of the speaker. The outcomes for the Italian STARR in NH adults were studied to produce normative data, as well as to evaluate inter-list variability and learning effects (Chapter 4). The second aim was to investigate LF pitch perception outcomes linked to the availability of TFS cues in a group of adult CI recipients, including bimodal users, in relation to speech perception, in particular Italian STARR outcomes.
    Here it was seen that age had a significant effect on performance, especially in older adults. Similarly, CI recipients (even the better performers) showed abnormal findings in comparison to NH subjects. On the other hand, the significant effect of CI thresholds re-emphasized the sensitivity of the test to low-intensity speech, which a CI user can often encounter under everyday listening conditions. Statistically significant correlations between HI/DI and STARR performance were found. Moreover, bimodal benefit was seen for both the HI/DI and STARR tests. Overall, the findings confirmed the usefulness of evaluating both LF pitch and speech perception in noise in order to track changes in TFS sensitivity for CI recipients over time and across different listening conditions which might be provided by future technological progress (Chapter 5). Finally, the last and main aspect taken into account in this thesis was the study of the difficulties experienced by CI users when listening to music. An attempt was made to correlate findings from the previous phases of this study both to Speech in Noise and to the complex subjective aspects of Music Perception and Appreciation: correlation analysis between HI/DI tests and the main dimensions of Speech in Noise (STARR and OLSA) and Music Appreciation was performed (Chapter 6). Interestingly, positive findings were obtained for the two most complex types of music (Classical, Jazz), whereas Soul did not seem to require particular competence in pitch perception for the appreciation of the subjective variables taken into consideration by this study.

    Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

    Cross-lingual dubbing of lecture videos requires transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, and text-to-speech synthesis, followed by isochronous lip-syncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean opinion score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.