80 research outputs found

    Speech verification for computer assisted pronunciation training

    Computer assisted pronunciation training (CAPT) is an approach that uses computer technology and computer-based resources in teaching and learning pronunciation. It is part of computer assisted language learning (CALL) technology, which has been widely applied to online learning platforms in recent years. This thesis deals with one of the central tasks in CAPT: speech verification. The goal is to provide a framework that identifies pronunciation errors in the speech of second language (L2) learners and generates feedback with information and instructions for error correction. Furthermore, the framework is designed to support adaptation to new L1-L2 language pairs with minimal adjustment and modification. The central result is a novel approach to L2 speech verification that combines modern language technologies with linguistic expertise. For pronunciation verification, we select a set of L2 speech data, create alias phonemes from the errors annotated by linguists, train an acoustic model on mixed L2 and gold-standard data, and perform HTK (Hidden Markov Toolkit) phoneme recognition to identify the error phonemes. For prosody verification, FD-PSOLA (frequency-domain Pitch-Synchronous Overlap and Add) and dynamic time warping (DTW) are both applied to verify differences in duration, pitch, and stress. Feedback is generated for both verifications. Our feedback is presented to learners not only visually, as in other existing CAPT systems, but also perceptually, by synthesizing audio in the learner's own voice; for prosody verification, for example, the gold-standard prosody is transplanted onto the learner's own voice. The framework is self-adaptable under semi-supervision and requires only a modest amount of mixed gold-standard and annotated L2 speech data for bootstrapping. Verified speech data is validated by linguists, re-annotated in case of wrong verification, and used in the next iteration of training. The Mary Annotation Tool (MAT) is developed as an open-source component of MARYTTS for both annotation and validation.
To deal with uncertain pauses and interruptions in L2 speech, the silence model in HTK is also adapted and used in all components of the framework where forced alignment is required. Various evaluations are conducted that provide insight into the applicability and potential of our CAPT system. Pronunciation verification shows high accuracy in both precision and recall, encouraging us to acquire more error-annotated L2 speech data to enhance the trained acoustic model. To test the effect of the feedback, a progressive evaluation is carried out; it shows that our perceptual feedback helps learners recognize errors that they could not otherwise observe from visual feedback and textual instructions. To improve the user interface, a questionnaire is also designed to collect learners' experiences and suggestions.
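The prosody verification above relies on dynamic time warping to compare a learner's utterance against the gold standard despite timing differences. As a rough illustration only (not the thesis's actual implementation, and with made-up pitch values), a minimal DTW over two F0 contours might look like:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    e.g. pitch (F0) contours of a learner and a gold-standard utterance."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

# Toy contours (hypothetical F0 values in Hz): the learner's contour is a
# time-stretched copy of the gold standard, so DTW aligns them at zero cost.
gold = [100, 120, 140, 120, 100]
learner = [100, 100, 120, 140, 140, 120, 100]
print(dtw_distance(gold, learner))  # → 0.0
```

A real system would of course align extracted F0 and duration features rather than toy lists, and weight the accumulated distance before deciding whether the learner's prosody deviates enough to trigger feedback.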

    Hearing versus Listening: Attention to Speech and Its Role in Language Acquisition in Deaf Infants with Cochlear Implants

    The advent of cochlear implantation has provided thousands of deaf infants and children access to speech and the opportunity to learn spoken language. Whether or not deaf infants successfully learn spoken language after implantation may depend in part on the extent to which they listen to speech rather than just hear it. We explore this question by examining the role that attention to speech plays in early language development according to a prominent model of infant speech perception, Jusczyk's WRAPSA model, and by reviewing the kinds of speech input that maintain normal-hearing infants' attention. We then review recent findings suggesting that cochlear-implanted infants' attention to speech is reduced compared to that of normal-hearing infants and that speech input to these infants differs from input to infants with normal hearing. Finally, we discuss possible roles attention to speech may play in deaf children's language acquisition after cochlear implantation, in light of these findings and of predictions from Jusczyk's WRAPSA model.

    Advances in the neurocognition of music and language


    Exploring Early Language Acquisition from Different Kinds of Input: The Role of Attention

    While most research on infant word segmentation has investigated the extraction of words from fluent speech in standard laboratory settings, the series of experiments in this dissertation examined the role of different kinds of input and exposure on infants' word segmentation and word learning abilities. The first experiment suggests that infants are able to successfully segment words not only from fluent infant-directed speech (hereafter, IDS), which includes longer pauses, shorter utterances, higher fundamental frequencies, and wider pitch ranges, but also from the fast and monotone input of adult-directed speech (hereafter, ADS), provided they were familiarized with these words over an extended exposure period of six weeks at home. These 9-month-old infants, however, seem unable to segment words from fluent IDS during a standard laboratory familiarization. Therefore, the second experiment examined whether German infants might require a more exaggerated IDS exposure, similar to the register American English infants are addressed with. Here, 9-month-old, but not 7.5-month-old, infants successfully segmented the words presented in an exaggerated IDS register. Using neurophysiological measures, the third experiment further explored 7.5-month-old infants' segmentation abilities and revealed that these infants were only able to successfully segment words from fluent exaggerated IDS. The final study of the dissertation extended the investigations to infants' word learning abilities. Critically, infants were trained on word-object associations in fluent IDS or ADS. The results of this study demonstrated that infants were able to learn words regardless of the register they were trained in, and suggest that infants can learn from a much greater variety of input than previously assumed, extending the findings from word segmentation to word learning.
Importantly, this dissertation presents the earliest evidence of ADS word segmentation and word learning in the literature to date. Hence, it provides new insights into infant word segmentation and word learning from different kinds of input and different kinds of exposure. Furthermore, this dissertation supports the idea of attention as a central mechanism in early language acquisition.

    Søren Buus. Thirty years of psychoacoustic inspiration
