6 research outputs found

    Speech verification for computer assisted pronunciation training

    Computer assisted pronunciation training (CAPT) is an approach that uses computer technology and computer-based resources in teaching and learning pronunciation. It is part of computer assisted language learning (CALL) technology, which has been widely applied to online learning platforms in recent years. This thesis deals with one of the central tasks in CAPT, namely speech verification. The goal is to provide a framework that identifies pronunciation errors in the speech data of second language (L2) learners and generates feedback with information and instructions for error correction. Furthermore, the framework is intended to support adaptation to new L1-L2 language pairs with minimal adjustment and modification. The central result is a novel approach to L2 speech verification that combines modern language technologies with linguistic expertise. For pronunciation verification, we select a set of L2 speech data, create alias phonemes from the errors annotated by linguists, then train an acoustic model with mixed L2 and gold standard data and perform HTK phoneme recognition to identify the error phonemes. For prosody verification, FD-PSOLA and dynamic time warping are both applied to verify the differences in duration, pitch and stress. Feedback is generated for both verifications. Our feedback is presented to learners not only visually, as in other existing CAPT systems, but also perceptually, by synthesizing the learner’s own audio: for prosody verification, for example, the gold standard prosody is transplanted onto the learner’s own voice. The framework is self-adaptable under semi-supervision and requires only a certain amount of mixed gold standard and annotated L2 speech data for bootstrapping. Verified speech data is validated by linguists, annotated in case of wrong verification, and used in the next iteration of training. The Mary Annotation Tool (MAT) is developed as an open-source component of MARYTTS for both annotating and validating.
To deal with uncertain pauses and interruptions in L2 speech, the silence model in HTK is also adapted and used in all components of the framework where forced alignment is required. Various evaluations are conducted that help us obtain insights into the applicability and potential of our CAPT system. The pronunciation verification shows high accuracy in both precision and recall, and encourages us to acquire more error-annotated L2 speech data to enhance the trained acoustic model. To test the effect of feedback, a progressive evaluation is carried out; it shows that our perceptual feedback helps learners realize errors that they could not otherwise observe from visual feedback and textual instructions. In order to improve the user interface, a questionnaire is also designed to collect the learners’ experiences and suggestions.
Computer Assisted Pronunciation Training (CAPT) is an approach that uses computers and computer-based resources to make it easier to learn correct pronunciation in foreign language teaching. It is part of Computer Assisted Language Learning (CALL) technology, which has been in frequent use on online learning platforms for several years. This thesis is devoted to speech verification, one of the central tasks within CAPT. The goal is to develop a framework for identifying pronunciation errors for people who are learning a foreign language (L2). Feedback with error-specific information and instructions for correct pronunciation is to be generated. In addition, the framework should support adaptation to new language pairs (L1-L2) with minimal adaptation and modification. The central result is a novel approach to L2 speech verification that builds on both modern language technologies and corpus-linguistic approaches. For pronunciation verification, we create alias phonemes from errors annotated by linguists. We then train an acoustic model with mixed L2 and gold standard data and run HTK [3] phoneme recognition to identify the error phonemes. For prosody verification, both FD-PSOLA [4] and dynamic time warping are applied to verify the differences in duration, pitch and stress between the spoken utterance and the gold standard. Feedback is generated for both verifications and is presented to learners not only visually, as in other existing CAPT systems, but also perceptually. For prosody verification, for instance, the gold standard prosody is transplanted onto the learner’s own voice. To adapt the framework to further L1-L2 speech data, the system must be trained by machine learning. Because the learning procedure is semi-supervised, only a certain amount of mixed gold standard and annotated L2 speech data is required for bootstrapping. Verified speech data is validated by linguists, annotated again in case of wrong verification, and used in the next iteration of training. For annotation and validation, the Mary Annotation Tool (MAT) was developed as an open-source component of MARYTTS. To deal with uncertain pauses and interruptions in L2 speech, the silence model in HTK was also adapted and used in all components of the framework where forced alignment is required. Various evaluations were carried out to gain insights into the potential applications and the limitations of the system. The pronunciation verification shows high accuracy in both precision and recall. This made it possible to use further error-containing L2 speech data to improve the trained acoustic model. To test the effect of the feedback, a progressive evaluation is carried out. The result shows that perceptual feedback helps learners recognize even errors that they cannot observe from visual feedback and textual instructions. In addition, a questionnaire was used to collect the learners’ experiences with and suggestions for the user interface, in order to improve the system in the future.
[3] Hidden Markov Toolkit [4] Pitch Synchronous Overlap and Ad
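
The abstract names dynamic time warping as one of the tools for comparing a learner's prosody with the gold standard. A minimal sketch of the textbook DTW recurrence follows, assuming one pitch value per frame and an absolute-difference cost. The function name `dtw_distance`, the cost function, and the pitch values are illustrative assumptions; the abstract does not say how the thesis configures DTW.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Uses an absolute-difference local cost (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical pitch contours in Hz (made-up values, not data from the thesis).
gold = [120.0, 130.0, 140.0, 130.0]
learner = [120.0, 120.0, 130.0, 140.0, 130.0]  # same contour, first frame held longer

print(dtw_distance(gold, learner))  # 0.0: a pure timing difference is warped away
```

Because DTW absorbs timing differences, a distance near zero here says only that the pitch shapes match once aligned; duration differences would have to be read from the alignment path, which this sketch does not return.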

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The broad international background of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers in this volume reflect the themes of the conference: the compilation and annotation of spoken corpora, together with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to the papers (when downloaded).

    Chinese elements: a bridge of the integration between Chinese-English translation and linguaculture transnational mobility

    [Abstract] With the growing popularity of Chinese elements in the redesigned translation section of the Chinese CET, we have come to see that Chinese elements have become a bridge between the transnational mobility of linguaculture and Chinese-English translation. Chinese students’ translation skills therefore need to be substantially improved, for example through a better understanding of Chinese culture and especially of its meaning. Five important secrets of skillful translation are introduced to improve students’ translation skills.

    Speaking on the record

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 258-273). Reading and writing have become the predominant way of acquiring and expressing intellect in Western culture. Somewhere along the way, the ability to write has become completely identified with intellectual power, creating a graphocentric myopia concerning the very nature and transfer of knowledge. One of the effects of graphocentrism is a conflation of concepts proper to knowledge in general with concepts specific to written expression. The words 'literate' and 'literacy' themselves are a simple case: their connotations sometimes focus on the process of reading text and sometimes on the kinds of knowledge that happen to be associated in our culture with people who read many books. This thesis has a conceptual and an empirical component. On the conceptual side, a central task is to disengage certain concepts that have become conflated by defining new terms. Our vocabulary is insufficient to describe alternatives that serve some or all of the functions of writing and reading in a different modality. As a first step, I introduce a new word to provide a counterpart to writing in a spoken modality: speak + write = sprite. Spriting in its general form is the activity of speaking 'on the record' that yields a technologically-supported representation of oral speech with essential properties of writing such as permanence of record, possibilities of editing, indexing, and scanning, but without the difficult transition to a deeply different form of representation such as writing itself. This thesis considers a particular version of spriting (still primitive compared with what might come in the future) in the form of two technology-supported representations of speech: (1) the speech in audible form, and (2) the speech in visible form.
The product of spriting is a kind of 'spoken' document, or talkument. As one reads a text, one may likewise aude a talkument. In contrast, I use the word writing for the manual activity of making marks, while text refers to the marks made. Making these distinctions is a small step towards envisioning a deep change in the world that might go beyond graphocentrism and come to appreciate spriting as the first step--but just the first--towards developing ways of manipulating spoken language, exemplified by turning it into a permanent record, permitting editing, indexing, searching and more. The empirical side of the thesis is confined to exploring implications of spriting in educational settings. I study one group of urban adults who are at elementary levels of reading and writing, and two groups of urban elementary school children who are of different ages, cultures and socioeconomic status, and who have appropriated writing as a tool for thought and expression to greater or lesser extents. One effect of graphocentrism in our culture is the very limited and constrained developmental path of literacy and learning. This has not always been the case, and it does not need to be so in the future. This thesis discusses some small ways in which we might re-value modes of expression in education closer to oral language than to writing. This thesis recognizes three ways in which spriting is relevant to education: (1) spriting can serve as a stepping stone to writing skills, (2) it can in some circumstances serve as a substitute for writing, and (3) it provides a window onto cognitive processes that are present but less apparent in the context of producing text. Tara Michelle Rosenberger Shankar. Ph.D.

    Ethics in High-Quality Research
