2,651 research outputs found

The Production of Speech Corpora

Author: Baumann Angela
Draxler Christoph
Ellbogen Tania
Schiel Florian
Steffen Alexander
Publication date: 21/03/2012

Open Access LMU

Improving pronunciation through SpeechAce in Secondary Education

Author: Reinaldo Bueno Soraya
Publication date: 01/01/2018

Repositorio de Universidad de La Rioja

PLASER: Pronunciation Learning via Automatic Speech Recognition

Author: Brian Mak
Brian Mak Manhung
Fong-ho Chong
Jacqueline Lo
Jimmy Wong
Ka-yee Leung
Kin-wah Chan
Manhung Siu
Mimi Ng
Simon Ho
Yik-cheung Tam
Yu-chung Chan
Publication date: 01/01/2003

PLASER is a multimedia tool with instant feedback designed to teach English pronunciation to high-school students in Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach correct pronunciation, not to assess a student's overall pronunciation quality. Major challenges related to speech recognition technology include allowance for non-native accents, reliable and corrective feedback, and visualization of errors.

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit

Author: Adams Oliver
Cohn Trevor Anthony
Guillaume Séverine
Michaud Alexis
Neubig Graham
Publication venue: University of Hawaii Press (Project Muse)
Publication date: 01/01/2018

Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons, such as that the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.

National Foreign Language Resource Center

ScholarSpace at University of Hawai'i at Manoa

University of Melbourne Institutional Repository

Self-managed Speech Therapy

Author: Kanevsky Dimitri
Savla Sagar
Starner Thad
Publication venue: Technical Disclosure Commons
Publication date: 14/08/2018

Speech defects are typically addressed by having the patient or learner undergo several sessions with speech therapists, who apply specialized therapeutic tools. Speech therapies tend to be expensive, require the scheduling of appointments, and do not lend themselves easily to self-paced self-improvement. This disclosure presents techniques that automatically provide speech-improvement feedback, thereby enabling self-managed speech therapy. Given a speech utterance by a user, the techniques cause display of a sequence of images of speech-organ positions (e.g., tongue, lips, and throat muscles) that correspond to the actual utterance as well as a targeted, ideal utterance. Further phonetic feedback is provided to the user using visual, tactile, spectrogram, or other modes, such that a speaker who is hard of hearing can work towards a target pronunciation. The techniques also apply to foreign language learning.

Technical Disclosure Commons

Voice and speech functions (B310-B340)

Author: McCartney Elspeth
Publication venue: Mac Keith Press
Publication date: 01/01/2012

The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) domain ‘voice and speech functions’ (b3) includes production and quality of voice (b310), articulation functions (b320), fluency and rhythm of speech (b330) and alternative vocalizations (b340, such as making musical sounds and crying, which are not reviewed here).

University of Strathclyde Institutional Repository

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

An XML Coding Scheme for Multimodal Corpus Annotation

Author: Blache Philippe
Ferré Gaëlle
Rauzy Stéphane
Publication venue: HAL CCSD
Publication date: 01/07/2007

Multimodality has become one of today's most crucial challenges both for linguistics and computer science, entailing theoretical issues as well as practical ones (verbal interaction description, human-machine dialogues, virtual reality, etc.). Understanding interaction processes is one of the main targets of these sciences, and requires taking into account the whole set of modalities and the way they interact. From a linguistic standpoint, language and speech analysis are based on studies of distinct research fields, such as phonetics, phonemics, syntax, semantics, pragmatics or gesture studies. Each of them has been investigated in the past either separately or in relation with another field that was considered as closely connected (e.g. syntax and semantics, prosody and syntax, etc.). The perspective adopted by modern linguistics is a considerably broader one: even though each domain reveals a certain degree of autonomy, it cannot be accounted for independently from its interactions with the other domains. Accordingly, the study of the interaction between the fields appears to be as important as the study of each distinct field. This is a prerequisite for the elaboration of a valid theory of language. However, as important as the needs in this area might be, high-level multimodal resources and adequate methods for constructing them are scarce and unequally developed. Ongoing projects mainly focus on one modality as a main target, with an alternate modality as an optional complement. Moreover, coding standards in this field remain very partial and do not cover all the needs in terms of multimodal annotation. One of the first issues we have to face is the definition of a coding scheme providing adequate responses to the needs of the various levels encompassed, from phonetics to pragmatics or syntax.
While working in the general context of international coding standards, we plan to create a specific coding standard designed to supply proper responses to the specific needs of multimodal annotation, as available solutions in the area do not seem to be totally satisfactory.

HAL AMU
