Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to
considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One
of the challenges in automatic spoken language assessment is giving candidates feedback on
particular aspects, or views, of their spoken language proficiency, in addition to the overall
holistic score normally provided. Another is detecting pronunciation and other types of errors
at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different
views and, as examiners are often trained to give holistic grades, single-view scores can
suffer from consistency issues. In contrast, holistic scores are available for various standard
assessment tasks such as Linguaskill. An investigation is thus conducted into whether
assessment scores linked to particular views of the speaker’s ability can be obtained from
systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single
views, specifically each of pronunciation, rhythm, intonation and text. By training each
system on large quantities of candidate data, it should be possible to extract
individual-view information. The relationships between the predictions of each system are
evaluated to examine whether they are, in fact, extracting different information about the
speaker. Three methods of combining the systems to predict the holistic score are
investigated: averaging their predictions, concatenating their intermediate representations,
and attending over those representations. The combined graders are compared to each other
and to baseline approaches.
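The third combination method, attending over the graders' intermediate representations, can be sketched as follows. All dimensions, weight values, and the single-query pooling scheme are illustrative assumptions for exposition, not the thesis's actual architecture.

```python
# Hypothetical sketch: pool view-specific intermediate representations with
# attention, then map the pooled vector to a holistic score.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_and_score(view_reprs, w_query, w_out):
    """view_reprs: (n_views, d) intermediate representations, one per view
    (e.g. pronunciation, rhythm, intonation, text)."""
    logits = view_reprs @ w_query        # (n_views,) attention logits
    alpha = softmax(logits)              # attention weights over the views
    pooled = alpha @ view_reprs          # (d,) weighted combination
    return float(pooled @ w_out)         # scalar holistic score

d = 8
views = rng.normal(size=(4, d))          # 4 view-specific graders (untrained toy values)
w_q = rng.normal(size=d)
w_o = rng.normal(size=d)
holistic = attend_and_score(views, w_q, w_o)
```

In a trained system the attention weights also indicate how much each view contributes to the holistic prediction, which is itself a form of feedback.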
The tasks of error detection and error tendency diagnosis become particularly challenging
when the speech in question is spontaneous, not least because human annotation of
pronunciation errors is inconsistent. An approach to these tasks is
presented by distinguishing between lexical errors, wherein the speaker does not know how a
particular word is pronounced, and accent errors, wherein the candidate’s speech exhibits
consistent patterns of phone substitution, deletion and insertion. Three annotated corpora
of non-native English speech by speakers of multiple L1s are analysed, the consistency of
human annotation is investigated, and a method is presented for detecting individual accent
and lexical errors and diagnosing accent error tendencies at the speaker level.
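Diagnosing accent error tendencies of the kind described above requires aligning a learner's phone sequence against a canonical one and tallying the substitutions, deletions, and insertions. A minimal sketch, with invented phone labels and no claim to match the thesis's actual method:

```python
# Hypothetical sketch: minimum-edit-distance alignment of two phone sequences,
# returning per-phone substitution/deletion/insertion counts that could be
# aggregated over a speaker's utterances to surface consistent accent patterns.
def align_counts(canonical, produced):
    n, m = len(canonical), len(produced)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == produced[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost) # match/substitution
    subs, dels, ins = {}, {}, {}
    i, j = n, m
    while i > 0 or j > 0:   # backtrace one optimal alignment
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (canonical[i - 1] != produced[j - 1]):
            if canonical[i - 1] != produced[j - 1]:
                pair = (canonical[i - 1], produced[j - 1])
                subs[pair] = subs.get(pair, 0) + 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels[canonical[i - 1]] = dels.get(canonical[i - 1], 0) + 1
            i -= 1
        else:
            ins[produced[j - 1]] = ins.get(produced[j - 1], 0) + 1
            j -= 1
    return subs, dels, ins

# e.g. a speaker realising "th" as "t" and dropping a final consonant
subs, dels, ins = align_counts(["th", "i", "ng", "k"], ["t", "i", "ng"])
```

Counts aggregated across many utterances distinguish one-off lexical errors from consistent, accent-driven substitution patterns.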
Automatic Pronunciation Assessment -- A Review
Pronunciation assessment and its application in computer-aided pronunciation
training (CAPT) have seen impressive progress in recent years. With the rapid
growth in language processing and deep learning over the past few years, there
is a need for an updated review. In this paper, we review methods employed in
pronunciation assessment, covering both phonemic and prosodic aspects. We categorize the
main challenges observed in prominent research trends and highlight existing limitations
and available resources. This is followed by a discussion of the remaining challenges and
possible directions for future work. (9 pages; accepted to Findings of EMNLP.)
English Lexical Stress Recognition Using Recurrent Neural Networks
Lexical stress is an integral part of English pronunciation. The command of lexical stress has an effect on the perceived fluency of the speaker. Moreover, it serves as a cue to recognize words. Methods that can automatically recognize lexical stress in spoken audio can be used to help English learners improve their pronunciation.
This thesis evaluated lexical stress recognition methods based on recurrent neural networks. The purpose was to compare two sets of features: a set of prosodic features making use of existing speech recognition technologies, and simple spectral features. Using the latter feature set would allow for an end-to-end model, significantly simplifying the overall process. The problem was formulated as one of locating the primary stress, the most prominently stressed syllable in the word, in an isolated word.
Datasets of both native and non-native speech were used in the experiments. The results show that models using the prosodic features outperform models using the spectral features. The difference between the two was particularly stark on the non-native dataset. It is possible that the datasets were too small to enable training end-to-end models. There was a considerable variation in performance among different words. It was also observed that the presence of a secondary stress made it more difficult to detect the primary stress.
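The thesis's formulation, locating the most prominently stressed syllable in an isolated word, can be sketched as a recurrent network that scores each syllable's feature vector and takes the argmax. Dimensions and weights below are illustrative stand-ins for a trained model:

```python
# Hypothetical sketch: a vanilla RNN scores each syllable of a word from its
# (prosodic or spectral) feature vector; the argmax is the predicted location
# of the primary stress.
import numpy as np

rng = np.random.default_rng(1)

def locate_primary_stress(syllable_feats, W_in, W_rec, w_score):
    """syllable_feats: (n_syllables, d). Returns the index of the syllable
    predicted to carry primary stress."""
    h = np.zeros(W_rec.shape[0])
    scores = []
    for x in syllable_feats:
        h = np.tanh(W_in @ x + W_rec @ h)  # recurrent state update
        scores.append(w_score @ h)         # scalar prominence score per syllable
    return int(np.argmax(scores))

d, hdim = 6, 5
feats = rng.normal(size=(3, d))            # e.g. a 3-syllable word (toy features)
W_in = rng.normal(size=(hdim, d))
W_rec = rng.normal(size=(hdim, hdim))
w_s = rng.normal(size=hdim)
pred = locate_primary_stress(feats, W_in, W_rec, w_s)
```

With spectral inputs this becomes an end-to-end model; with prosodic inputs the features themselves encode duration, pitch, and energy cues extracted by existing speech recognition tooling.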
Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors
Speech disorders in children can affect their fluency and intelligibility. Delay in their diagnosis and treatment increases the risk of social impairment and learning disabilities. With the significant shortage of Speech and Language Pathologists (SLPs), there is an increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability.
However, the scarcity and unreliable annotation of disordered child speech corpora along with the high acoustic variations in the child speech data has impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. Therefore, this thesis investigates two types of detection systems that can be achieved with minimum dependency on annotated mispronounced speech data.
First, a novel approach that adopts paralinguistic features which represent the prosodic, spectral, and voice quality characteristics of the speech was proposed to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech using a binary Support Vector Machine (SVM) classifier. As paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal.
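The binary TD/SSD classification idea can be illustrated with a linear SVM trained by sub-gradient descent on hinge loss. The features here are synthetic stand-ins; the thesis uses real paralinguistic (prosodic, spectral, voice-quality) descriptors and a standard SVM toolkit:

```python
# Toy sketch: binary linear SVM (hinge loss + L2 penalty, sub-gradient descent)
# separating two synthetic clusters standing in for TD vs SSD speech segments.
import numpy as np

rng = np.random.default_rng(2)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """y in {-1, +1}. Minimises hinge loss with an L2 weight penalty."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) < 1:          # margin violated: step toward sample
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                              # only shrink weights
                w -= lr * lam * w
    return w, b

# synthetic 4-dimensional "paralinguistic" features for two classes
X_td = rng.normal(loc=+1.0, size=(40, 4))      # typically developing
X_ssd = rng.normal(loc=-1.0, size=(40, 4))     # speech sound disordered
X = np.vstack([X_td, X_ssd])
y = np.array([+1] * 40 + [-1] * 40)
w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

Because the features are language- and content-independent, the same pipeline applies to unannotated recordings, which is the practical appeal of the approach.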
Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors made due to SSDs and provide low-level diagnostic information that can be used in constructing formative feedback and a detailed diagnostic report. Unlike existing MDD methods where detection and diagnosis are performed at the phoneme level, the proposed method achieved MDD at the speech attribute level, namely the manners and places of articulations. The speech attribute features describe the involved articulators and their interactions when making a speech sound allowing a low-level description of the pronunciation error to be provided. Two novel methods to model speech attributes are further proposed in this thesis, a frame-based (phoneme-alignment) method leveraging the Multi-Task Learning (MTL) criterion and training a separate model for each attribute, and an alignment-free jointly-learnt method based on the Connectionist Temporal Classification (CTC) sequence to sequence criterion.
The proposed techniques have been evaluated using standard and publicly accessible adult and child speech corpora, while the MDD method has been validated using L2 speech corpora.
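The alignment-free property of the CTC-based attribute model comes from CTC's label-collapse function: repeated frame-level symbols are merged and blanks removed, so no phoneme-level alignment is needed. A minimal sketch, with invented attribute labels:

```python
# Sketch of the CTC label-collapse mapping: frame-wise attribute decisions
# (here, illustrative manner-of-articulation labels) collapse to a label
# sequence without any explicit frame-to-phoneme alignment.
BLANK = "-"

def ctc_collapse(frames):
    """Map a frame-level symbol sequence to its CTC label sequence:
    merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in frames:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out

# frame-wise manner-of-articulation decisions over 8 frames
frames = ["-", "nasal", "nasal", "-", "stop", "stop", "fricative", "-"]
labels = ctc_collapse(frames)
```

Training with the CTC criterion sums over all frame sequences that collapse to the target attribute sequence, which is what removes the dependency on phoneme-level alignments.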
The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study
Carminati MN, Knoeferle P. The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study. Presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva del Garda, Italy.