Search CORE

314 research outputs found

The listening talker: A review of human and algorithmic context-induced modifications of speech

Author: Martin Cooke
Simon King
Maëva Garnier
Vincent Aubanel
Publication venue: Elsevier BV
Publication date: 01/01/2014

Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

Crossref

Hal - Université Grenoble Alpes

Edinburgh Research Explorer

Western Sydney ResearchDirect

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study

Author: Ali Ahmed
Baali Massa
El-Hajj Wassim
Hayashi Tomoki
Maiti Soumi
Mubarak Hamdy
Watanabe Shinji
Publication date: 26/01/2023

Several high-resource Text to Speech (TTS) systems currently produce natural, human-like speech. In contrast, low-resource languages, including Arabic, have very limited TTS systems due to the lack of resources. We propose a fully unsupervised method for building TTS, including automatic data selection and pre-training/fine-tuning strategies for TTS training, using broadcast news as a case study. We show that careful selection of a smaller amount of data can yield a TTS system that generates more natural speech than a system trained on a larger dataset. We propose approaches for: 1) the data: we applied automatic annotation using DNSMOS, automatic vowelization, and automatic speech recognition (ASR) to fix transcription errors; 2) the model: we used transfer learning from a high-resource language in the TTS model and fine-tuned it on one hour of broadcast recordings, then used this model to guide the durations of a FastSpeech2-based Conformer model. Our objective evaluation shows a 3.9% character error rate (CER), compared with 1.3% CER for the ground truth. In the subjective evaluation, on a scale where 1 is bad and 5 is excellent, our FastSpeech2-based Conformer model achieved a mean opinion score (MOS) of 4.4 for intelligibility and 4.2 for naturalness, and many annotators recognized the voice of the broadcaster, which demonstrates the effectiveness of our proposed unsupervised method.

arXiv.org e-Print Archive

Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech

Author: Cooke Martin
Tang Yan
Valentini-Botinhao Cassia
Publication venue: Elsevier BV
Publication date: 20/06/2015

Several modification algorithms that alter natural or synthetic speech with the goal of improving intelligibility in noise have been proposed recently. A key requirement of many modification techniques is the ability to predict intelligibility, both offline during algorithm development and online, in order to determine the optimal modification for the current noise context. While existing objective intelligibility metrics (OIMs) have good predictive power for unmodified natural speech in stationary and fluctuating noise, little is known about their effectiveness for other forms of speech. The current study evaluated how well seven OIMs predict listener responses in three large datasets of modified and synthetic speech which together represent 396 combinations of speech modification, masker type and signal-to-noise ratio. The chief finding is a clear reduction in predictive power for most OIMs when faced with modified and synthetic speech. Modifications introducing durational changes are particularly harmful to intelligibility predictors. OIMs that measure masked audibility tend to over-estimate intelligibility in the presence of fluctuating maskers relative to stationary maskers, while OIMs that estimate the distortion caused by the masker to a clean speech prototype exhibit the reverse pattern.

University of Salford Institutional Repository

Crossref

Edinburgh Research Explorer

The language and literacy skills and behaviours of two middle primary severely to profoundly hearing impaired students in the school environment

Author: Kinsman Renee M.
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/1998

Much research has shown that the hearing impaired population typically achieves only very low levels of literacy. Many researchers have examined the language and literacy deficits of the hearing impaired population in order to explain this. Nevertheless, a recent study has shown that hearing impaired children's preschool language and literacy development may follow a similar pathway to that of their hearing peers. The present study aimed to investigate the language and literacy skills, behaviours and interactions of two severely to profoundly hearing impaired middle primary boys in the context of their mainstream school. Both qualitative and quantitative data sources were accessed, including background records, interviews, standardised testing, sample analyses and observations in the school environment. The boys were reported as having strong visual skills. Results showed that, although both boys displayed delays in receptive language and metalinguistic awareness, both were able to read, but with different levels of achievement: one showed delays in both word recognition and comprehension; the other demonstrated particularly strong word recognition but less highly developed comprehension. There were also differences between the boys in their levels of writing and social language. Although one of them showed appropriate social language and interaction skills, both were often excluded by their hearing peers. Various peer, teacher and environmental factors were identified within the school setting which may have interfered with the boys' social interactions and language and literacy learning. These findings are interpreted in terms of theories of language and literacy acquisition in hearing impaired children and their integration into mainstream settings. Some implications for educational practice and further research are presented.

Research Online @ ECU

Detecting autism, emotions and social signals using AdaBoost

Author: Busa-Fekete Róbert
Gosztolya Gábor
Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Children's acoustic and linguistic adaptations of peers with hearing impairment

Author: Granlund S
Hazan VL
Mahon HM
Publication venue: American Speech Language Hearing Association
Publication date: 31/12/2017

Purpose: This study aims to examine the clear speaking strategies used by older children when interacting with a peer with hearing loss, focusing on both acoustic and linguistic adaptations in speech. Method: The Grid task, a problem-solving task developed to elicit spontaneous interactive speech, was used to obtain a range of global acoustic and linguistic measures. Eighteen 9- to 14-year-old children with normal hearing (NH) performed the task in pairs, once with a friend with NH and once with a friend with a hearing impairment (HI). Results: In HI-directed speech, children increased their fundamental frequency range and mid-frequency intensity, decreased the number of words per phrase, and expanded their vowel space area by increasing F1 and F2 range, relative to NH-directed speech. However, participants did not appear to change their articulation rate, the lexical frequency of content words, or lexical diversity when talking to their friend with HI compared with their friend with NH. Conclusions: Older children show evidence of listener-oriented adaptations to their speech production; although their speech production systems are still developing, they are able to adapt their speech to the needs of a peer with HI, even without being given specific instruction to do so.

UCL Discovery

English as an Academic Lingua Franca in Spanish Tertiary Education: An Analysis of the Use of Pragmatic Strategies in English-Medium Lectures

Author: Luzón Marco María José
Velilla Sánchez María de los Ángeles
Vázquez Orta Ignacio
Publication venue: Universidad de Zaragoza, Prensas de la Universidad
Publication date: 01/01/2021

Over the last decade, a linguistic shift has been especially noticeable in higher education settings due to the growing use of English as a medium of instruction (EMI) at European universities. There is therefore an undeniable need to learn more about the everyday practices of those who take part in international academic activities using English as their vehicle of communication. Numerous studies have previously been carried out on English used as a lingua franca (ELF) in academic settings. However, there is a relative lack of empirical studies on this use of English at Spanish universities compared with similar studies at European academic institutions (Mauranen, 2006b; Björkman, 2010, 2011b, 2013). This research sets out to study English-medium instruction practices in different disciplines at the University of Zaragoza (Spain), focusing on the types of pragmatic strategies that participants use to facilitate understanding. These linguistic practices are analysed in order to shed light on the impact of English on communicative effectiveness in these teaching and learning environments. The results derive from the analysis of a corpus of 12 lectures delivered in English as a medium of instruction, recorded in two different degree programmes. They are complemented by semi-structured interviews with the lecturers and a small corpus of PowerPoint presentation slides that the same lecturers used to deliver their classes. A discursive-pragmatic approach and an ethnographically oriented methodology were used to analyse these three data sets. This study therefore uses both data triangulation and methodological triangulation, both yielding quantitative as well as qualitative results.
The results of the study show 13 different pragmatic strategies used in the recorded lectures to fulfil communicative functions such as enhancing explicitness, clarifying and negotiating meaning, and/or ensuring acceptable use of language. The data analysis reveals that the pragmatic strategies observed in the corpus are used mainly to prevent potential communication problems, but also to repair production problems that overtly hinder communication and to co-construct understanding. Supporting existing studies on English used as a lingua franca for instruction, the results reveal a highly contextual and situational use of pragmatic strategies.

Repositorio Universidad de Zaragoza

Exploring Intelligent Personal Assistants in Second Language Acquisition

Author: Moussalli Souheila
Publication date: 15/03/2022

This Ph.D. dissertation (Concordia University, 2022) investigates Intelligent Personal Assistants (IPAs), voice-controlled services that can complete various functions by interacting orally with their users, as pedagogical tools in English second language classrooms, in order to assess their pedagogical suitability. The dissertation begins with a review of the literature focusing on the importance of using technology in the language classroom. The remainder is divided into three manuscript-based chapters, each addressing one aspect of the general research questions: (a) What are language learners' perceptions of the use of IPAs as learning tools? (Manuscript A); (b) Can IPAs understand different language learners, and can these learners understand IPAs? (Manuscript B); and (c) Can IPAs help English language learners improve their receptive and productive skills? (Manuscript C). The first manuscript investigates the use of IPAs and users' perceptions of the technology as a language learning tool. It examines a number of variables, such as the IPAs' ease of use, options for learner self-regulation (defined as learners' ability to understand and control their learning environment), learner motivation and, more importantly, opportunities for learner input and output practice. The second manuscript explores the IPA's ability to interact with learners of English with different accents. The focus is on the IPA's ability to understand speech at different levels of accentedness and, conversely, on learners' ability to understand the synthesized speech.
The third manuscript investigates whether the pedagogical use of IPAs can lead to improvements in learners' phonological awareness, perception and production of the allomorphy that characterizes regular past tense -ed marking in English (for example, depending on the preceding phonological environment, the suffix -ed can be pronounced /t/ as in talked, /d/ as in played, and /id/ as in added). This dissertation contributes to our knowledge of learners' experience with and attitudes towards IPAs, as it further reveals the potential and limitations of the technology. As far as second language phonology and pronunciation are concerned, the dissertation breaks new ground, since little is known about IPAs and their pedagogical potential for developing second language listening and speaking skills.

Concordia University Research Repository

The impact of shared knowledge on speakers’ prosody

Author: Cau Cecile
Champagne-Lavau Maud
Michelas Amandine
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

How does the knowledge shared by interlocutors during interaction modify the way speakers speak? Specifically, how does prosody change when speakers know that their addressees do not share the same knowledge as them? We studied these effects in an interactive paradigm in which French speakers gave instructions to addressees about where to place a cross between different objects (e.g., You put the cross between the red mouse and the red house). We manipulated (i) whether the two interlocutors shared or did not necessarily share the same objects and (ii) the informational status of referents. We were interested in two types of prosodic variation: global prosodic variations that affect entire utterances (i.e., pitch range and speech rate variations) and more local prosodic variations that encode the informational status of referents (i.e., prosodic phrasing for French). We found that participants spoke more slowly and with larger pitch excursions in the not-shared knowledge condition than in the shared knowledge condition, while they did not prosodically encode the informational status of referents in either knowledge condition. The results demonstrate that speakers kept track of what the addressee knew and adapted their global prosody to their interlocutors. Doing so made the task too cognitively demanding to allow prosodic encoding of the informational status of referents. Our findings are in line with the idea that the complex reasoning usually involved in constructing a model of the addressee co-exists with speaker-internal constraints, such as cognitive load, to affect speakers' prosody during interaction.

HAL AMU

Directory of Open Access Journals

Asynchronous telemedicine applications in rehabilitation of acquired speech-language disorders in neurologic patients

Publication venue: Dove Medical Press Ltd.

Crossref