438 research outputs found

The listening talker: A review of human and algorithmic context-induced modifications of speech

Author: Martin Cooke
Maëva Garnier
Simon King
Vincent Aubanel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

Crossref

Hal - Université Grenoble Alpes

Edinburgh Research Explorer

Western Sydney ResearchDirect

Double Articulation Analyzer with Prosody for Unsupervised Word and Phoneme Discovery

Author: Okuda Yasuaki
Ozaki Ryo
Taniguchi Tadahiro
Publication venue
Publication date: 15/03/2021
Field of study

Infants acquire words and phonemes from unsegmented speech signals using segmentation cues, such as distributional, prosodic, and co-occurrence cues. Many existing computational models of this process focus on either distributional or prosodic cues. This paper proposes a nonparametric Bayesian probabilistic generative model called the prosodic hierarchical Dirichlet process-hidden language model (Prosodic HDP-HLM). Prosodic HDP-HLM, an extension of HDP-HLM, considers both prosodic and distributional cues within a single integrative generative model. We conducted three experiments on different types of datasets and demonstrated the validity of the proposed method. The results show that the Prosodic double articulation analyzer (Prosodic DAA) successfully uses prosodic cues and outperforms a method that uses only distributional cues. The main contributions of this study are as follows: 1) We develop a probabilistic generative model for time series data including prosody that potentially has a double articulation structure; 2) We propose the Prosodic DAA by deriving the inference procedure for Prosodic HDP-HLM and show that Prosodic DAA can discover words directly from continuous human speech signals using statistical information and prosodic information in an unsupervised manner; 3) We show that prosodic cues contribute more to word segmentation when words are naturally distributed, i.e., when they follow Zipf's law. Comment: 11 pages, submitted to IEEE Transactions on Cognitive and Developmental Systems.

arXiv.org e-Print Archive

Children's Sensitivity to Pitch Variation in Language

Author: Quam Carolyn
Publication venue: ScholarlyCommons
Publication date: 01/01/2010
Field of study

Children acquire consonant and vowel categories by 12 months, but take much longer to learn to interpret perceptible variation. This dissertation considers children’s interpretation of pitch variation. Pitch operates, often simultaneously, at different levels of linguistic structure. English-learning children must disregard pitch at the lexical level, since English is not a tone language, while still attending to pitch for its other functions. Chapters 1 and 5 outline the learning problem and suggest ways children might solve it. Chapter 2 demonstrates that 2.5-year-olds know pitch cannot differentiate words in English. Chapter 3 finds that not until age 4–5 do children correctly interpret pitch cues to emotions. Chapter 4 demonstrates some sensitivity between 2.5 and 5 years to the pitch cue to lexical stress, but continuing difficulties at the older ages. These findings suggest a late trajectory for interpretation of prosodic variation; throughout, I propose explanations for this protracted time-course.

ScholarlyCommons@Penn

Categories, words and rules in language acquisition

Author: Hochmann Jean Remy
Publication venue: Trieste
Publication date: 06/12/2010
Field of study

Acquiring language requires learning a set of words (i.e. the lexicon) and abstract rules that combine them to form sentences (i.e. syntax). In this thesis, we show that infants acquiring their mother tongue rely on different speech categories to extract words and to abstract regularities. We address this issue with a study that investigates how young infants use consonants and vowels, showing that certain computations are tuned to one or the other of these speech categories.

Sissa Digital Library

Lexical and Prosodic Pitch Modifications in Cantonese Infant-directed Speech

Author: Kager René
Kalashnikova Marina
Lai Regine
Wang Luchang
Wong Patrick C.M.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2021
Field of study

Published online 03 February 2021. The functions of acoustic-phonetic modifications in infant-directed speech (IDS) remain an open question: do they specifically serve to facilitate language learning via enhanced phonemic contrasts (the hyperarticulation hypothesis) or primarily to improve communication via prosodic exaggeration (the prosodic hypothesis)? The study of lexical tones provides a unique opportunity to shed light on this, as lexical tones are phonemically contrastive, yet their primary cue, pitch, is also a prosodic cue. This study investigated Cantonese IDS and found increased intra-talker variation of lexical tones, which more likely posed a challenge to phonetic learning than facilitated it. Although the tonal space was expanded, which could facilitate phonetic learning, this expansion was a function of overall intonational modifications. Similar findings were observed in speech to pets, who should not benefit from larger phonemic distinctions. We conclude that lexical tone adjustments in IDS mainly serve to broadly enhance communication rather than specifically increase phonemic contrast for learners. This work was supported by the University Grants Committee (HKSAR) (RGC34000118), the Innovation and Technology Fund (HKSAR) (ITS/067/18), Dr. Stanley Ho Medical Development Foundation, and the Global Parent Child Resource Centre Limited. The second author’s work is supported by the Basque Government through the BERC 2018-2021 program and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship, PID2019-105528GA-I00.

Archivo Digital para la Docencia y la Investigación

Voice Onset Time in Infant-directed Speech at Two Ages

Author: Synnestvedt Anna
Publication venue
Publication date: 01/01/2010
Field of study

Studies have reported differences between infant-directed speech (IDS) and adult-directed speech (ADS), suggesting that mothers adjust speech to their infants in ways that may help children process the incoming acoustical signal. One aspect of IDS that has been examined is clarification of voice onset time (VOT). Results have been inconsistent, and many studies report only differences in VOT values rather than differences in the amount of overlap between voiced and voiceless items. The present study examines 15 mothers' VOT in IDS when their infants were 7.5 months old and again at 11 months, as compared to their VOT values in ADS. Words with initial stop consonants that occurred in both IDS and ADS conditions were analyzed using PRAAT. Contrary to hypotheses, results show that VOT in IDS was less differentiated than VOT in ADS. Additionally, voiced items had significantly longer VOT in IDS than in ADS, with no difference for voiceless items. Possible explanations are discussed.

Digital Repository at the University of Maryland

How tone, intonation and emotion shape the development of infants' fundamental frequency perception

Author: Goetz Antonia
Liu Liquan
Lorette Pernelle
Tyler Michael D.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2022
Field of study

Fundamental frequency (ƒ0), perceived as pitch, is the first and arguably most salient auditory component humans are exposed to from the beginning of life. It carries multiple linguistic (e.g., word meaning) and paralinguistic (e.g., speakers’ emotion) functions in speech and communication. The mappings between these functions and ƒ0 features vary within a language and differ cross-linguistically. For instance, a rising pitch can be perceived as a question in English but a lexical tone in Mandarin. Such variations mean that infants must learn the specific mappings based on their respective linguistic and social environments. To date, canonical theoretical frameworks and most empirical studies do not view or consider the multi-functionality of ƒ0, but typically focus on individual functions. More importantly, despite the eventual mastery of ƒ0 in communication, it is unclear how infants learn to decompose and recognize these overlapping functions carried by ƒ0. In this paper, we review the symbioses and synergies of the lexical, intonational, and emotional functions that can be carried by ƒ0 and are being acquired throughout infancy. On the basis of our review, we put forward the Learnability Hypothesis that infants decompose and acquire multiple ƒ0 functions through native/environmental experiences. Under this hypothesis, we propose representative cases such as the synergy scenario, where infants use visual cues to disambiguate and decompose the different ƒ0 functions. Further, viable ways to test the scenarios derived from this hypothesis are suggested across auditory and visual modalities. Discovering how infants learn to master the diverse functions carried by ƒ0 can increase our understanding of linguistic systems, auditory processing and communication functions.

Western Sydney ResearchDirect

NORA - Norwegian Open Research Archives

Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

Author: Cristia Alejandrina
Dupoux Emmanuel
Guevara-Rukoz Adriana
Ludusan Bogdan
Martin Andrew
Mazuka Reiko
Thiollière Roland
Publication venue
Publication date: 23/12/2017
Field of study

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics in a global discriminability score reveals that the greater separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability. Comment: Draft

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Proceedings of the Sixteenth Australasian International Conference on Speech Science and Technology

Author
Publication venue: ASSTA
Publication date: 31/12/2016
Field of study

UCL Discovery

Input quality and speech perception development in bilingual infants' first year of life

Author: Carreiras Manuel
Kalashnikova Marina
Publication venue: 'Wiley'
Publication date: 01/01/2022
Field of study

Epub 2021 Oct 20. Individual differences in infants’ native phonological development have been linked to the quantity and quality of infant-directed speech (IDS). The effects of parental and infant bilingualism on this relation were investigated in 131 five- and nine-month-old monolingual and bilingual Spanish and Basque infants (72 male; 59 female; from white middle-class backgrounds). Bilingualism did not affect the developmental trajectory of infants’ native and non-native speech perception or the quality of maternal speech. In both language groups, vowel exaggeration in IDS was significantly related to speech perception skills for 9-month-olds (r = −.30), but not for 5-month-olds. This demonstrates that bilingual and monolingual caregivers provide their infants with speech input that assists their task of learning the phonological inventory of one or two languages. Eusko Jaurlaritza, Grant/Award Number: BERC 2018-2021; Severo Ochoa Excellence Program, Grant/Award Number: SEV-2015-0490; Ministerio de Ciencia e Innovación, Grant/Award Number: PID2019-105528GA-I00; H2020 Marie Skłodowska-Curie Actions, Grant/Award Number: 79890

Archivo Digital para la Docencia y la Investigación

Western Sydney ResearchDirect