2,862 research outputs found
The listening talker: A review of human and algorithmic context-induced modifications of speech
Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.
Seeing a talking face matters to infants, children and adults: behavioural and neurophysiological studies
Everyday conversations typically occur face-to-face. Over and above auditory information, visual information from a speaker’s face (e.g., lips, eyebrows) contributes to speech perception and comprehension. The facilitation that visual speech cues bring, termed the visual speech benefit, is experienced by infants, children and adults. Even so, studies on speech perception have largely focused on auditory-only speech, leaving a relative paucity of research on the visual speech benefit. Central to this thesis are the behavioural and neurophysiological manifestations of the visual speech benefit. As the visual speech benefit assumes that a listener is attending to a speaker’s talking face, the investigations are conducted in relation to the possible modulating effects that gaze behaviour brings. Three investigations were conducted. Collectively, these studies demonstrate that visual speech information facilitates speech perception, and this has implications for individuals who do not have clear access to the auditory speech signal. The results, for instance the enhancement of 5-month-olds’ cortical tracking by visual speech cues and the effect of idiosyncratic differences in gaze behaviour on speech processing, expand knowledge of auditory-visual speech processing and provide firm bases for new directions in this burgeoning and important area of research.
Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation
We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.
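The abstract does not spell out how its global discriminability score is computed. As an illustration only, a simple Fisher-style separability index, the ratio of between-word centroid distance to within-word token spread, captures the logic: if IDS tokens of each word are realized more variably than ADS tokens, the same set of word targets becomes less discriminable. The feature values below are synthetic stand-ins, not data from the study.

```python
import numpy as np

def discriminability(tokens):
    """Ratio of mean between-word centroid distance to mean
    within-word token spread (higher = more discriminable)."""
    centroids = {w: x.mean(axis=0) for w, x in tokens.items()}
    within = np.mean([np.linalg.norm(x - centroids[w], axis=1).mean()
                      for w, x in tokens.items()])
    cs = np.stack(list(centroids.values()))
    pairwise = np.linalg.norm(cs[:, None, :] - cs[None, :, :], axis=2)
    between = pairwise[np.triu_indices(len(cs), k=1)].mean()
    return between / within

rng = np.random.default_rng(1)
# Synthetic "acoustic" word tokens: identical word targets, but IDS
# tokens are realized with twice the variability of ADS tokens.
targets = rng.normal(size=(10, 12)) * 3.0
ads = {i: t + rng.normal(scale=0.5, size=(30, 12)) for i, t in enumerate(targets)}
ids_ = {i: t + rng.normal(scale=1.0, size=(30, 12)) for i, t in enumerate(targets)}

print(discriminability(ads) > discriminability(ids_))  # True: IDS less discriminable
```

With identical targets, the between-word term is essentially the same for both registers, so the doubled token spread alone drives the IDS score down, mirroring the acoustic-level result reported above.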
Language discrimination by newborns: Teasing apart phonotactic, rhythmic, and intonational cues
Speech rhythm has long been claimed to be a useful bootstrapping cue in the very first steps of language acquisition. Previous studies have suggested that newborn infants do categorize varieties of speech rhythm, as demonstrated by their ability to discriminate between certain languages. However, the existing evidence is not unequivocal: in previous studies, stimuli discriminated by newborns always contained additional speech cues on top of rhythm. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic and broad phonotactic structure, newborns still seem to be able to discriminate between the two languages, but the effect is weaker than when intonation is present. This leaves open the possibility that the temporal correlation between intonational and rhythmic cues might actually facilitate the processing of speech rhythm
PERCEPTION OF ACCENTS AND DIALECTS IN ADULTS AND INFANTS
This thesis has been undertaken with the purpose of investigating how adult speech processing systems are affected by, and how they cope with, the presence of different regional and foreign accents in speech, and to investigate the developmental origins of adult accent perception capabilities.
Experiments 1 to 4 were designed to investigate the long-term effects of exposure to different accents, and whether short-term adaptation to an accent was possible, using a lexical decision task. The results demonstrated an effect of accent familiarity, but no short-term adaptation was evident. Experiments 5 to 7 investigated the short-term effects of accents by looking at the length of activation of accent-related information in working memory, using a cross-modal matching task. The results showed that selective accent-related effects were reduced after a 1500-millisecond delay. Experiments 8 to 11 investigated infants' discrimination abilities for regional and foreign accents using a preferential looking habituation method, and found that infants at 5 and 7 months could discriminate their own accent from another, unfamiliar regional accent, but could not discriminate two unfamiliar regional accents at 5 months or a foreign accent from their own at 7 months. Experiments 12 and 13 investigated how accents affected infants' word segmentation abilities with continuous speech at 10 months, and found that segmentation was impaired in the presence of regional and foreign accents.
Using these results, the Accent Training Model (ATP) is proposed, which attempts to explain how accent-related indexical information is processed in the speech processing system. The findings of the infant studies further our understanding of the effect of indexical variation in early speech perception.
The Processing of Accented Speech
This thesis examines the processing of accented speech in both infants and adults. Accents provide a natural and reasonably consistent form of inter-speaker variation in the speech signal, but it is not yet clear exactly what processes are used to normalise this form of variation, or when and how those processes develop. Two adult studies use ERP data to examine differences between the online processing of regional- and foreign-accented speech as compared to a baseline consisting of the listeners’ home accent. These studies demonstrate that the two types of accents recruit normalisation processes which are qualitatively, and not just quantitatively, different. This provides support for the hypothesis that foreign and regional accents require different mechanisms to normalise accent-based variation (Adank et al., 2009; Floccia et al., 2009), rather than for the hypothesis that different types of accents are normalised according to their perceptual distance from the listener’s own accent (Clarke & Garrett, 2004). They also provide support for the Abstract entry approach to lexical storage of variant forms, which suggests that variant forms undergo a process of prelexical normalisation, allowing access to a canonical lexical entry (Pallier et al., 2001), rather than for the Exemplar-based approach, which suggests that variant word-forms are individually represented in the lexicon (Johnson, 1997). Two further studies examined how infants segment words from continuous speech when presented with accented speakers. The first of these includes a set of behavioural experiments, which highlight some methodological issues in the existing literature and offer some potential explanations for conflicting evidence about the age at which infants are able to segment speech.
The second uses ERP data to investigate segmentation within and across accents, and provides neurophysiological evidence that 11-month-olds are able to distinguish newly-segmented words at the auditory level even within a foreign accent, or across accents, but that they are more able to treat new word-forms as word-like in a familiar accent than in a foreign accent.
An integrated theory of language production and comprehension
Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.
The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review
Published online: Aug 1, 2022
Purpose: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners’ identities and linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech.
Conclusions: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research.
Funding: This research was supported by a Doctoral Fellowship (LCF/BQ/DI19/11730045) from “La Caixa” Foundation (ID 100010434) awarded to Giorgio Piazza; by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship (RYC2018-024284-I) awarded to Marina Kalashnikova; by the Basque Government through the BERC 2022-2025 program; by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation CEX2020-001010-S; by the Spanish Ministry of Economy and Competitiveness (PID2020-113926GB-I00, awarded to Clara D. Martin); and by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement 819093 awarded to Clara D. Martin).
Mothers Reveal More of Their Vocal Identity When Talking to Infants
Voice timbre, the unique acoustic information in a voice by which its speaker can be recognized, is particularly critical in mother-infant interaction. Correct identification of vocal timbre is necessary in order for infants to recognize their mothers as familiar both before and after birth, providing a basis for social bonding between infant and mother. The exact mechanisms underlying infant voice recognition remain ambiguous and have predominantly been studied in terms of the cognitive voice recognition abilities of the infant. Here, we show, for the first time, that caregivers actively maximize their chances of being correctly recognized by presenting more details of their vocal timbre through adjustments to their voices known as infant-directed speech (IDS) or baby talk, a vocal register that is widespread across most of the world’s cultures. Using acoustic modelling (k-means clustering of Mel Frequency Cepstral Coefficients) of IDS in comparison with adult-directed speech (ADS), we found in two cohorts of speakers, US English and Swiss German mothers, that voice timbre clusters in IDS are significantly larger than comparable clusters in ADS. This effect leads to a more detailed representation of timbre in IDS, with subsequent benefits for recognition. Critically, automatic speaker identification using a Gaussian-mixture model based on Mel Frequency Cepstral Coefficients showed significantly better performance in two experiments when trained with IDS as opposed to ADS. We argue that IDS has evolved as part of an adaptive set of evolutionary strategies that serve to promote indexical signalling by caregivers to their offspring, thereby supporting social bonding via voice and the acquisition of linguistic systems.
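The cluster analysis named in this abstract (k-means over MFCC frames) can be sketched as follows. The 13-dimensional "MFCC" frames below are synthetic stand-ins (real coefficients would be extracted from recordings), and the mean distance-to-centroid measure is an illustrative proxy for cluster size, not the authors' exact pipeline.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means: returns per-frame labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

def cluster_spread(X, k=4):
    """Mean distance of frames to their assigned centroid:
    a simple proxy for how 'large' the timbre clusters are."""
    labels, centroids = kmeans(X, k)
    return float(np.linalg.norm(X - centroids[labels], axis=1).mean())

rng = np.random.default_rng(0)
# Synthetic 13-dim "MFCC" frames around 4 timbre modes; the IDS frames
# are drawn with twice the per-mode variance of the ADS frames.
modes = rng.normal(size=(4, 13)) * 5.0
ads = np.vstack([m + rng.normal(scale=1.0, size=(200, 13)) for m in modes])
ids_ = np.vstack([m + rng.normal(scale=2.0, size=(200, 13)) for m in modes])

print(cluster_spread(ids_) > cluster_spread(ads))  # True: larger clusters in IDS
```

Under the abstract's account, the larger spread of IDS frames around each centroid is what would give a speaker-identification model trained on IDS a richer sample of the mother's timbre.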