2,862 research outputs found

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

    Seeing a talking face matters to infants, children and adults: behavioural and neurophysiological studies

    Everyday conversations typically occur face-to-face. Over and above auditory information, visual information from a speaker’s face, e.g., lips, eyebrows, contributes to speech perception and comprehension. The facilitation that visual speech cues bring, termed the visual speech benefit, is experienced by infants, children and adults. Even so, studies on speech perception have largely focused on auditory-only speech, leaving a relative paucity of research on the visual speech benefit. Central to this thesis are the behavioural and neurophysiological manifestations of the visual speech benefit. As the visual speech benefit assumes that a listener is attending to a speaker’s talking face, the investigations are conducted in relation to the possible modulating effects that gaze behaviour brings. Three investigations were conducted. Collectively, these studies demonstrate that visual speech information facilitates speech perception, and this has implications for individuals who do not have clear access to the auditory speech signal. The results, for instance the enhancement of 5-month-olds’ cortical tracking by visual speech cues, and the effect of idiosyncratic differences in gaze behaviour on speech processing, expand knowledge of auditory-visual speech processing, and provide firm bases for new directions in this burgeoning and important area of research

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability

    Language discrimination by newborns: Teasing apart phonotactic, rhythmic, and intonational cues

    Speech rhythm has long been claimed to be a useful bootstrapping cue in the very first steps of language acquisition. Previous studies have suggested that newborn infants do categorize varieties of speech rhythm, as demonstrated by their ability to discriminate between certain languages. However, the existing evidence is not unequivocal: in previous studies, stimuli discriminated by newborns always contained additional speech cues on top of rhythm. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic and broad phonotactic structure, newborns still seem to be able to discriminate between the two languages, but the effect is weaker than when intonation is present. This leaves open the possibility that the temporal correlation between intonational and rhythmic cues might actually facilitate the processing of speech rhythm

    PERCEPTION OF ACCENTS AND DIALECTS IN ADULTS AND INFANTS

    This thesis has been undertaken with the purpose of investigating how adult speech processing systems are affected by, and how they cope with, the presence of different regional and foreign accents in speech, and to investigate the developmental origins of adult accent perception capabilities. Experiments 1 to 4 were designed to investigate the long term effects of exposure to different accents, and whether short term adaptation to an accent was possible, using a lexical decision task. The results demonstrated an effect of accent familiarity but no short term adaptation was evident. Experiments 5 to 7 investigated the short term effects of accents by looking at the length of activation of accent-related information in working memory by using a cross-modal matching task. The results showed that selective accent-related effects were reduced after a 1500 millisecond delay. Experiments 8 to 11 investigated infants' discrimination abilities for regional and foreign accents using a preferential looking habituation method, and found infants at 5 and 7 months could discriminate their own accent from another, unfamiliar regional accent, but could not discriminate two unfamiliar regional accents at 5 months or a foreign accent from their own at 7 months. Experiments 12 and 13 investigated how accents affected infants' word segmentation abilities with continuous speech at 10 months, and found that segmentation was impaired in the presence of regional and foreign accents. Using these results, the Accent Training Model (ATP) is proposed, which attempts to explain how accent-related indexical information is processed in the speech processing system. The findings of the infant studies further our understanding of the effect of indexical variation in early speech perception

    The Processing of Accented Speech

    This thesis examines the processing of accented speech in both infants and adults. Accents provide a natural and reasonably consistent form of inter-speaker variation in the speech signal, but it is not yet clear exactly what processes are used to normalise this form of variation, or when and how those processes develop. Two adult studies use ERP data to examine differences between the online processing of regional- and foreign-accented speech as compared to a baseline consisting of the listeners’ home accent. These studies demonstrate that the two types of accents recruit normalisation processes which are qualitatively, and not just quantitatively, different. This provides support for the hypothesis that foreign and regional accents require different mechanisms to normalise accent-based variation (Adank et al., 2009; Floccia et al., 2009), rather than for the hypothesis that different types of accents are normalised according to their perceptual distance from the listener’s own accent (Clarke & Garrett, 2004). The studies also provide support for the Abstract entry approach to lexical storage of variant forms, which suggests that variant forms undergo a process of prelexical normalisation, allowing access to a canonical lexical entry (Pallier et al., 2001), rather than for the Exemplar-based approach, which suggests that variant word-forms are individually represented in the lexicon (Johnson, 1997). Two further studies examined how infants segment words from continuous speech when presented with accented speakers. The first of these includes a set of behavioural experiments, which highlight some methodological issues in the existing literature and offer some potential explanations for conflicting evidence about the age at which infants are able to segment speech. The second uses ERP data to investigate segmentation within and across accents, and provides neurophysiological evidence that 11-month-olds are able to distinguish newly-segmented words at the auditory level even within a foreign accent, or across accents, but that they are more able to treat new word-forms as word-like in a familiar accent than a foreign accent

    An integrated theory of language production and comprehension

    Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal

    The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review

    Published online: Aug 1, 2022. Purpose: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners’ identities and linguistic needs, suggesting that FDS also takes listeners’ L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech. Conclusions: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research. This research was supported by a Doctoral Fellowship (LCF/BQ/DI19/11730045) from “La Caixa” Foundation (ID 100010434) awarded to Giorgio Piazza and by the Spanish Ministry of Science and Innovation through the Ramon y Cajal Research Fellowship (RYC2018-024284-I) awarded to Marina Kalashnikova. This research was supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation CEX2020-001010-S. This research was also supported by the Spanish Ministry of Economy and Competitiveness (PID2020-113926GB-I00 awarded to Clara D. Martin) and by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement 819093 awarded to Clara D. Martin)

    Mothers Reveal More of Their Vocal Identity When Talking to Infants

    Voice timbre – the unique acoustic information in a voice by which its speaker can be recognized – is particularly critical in mother-infant interaction. Correct identification of vocal timbre is necessary in order for infants to recognize their mothers as familiar both before and after birth, providing a basis for social bonding between infant and mother. The exact mechanisms underlying infant voice recognition remain ambiguous and have predominantly been studied in terms of cognitive voice recognition abilities of the infant. Here, we show – for the first time – that caregivers actively maximize their chances of being correctly recognized by presenting more details of their vocal timbre through adjustments to their voices known as infant-directed speech (IDS) or baby talk, a vocal register which is widespread across most of the world’s cultures. Using acoustic modelling (k-means clustering of Mel Frequency Cepstral Coefficients) of IDS in comparison with adult-directed speech (ADS), we found in two cohorts of speakers - US English and Swiss German mothers - that voice timbre clusters in IDS are significantly larger than comparable clusters in ADS. This effect leads to a more detailed representation of timbre in IDS with subsequent benefits for recognition. Critically, an automatic speaker identification using a Gaussian-mixture model based on Mel Frequency Cepstral Coefficients showed significantly better performance in two experiments when trained with IDS as opposed to ADS. We argue that IDS has evolved as part of an adaptive set of evolutionary strategies that serve to promote indexical signalling by caregivers to their offspring, which thereby promotes social bonding via voice and the acquisition of linguistic systems
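
    The cluster-size comparison mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the 2-D points are synthetic stand-ins for MFCC frames, the function names (`synthetic_frames`, `kmeans`, `mean_spread`) are hypothetical, and "cluster size" is assumed here to mean the average distance of a frame to its k-means centroid.

```python
import math
import random


def synthetic_frames(spread, n_per_voice=200, seed=1):
    """Hypothetical stand-in for MFCC frames: 2-D points around two centres."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 10.0)]
    return [
        (cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
        for cx, cy in centres
        for _ in range(n_per_voice)
    ]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=25):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def mean_spread(points, k):
    """Average distance of each point to its assigned centroid (assumed size measure)."""
    centroids = kmeans(points, k)
    return sum(
        math.dist(p, centroids[nearest(p, centroids)]) for p in points
    ) / len(points)


# Assumption for illustration only: the "IDS" frames are generated with a wider spread.
ads_frames = synthetic_frames(spread=1.0)
ids_frames = synthetic_frames(spread=2.0)
print("ADS mean spread:", mean_spread(ads_frames, 2))
print("IDS mean spread:", mean_spread(ids_frames, 2))
```

    With real recordings, the synthetic frames would be replaced by MFCC vectors extracted per utterance, and the comparison would need a statistical test rather than a single pair of numbers.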