
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability.
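    The acoustic side of such a comparison can be illustrated with a simple, hypothetical separability measure. This is a sketch of the general idea only, not the paper's actual metric; the `tokens` and `labels` inputs are assumed, not the study's data.

```python
# Hypothetical sketch: acoustic discriminability as cross-validated
# nearest-neighbour accuracy over word tokens. `tokens` holds one fixed-length
# acoustic feature vector per token (e.g. averaged MFCCs) and `labels` the word
# type each token realises; both are assumed inputs.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def acoustic_discriminability(tokens: np.ndarray, labels: np.ndarray) -> float:
    """Mean 5-fold accuracy of a 1-NN classifier; higher means more discriminable."""
    clf = KNeighborsClassifier(n_neighbors=1)
    return cross_val_score(clf, tokens, labels, cv=5).mean()
```

    On a measure like this, the reported finding would correspond to lower scores for IDS word tokens than for matched ADS tokens.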

    Infants segment words from songs - an EEG study

    Children’s songs are omnipresent and highly attractive stimuli in infants’ input. Previous work suggests that infants process linguistic–phonetic information from simplified sung melodies. The present study investigated whether infants learn words from ecologically valid children’s songs. Testing 40 Dutch-learning 10-month-olds in a familiarization-then-test electroencephalography (EEG) paradigm, this study asked whether infants can segment repeated target words embedded in songs during familiarization and subsequently recognize those words in continuous speech in the test phase. To replicate previous speech work and compare segmentation across modalities, infants participated in both song and speech sessions. Results showed a positive event-related potential (ERP) familiarity effect to the final compared to the first target occurrences during both song and speech familiarization. No evidence was found for word recognition in the test phase following either song or speech. Comparisons across the stimuli of the present and a comparable previous study suggested that acoustic prominence and speech rate may have contributed to the polarity of the ERP familiarity effect and its absence in the test phase. Overall, the present study provides evidence that 10-month-old infants can segment words embedded in songs, and it raises questions about the acoustic and other factors that enable or hinder infant word segmentation from songs and speech.

    Hearing versus Listening: Attention to Speech and Its Role in Language Acquisition in Deaf Infants with Cochlear Implants

    The advent of cochlear implantation has provided thousands of deaf infants and children access to speech and the opportunity to learn spoken language. Whether or not deaf infants successfully learn spoken language after implantation may depend in part on the extent to which they listen to speech rather than just hear it. We explore this question by examining the role that attention to speech plays in early language development according to a prominent model of infant speech perception – Jusczyk’s WRAPSA model – and by reviewing the kinds of speech input that maintain normal-hearing infants’ attention. We then review recent findings suggesting that cochlear-implanted infants’ attention to speech is reduced compared to normal-hearing infants and that speech input to these infants differs from input to infants with normal hearing. Finally, we discuss possible roles attention to speech may play in deaf children’s language acquisition after cochlear implantation in light of these findings and predictions from Jusczyk’s WRAPSA model.

    An automatic child-directed speech detector for the study of child language development

    http://interspeech2012.org/accepted-abstract.html?id=210
    In this paper, we present an automatic child-directed speech detection system to be used in the study of child language development. Child-directed speech (CDS) is speech that is directed by caregivers towards infants. It is not uncommon for corpora used in child language development studies to have a combination of CDS and non-CDS. As the size of the corpora used in these studies grows, manual annotation of CDS becomes impractical. Our automatic CDS detector addresses this issue. The focus of this paper is to propose and evaluate different sets of features for the detection of CDS, using several off-the-shelf classifiers. First, we look at the performance of a set of acoustic features. We continue by combining these acoustic features with several linguistic and eventually contextual features. Using the full set of features, our CDS detector was able to correctly identify CDS with an accuracy of .88 and an F1 score of .87 using Naive Bayes.
    Index Terms: motherese, automatic, child-directed speech, infant-directed speech, adult-directed speech, prosody, language development
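    As a rough illustration of the classification setup described above (not the authors' actual feature set or pipeline; the feature file names and the train/test split are assumptions), an off-the-shelf Naive Bayes detector over per-utterance acoustic features might look like this:

```python
# Minimal sketch, not the authors' system: a Naive Bayes CDS/ADS detector over
# per-utterance acoustic features (e.g. mean pitch, pitch range, speech rate).
# The .npy file names are hypothetical placeholders for precomputed features
# and annotations: X has one row per utterance, y is 1 for CDS and 0 for ADS.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

X = np.load("utterance_features.npy")
y = np.load("utterance_labels.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))
```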

    Pup Directed Vocalizations of Adult Females and Males in a Vocal Learning Bat

    Social feedback plays an important role in human language development and in the vocal ontogeny of non-human animals. A special form of vocal feedback in humans, infant-directed speech – or motherese – facilitates language learning and is socially beneficial by increasing attention and arousal in the child. It is characterized by high pitch, expanded intonation contours and slower speech tempo. Furthermore, the vocal timbre (i.e., “color” of voice) of motherese differs from the timbre of adult-directed speech. In animals, pup-directed vocalizations are very common, especially in females, but so far there has been hardly any research on whether a phenomenon similar to motherese exists in animal vocalizations. The greater sac-winged bat, Saccopteryx bilineata, is a vocal production learner with a large vocal repertoire that is acquired during ontogeny. We compared acoustic features between female pup-directed and adult-directed vocalizations and demonstrated that they differed in timbre and peak frequency. Furthermore, we described pup-directed vocalizations of adult males. During the ontogenetic period when pups’ isolation calls (ICs; used to solicit maternal care) are converging toward each other to form a group signature, adult males also produce ICs. Pups’ ICs are acoustically more similar to those of males from the same social group than to those of other males. In conclusion, our novel findings indicate that parent-offspring communication in bats is more complex and multifaceted than previously thought, with female pup-directed vocalizations reminiscent of human motherese and male pup-directed vocalizations that may facilitate the transmission of a vocal signature across generations.
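    One of the acoustic features compared above, peak frequency, can be estimated from a recording in a few lines. This is a generic sketch rather than the study's measurement procedure, and the recording name is hypothetical:

```python
# Generic sketch: estimate the peak (spectral-maximum) frequency of a call.
import librosa
import numpy as np

def peak_frequency(path: str) -> float:
    y, sr = librosa.load(path, sr=None)         # keep the native sampling rate
    spectrum = np.abs(np.fft.rfft(y))           # magnitude spectrum of the whole call
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])    # frequency (Hz) of the strongest bin

print(peak_frequency("pup_directed_call.wav"))  # hypothetical recording
```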

    Mothers Reveal More of Their Vocal Identity When Talking to Infants

    Voice timbre – the unique acoustic information in a voice by which its speaker can be recognized – is particularly critical in mother-infant interaction. Correct identification of vocal timbre is necessary in order for infants to recognize their mothers as familiar both before and after birth, providing a basis for social bonding between infant and mother. The exact mechanisms underlying infant voice recognition remain ambiguous and have predominantly been studied in terms of the cognitive voice recognition abilities of the infant. Here, we show – for the first time – that caregivers actively maximize their chances of being correctly recognized by presenting more details of their vocal timbre through adjustments to their voices known as infant-directed speech (IDS) or baby talk, a vocal register which is widespread across most of the world’s cultures. Using acoustic modelling (k-means clustering of Mel Frequency Cepstral Coefficients) of IDS in comparison with adult-directed speech (ADS), we found in two cohorts of speakers - US English and Swiss German mothers - that voice timbre clusters in IDS are significantly larger than comparable clusters in ADS. This effect leads to a more detailed representation of timbre in IDS, with subsequent benefits for recognition. Critically, an automatic speaker identification system using a Gaussian mixture model based on Mel Frequency Cepstral Coefficients showed significantly better performance in two experiments when trained with IDS as opposed to ADS. We argue that IDS has evolved as part of an adaptive set of evolutionary strategies that serve to promote indexical signalling by caregivers to their offspring, thereby promoting social bonding via voice and the acquisition of linguistic systems.
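    A minimal sketch of MFCC-and-GMM speaker identification of the kind described above (the file names, the 13-coefficient front end and the 16-component mixtures are assumptions, not the study's configuration):

```python
# Minimal sketch, not the study's exact pipeline: one Gaussian mixture model of
# MFCC frames per speaker, identification by highest average log-likelihood.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

speakers = {  # hypothetical training recordings per mother
    "mother_01": ["m01_ids_a.wav", "m01_ids_b.wav"],
    "mother_02": ["m02_ids_a.wav", "m02_ids_b.wav"],
}
models = {
    name: GaussianMixture(n_components=16, covariance_type="diag", random_state=0)
          .fit(np.vstack([mfcc_frames(f) for f in files]))
    for name, files in speakers.items()
}

def identify(path: str) -> str:
    """Return the speaker whose model scores the test recording highest."""
    frames = mfcc_frames(path)
    return max(models, key=lambda name: models[name].score(frames))
```

    Training such models once on IDS and once on ADS recordings, then comparing identification accuracy, is the kind of comparison the abstract reports.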

    Early phonetic learning without phonetic categories: Insights from large-scale simulations on realistic input

    Before they even speak, infants become attuned to the sounds of the language(s) they hear, processing native phonetic contrasts more easily than non-native ones. For example, between 6-8 months and 10-12 months, infants learning American English get better at distinguishing English [ɹ] and [l], as in ‘rock’ vs ‘lock’, relative to infants learning Japanese. Influential accounts of this early phonetic learning phenomenon initially proposed that infants group sounds into native vowel- and consonant-like phonetic categories—like [ɹ] and [l] in English—through a statistical clustering mechanism dubbed ‘distributional learning’. The feasibility of this mechanism for learning phonetic categories has been challenged, however. Here we demonstrate that a distributional learning algorithm operating on naturalistic speech can predict early phonetic learning as observed in Japanese and American English infants, suggesting that infants might learn through distributional learning after all. We further show, however, that contrary to the original distributional learning proposal, our model learns units too brief and too fine-grained acoustically to correspond to phonetic categories. This challenges the influential idea that what infants learn are phonetic categories. More broadly, our work introduces a novel mechanism-driven approach to the study of early phonetic learning, together with a quantitative modeling framework that can handle realistic input. This allows, for the first time, accounts of early phonetic learning to be linked to concrete, systematic predictions regarding infants’ attunement.
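    As a toy illustration of distributional learning treated as unsupervised clustering of speech frames (not the paper's actual model or input representation; the corpus file and the number of mixture components are assumptions):

```python
# Toy sketch: 'distributional learning' as Gaussian-mixture clustering of
# MFCC frames drawn from continuous speech.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

y, sr = librosa.load("naturalistic_speech.wav", sr=16000)   # hypothetical corpus file
frames = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T      # (n_frames, 13)

gmm = GaussianMixture(n_components=50, covariance_type="diag", random_state=0)
units = gmm.fit_predict(frames)                              # one learned unit label per frame
print("distinct units used:", len(np.unique(units)))
```

    In this toy version the learned units are frame-sized and acoustically fine-grained, echoing the abstract's point that distributionally learned units need not correspond to phonetic categories.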

    Does child-directed speech facilitate language development in all domains? A study space analysis of the existing evidence

    Because child-directed speech (CDS) is ubiquitous in some cultures, and because positive associations between certain features of the language input and certain learning outcomes have been attested, it has often been claimed that the function of CDS is to aid children’s language development in general. We argue that for this claim to be generalisable, superior learning from CDS compared to non-CDS, such as adult-directed speech (ADS), must be demonstrated across multiple input domains and learning outcomes. To determine the availability of such evidence, we performed a study space analysis of the research literature on CDS. A total of 942 relevant papers were coded with respect to (i) the CDS features under consideration, (ii) the learning outcomes and (iii) whether a comparison between CDS and ADS was reported. The results show that only 16.2% of peer-reviewed studies in this field compared learning outcomes between CDS and ADS, almost half of which focussed on the ability to discriminate between the two registers. Crucially, we found only 20 studies comparing learning outcomes between CDS and ADS for morphosyntactic and lexico-semantic features, and none for pragmatic and extra-linguistic features. Although these 20 studies provided preliminary evidence for a facilitative effect of some specific morphosyntactic and lexico-semantic features, overall CDS-ADS comparison studies are very unevenly distributed across the space of CDS features and outcome measures. The disproportionate emphasis on prosodic, phonetic, and phonological input features, and on register discrimination as the outcome, invites caution with respect to the generalisability of the claim that CDS facilitates language development across the breadth of input domains and learning outcomes. Future research ought to resolve the discrepancy between sweeping claims about the function of CDS as facilitating language development on the one hand and the narrow evidence base for such a claim on the other by conducting CDS-ADS comparisons across a wider range of input features and outcome measures.

    Neural processing of changes in phonetic and emotional speech sounds and tones in preterm infants at term age

    Objective: Auditory change-detection responses provide information on sound discrimination and memory skills in infants. We examined both the automatic change-detection process and the processing of emotional information content in speech in preterm infants in comparison to full-term infants at term age. Methods: Event-related potentials (ERPs) of preterm (n = 21) and full-term (n = 20) infants were recorded at term age. A challenging multi-feature mismatch negativity (MMN) paradigm with phonetic deviants and rare emotional speech sounds (happy, sad, angry), and a simple one-deviant oddball paradigm with pure tones were used. Results: Positive mismatch responses (MMRs) were found to the emotional sounds and some of the phonetic deviants in preterm and full-term infants in the multi-feature MMN paradigm. Additionally, late positive MMRs to the phonetic deviants were elicited in the preterm group. However, no group differences to speech-sound changes were discovered. In the oddball paradigm, preterm infants had positive MMRs to the deviant change in all latency windows. Responses to non-speech sounds were larger in preterm infants in the second latency window, as well as in the first latency window at the left hemisphere electrodes (F3, C3). Conclusions: No significant group-level differences were discovered in the neural processing of speech sounds between preterm and full-term infants at term age. Change-detection of non-speech sounds, however, may be enhanced in preterm infants at term age. Significance: Auditory processing of speech sounds in healthy preterm infants showed similarities to full-term infants at term age. Large individual variations within the groups may reflect some underlying differences that call for further studies.