    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.

    Lenition in English

    Saliency or template? ERP evidence for long-term representation of word stress

    The present study investigated the event-related brain potential (ERP) correlates of word stress processing. Previous results showed that the violation of a legal stress pattern elicited two consecutive Mismatch Negativity (MMN) components synchronized to the changes on the first and second syllables. The aim of the present study was to test whether ERPs reflect only the detection of salient features present on the syllables, or whether they reflect the activation of long-term stress-related representations. We examined ERPs elicited by pseudowords with no lexical representation in two conditions: in one, the standard had a legal stress pattern and the deviant an illegal one; in the other, the standard had an illegal stress pattern and the deviant a legal one. We found that the deviant with an illegal stress pattern elicited two consecutive MMN components, whereas the deviant with a legal stress pattern did not elicit an MMN. Moreover, pseudowords with a legal stress pattern elicited the same ERP responses irrespective of their role in the oddball sequence, i.e., whether they were standards or deviants. The results suggest that stress pattern changes are processed by relying on long-term representations of word stress. To account for these results, we propose that the processing of stress cues is based on language-specific, pre-lexical stress templates.
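    MMN components such as those described above are conventionally measured on a difference wave: the averaged ERP to standards is subtracted from the averaged ERP to deviants, and the MMN appears as a negative deflection. The following Python sketch illustrates that computation on plain lists of samples. The function names, the default 100 to 250 ms search window, and the assumption that every epoch starts at stimulus onset (no baseline period) are illustrative choices, not the authors' analysis pipeline.

```python
from typing import List, Sequence, Tuple

Epoch = Sequence[float]  # one trial: voltage samples, assumed to start at stimulus onset


def average_erp(trials: Sequence[Epoch]) -> List[float]:
    """Average equal-length single-trial epochs sample by sample into an ERP waveform."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n for i in range(len(trials[0]))]


def difference_wave(deviants: Sequence[Epoch], standards: Sequence[Epoch]) -> List[float]:
    """Deviant-minus-standard ERP; an MMN shows up as a negative deflection here."""
    dev = average_erp(deviants)
    std = average_erp(standards)
    return [d - s for d, s in zip(dev, std)]


def mmn_peak(diff: Sequence[float], sfreq_hz: float,
             window_ms: Tuple[float, float] = (100.0, 250.0)) -> Tuple[float, float]:
    """Return (latency in ms, amplitude) of the most negative sample in the window.

    window_ms is measured from epoch onset; to look for an MMN tied to a later
    syllable, shift the window by that syllable's onset.
    """
    start = int(window_ms[0] * sfreq_hz / 1000.0)
    stop = min(int(window_ms[1] * sfreq_hz / 1000.0), len(diff) - 1)
    if start > stop:
        raise ValueError("search window lies outside the epoch")
    # Index of the most negative sample (first one wins on ties).
    idx = min(range(start, stop + 1), key=lambda i: diff[i])
    return idx * 1000.0 / sfreq_hz, diff[idx]
```

In a design where the deviance can fall on either syllable, calling `mmn_peak` twice, with windows aligned to the first and second syllable onsets, would separate the two consecutive components; a deviant that elicits no MMN would simply show no clear negative peak in either window.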

    How to Do Things Without Words: Infants, utterance-activity and distributed cognition

    Clark and Chalmers (1998) defend the hypothesis of an ‘Extended Mind’, maintaining that beliefs and other paradigmatic mental states can be implemented outside the central nervous system or body. Aspects of the problem of ‘language acquisition’ are considered in the light of the extended mind hypothesis. Rather than ‘language’ as typically understood, the object of study is something called ‘utterance-activity’, a term of art intended to refer to the full range of kinetic and prosodic features of the on-line behaviour of interacting humans. It is argued that utterance-activity is plausibly regarded as jointly controlled by the embodied activity of interacting people, and that it contributes to the control of their behaviour. By means of specific examples, it is suggested that this complex joint control makes at least some features of language easier to learn. This in turn suggests a striking form of the extended mind, in which infants’ cognitive powers are augmented by those of the people with whom they interact.

    Infant Cry Signal Processing, Analysis, and Classification with Artificial Neural Networks

    As a special type of speech and environmental sound, infant cry has been a growing research area over the past two decades, covering infant cry reason classification, pathological infant cry identification, and infant cry detection. In this dissertation, we build a new dataset, explore new feature extraction methods, and propose novel classification approaches to improve infant cry classification accuracy and to identify diseases by learning from infant cry signals. We propose a method that generates weighted prosodic features and combines them with acoustic features as input to a deep learning model, improving the identification of asphyxiated infant cries. The combined feature matrix captures the diversity of variations within infant cries, and the result outperforms all other related studies on asphyxiated infant cry classification. We propose a fast, non-invasive method that uses convolutional neural network (CNN) based age classification of infant cry signals to diagnose abnormal vocal tract development as early as 4 months of age. Experiments reveal the pattern and tendency of vocal tract changes and predict vocal tract abnormality when a cry signal is classified into a younger age category. We propose an approach that generates a hybrid feature set and uses prior knowledge in a multi-stage CNN model for robust infant sound classification. The dominant and auxiliary features within the set broaden coverage while keeping good resolution for modeling the diversity of variations within infant sounds, and the experimental results show encouraging improvements on two related databases. We propose a graph convolutional network (GCN) approach with transfer learning for robust infant cry reason classification. Non-fully connected graphs based on the similarities among relevant nodes are built to capture the short-term and long-term effects of infant cry signals related to intra-class and inter-class messages. With as little as 20% labeled training data, our model outperforms a CNN model trained with 80% labeled data in both supervised and semi-supervised settings. Lastly, we apply mel-spectrogram decomposition to infant cry classification and propose a fusion method to further improve infant cry classification performance.
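    The abstract states only that the GCN operates on non-fully connected graphs built from similarities among nodes. As a minimal sketch under stated assumptions, one common way to build such a graph is to link each node (for example, the feature vector of one cry segment) to its k most cosine-similar nodes, then apply the self-loop plus symmetric degree normalization used in standard GCN layers. Cosine similarity, the value of k, binary edge weights, and all function names below are hypothetical choices; the dissertation's actual graph construction may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_graph(features: Sequence[Sequence[float]], k: int) -> Matrix:
    """Binary, symmetric adjacency linking each node to its k most similar nodes.

    Keeping only the top-k neighbours is what makes the graph non-fully connected.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Rank the other nodes by similarity to node i, most similar first.
        others.sort(key=lambda j: cosine_similarity(features[i], features[j]), reverse=True)
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0  # symmetrize so the graph is undirected
    return adj


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, the propagation matrix of a standard GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # every degree is >= 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
```

Binary edges keep every degree at least 1 once self-loops are added, so the normalization never divides by zero; weighting edges by similarity instead would also be possible, but negative cosine similarities would need to be clipped first.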

    The conceptualization of a theoretical framework for a music intervention to improve auditory development in very preterm infants

    Very preterm infants are at high risk for language delays that can persist throughout their lifetime. The auditory system is rapidly developing and highly sensitive to acoustic stimulation during the third trimester of pregnancy. The acoustic nature of the womb provides the essential foundation for the auditory perceptual skills necessary for language acquisition. In contrast, the NICU environment presents a wider spectrum of sounds that can alter the early development of the auditory system and cause delays in language acquisition. Research supports the importance of early exposure to speech sounds for the optimal development of auditory perceptual ability, as well as the critical role of the intrauterine characteristics of language. Pitches below 300 Hz, as well as rhythmic patterns and prosodic contours, are highly salient intrauterine features of language that make up the infant’s initial auditory experience. The purpose of this study is to develop a theoretical framework for understanding how the intrauterine speech characteristics of pitch, rhythm, and prosody can be implemented as active ingredients in a music intervention to improve auditory development and long-term language outcomes in very preterm infants. The framework is presented and described in detail. Implications for a future research agenda and applications for clinical practice are explored.