
    Reactions of adult listeners to infant speech-like vocalizations and cry

    Vocal Flexibility in Early Human Communication

    The dissertation contains two papers on the theme of flexibility in infant communication using an infrastructural approach. An infrastructural approach considers infant communication in terms of properties of human language (e.g., spontaneous vocalization, functional flexibility, and social interactivity). Infants' vocal flexibility is explored in two ways: 1) how infants use sounds with varying emotional valences, a primary determiner of their communicative functions, and when this infrastructural property emerges (the first paper, in Chapter 2), and 2) what role the voice plays, independently and jointly with the face, in the transmission of affect and vocal type (the second paper, in Chapter 3). The first paper demonstrates that infants explore protophone vocalizations and associate them with a range of affect as early as the first month of life: all the protophone types we examined showed strong functional flexibility, displaying significantly more neutral facial affect and significantly less negative facial affect than cry. Further, infant protophones were functionally flexible across all three months, being differentiated from cry at every age. The second study revealed an important distinction in the use of face and voice in affect versus protophone expression. Affect was transmitted with audio and video flexibly interwoven, suggesting that infant vocal capabilities establish a foundation for the flexible use of the voice, as is required in language. Both works contribute to our understanding of the path leading to infants' speech capacity.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop is held on a biennial basis; its proceedings collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies.

    Acoustic and Respiratory Characteristics of Infant Vocalization

    The purpose of this dissertation was to explore the vibratory regimes of infant phonation. The first study examined 1) differences in overall levels of acoustic and respiratory variables between regimes and 2) differences in the relationships between the acoustic and respiratory variables among regimes. The second study examined 3) the acoustic and respiratory ranges of modal phonation with respect to other regimes and 4) the range of modal phonation among infants of different ages. Two datasets were used: Dataset I was acquired from eight infants aged 8-18 months, and Dataset II from one infant aged 4-6 months. Their vocalizations and respiratory movements were recorded during interaction with adults. Phonated segments were identified through waveform, spectrogram, and auditory inspection, and categorized into six mutually exclusive regimes (modal, pulse, loft, subharmonics, biphonation, and chaos). For each regime segment, the following measurements were made: fundamental frequency (F0), sound pressure level (SPL), expiratory slope, and relative lung volume at regime initiation. A series of linear mixed-effects model analyses and analyses of variance revealed between-regime differences in the means of the acoustic and respiratory variables. Correlations between the acoustic and respiratory variables also differed among regimes, indicating that their relationships were regime-dependent. The most revealing findings were that regime categories readily distributed into different regions of the intensity-frequency space, and that the F0 range of the modal regime tended to decrease with increasing age. In addition to modal, pulse, and loft occupying the mid, low, and high intensity-frequency regions, respectively, biphonation and subharmonics were found between the modal and loft ranges. The upper end of the F0 range for pulse was much higher in infants than in adults; however, biphonation and subharmonics rarely occurred between the pulse and modal ranges. The modal F0 range was about 500 Hz for the young infant in the vocal expansion stage and about 200 Hz for the older infants in the (post-)canonical stage. Although the results are tentative, this finding suggests that F0 variability decreases with age and that phonation becomes more restricted to the lower end of the F0 range.
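
    The regime comparison described above can be illustrated with a minimal, hypothetical sketch of a linear mixed-effects analysis. The per-segment table, its column names (f0, spl, regime, infant), and the file name are assumptions for illustration, not the dissertation's actual data or pipeline.

        # Hypothetical sketch: test for between-regime differences in mean F0
        # with a linear mixed-effects model (random intercept per infant).
        import pandas as pd
        import statsmodels.formula.api as smf

        segments = pd.read_csv("regime_segments.csv")  # assumed per-segment table

        # F0 modeled as a function of regime, with infant as the grouping factor.
        model = smf.mixedlm("f0 ~ C(regime)", data=segments, groups=segments["infant"])
        result = model.fit()
        print(result.summary())

        # The same template applies to SPL or expiratory slope by swapping the
        # dependent variable, e.g. "spl ~ C(regime)".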

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Automatic vocal recognition of a child's perceived emotional state within the Speechome corpus

    Thesis (S.M.) -- Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 137-149). With over 230,000 hours of audio/video recordings of a child growing up in the home setting from birth to the age of three, the Human Speechome Project has pioneered a comprehensive, ecologically valid observational dataset that introduces far-reaching new possibilities for the study of child development. By offering in vivo observation of a child's daily life experience at ultra-dense, longitudinal time scales, the Speechome corpus holds great potential for discovering developmental insights that have thus far eluded observation. The work of this thesis aspires to enable the use of the Speechome corpus for empirical study of emotional factors in early child development. To fully harness the benefits of Speechome for this purpose, an automated mechanism must be created to perceive the child's emotional state within this medium. Due to the latent nature of emotion, we sought objective, directly measurable correlates of the child's perceived emotional state within the Speechome corpus, focusing exclusively on acoustic features of the child's vocalizations and surrounding caretaker speech. Using Partial Least Squares regression, we applied these features to build a model that simulates human perceptual heuristics for determining a child's emotional state. We evaluated the perceptual accuracy of models built across child-only, adult-only, and combined feature sets within the overall sampled dataset, as well as controlling for social situations, vocalization behaviors (e.g. crying, laughing, babble), individual caretakers, and developmental age between 9 and 24 months. Child and combined models consistently demonstrated high perceptual accuracy, with overall adjusted R-squared values of 0.54 and 0.58, respectively, and averages of 0.59 and 0.67 per month. Comparative analysis across longitudinal and socio-behavioral contexts yielded several notable developmental and dyadic insights. In the process, we have developed a data mining and analysis methodology for modeling perceived child emotion and quantifying caretaker intersubjectivity that we hope to extend to future datasets across multiple children, as new deployments of the Speechome recording technology are established. Such large-scale comparative studies promise an unprecedented view into the nature of emotional processes in early childhood and potentially enlightening discoveries about autism and other developmental disorders. By Sophia Yuditskaya, S.M.
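
    A minimal sketch of the kind of modelling step described above, assuming a precomputed acoustic feature matrix and human ratings of perceived emotional state stored in hypothetical files; it uses scikit-learn's PLS regression, not the thesis's exact feature set or pipeline.

        # Hypothetical sketch: fit a Partial Least Squares regression mapping
        # acoustic features of vocalizations to human ratings of the child's
        # perceived emotional state, then report R-squared on held-out data.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import train_test_split

        X = np.load("acoustic_features.npy")   # assumed shape (n_samples, n_features)
        y = np.load("perceived_emotion.npy")   # assumed ratings, shape (n_samples,)

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=0
        )

        pls = PLSRegression(n_components=10)
        pls.fit(X_train, y_train)

        r2 = pls.score(X_test, y_test)          # coefficient of determination
        n, p = X_test.shape
        # One common adjustment; assumes more test samples than features.
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
        print(f"R^2 = {r2:.2f}, adjusted R^2 = {adj_r2:.2f}")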

    Infant Cry Signal Processing, Analysis, and Classification with Artificial Neural Networks

    As a special type of speech and environmental sound, infant cry has been a growing research area over the past two decades, covering infant cry reason classification, pathological infant cry identification, and infant cry detection. In this dissertation, we build a new dataset, explore new feature extraction methods, and propose novel classification approaches to improve infant cry classification accuracy and to identify diseases by learning from infant cry signals. We propose a method that generates weighted prosodic features combined with acoustic features for a deep learning model to improve the performance of asphyxiated infant cry identification. The combined feature matrix captures the diversity of variations within infant cries, and the result outperforms all other related studies on asphyxiated infant cry classification. We propose a non-invasive, fast method of using infant cry signals with convolutional neural network (CNN) based age classification to diagnose abnormal infant vocal tract development as early as four months of age. Experiments reveal the pattern and tendency of vocal tract changes and predict abnormality of the infant vocal tract by classifying the cry signals into a younger age category. We propose an approach that generates a hybrid feature set and uses prior knowledge in a multi-stage CNN model for robust infant sound classification. The dominant and auxiliary features within the set help enlarge coverage while keeping good resolution for modeling the diversity of variations within infant sounds, and the experimental results show encouraging improvements on two related databases. We propose a graph convolutional network (GCN) with transfer learning for robust infant cry reason classification. Non-fully-connected graphs based on the similarities among the relevant nodes are built to capture the short-term and long-term effects of infant cry signals related to intra-class and inter-class messages. With as little as 20% of the training data labeled, our model outperforms the CNN model trained with 80% labeled data in both supervised and semi-supervised settings. Lastly, we apply mel-spectrogram decomposition to infant cry classification and propose a fusion method to further improve classification performance.
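
    As one concrete illustration of the mel-spectrogram and CNN ingredients mentioned above (not the authors' architectures), a minimal sketch in which the file path, input size, and number of cry-reason classes are placeholders:

        # Hypothetical sketch: turn an infant cry recording into a log-mel
        # spectrogram and classify it with a small CNN. Paths, input size, and
        # the number of cry-reason classes are placeholders.
        import librosa
        import numpy as np
        import torch
        import torch.nn as nn

        def cry_to_logmel(path, sr=16000, n_mels=64, n_frames=256):
            y, sr = librosa.load(path, sr=sr)
            mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
            logmel = librosa.power_to_db(mel)
            # Pad or crop to a fixed number of frames so batches share one shape.
            if logmel.shape[1] < n_frames:
                logmel = np.pad(logmel, ((0, 0), (0, n_frames - logmel.shape[1])))
            return torch.tensor(logmel[:, :n_frames], dtype=torch.float32).unsqueeze(0)

        class CryCNN(nn.Module):
            def __init__(self, n_classes=5):  # five hypothetical cry reasons
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.classifier = nn.Linear(32, n_classes)

            def forward(self, x):
                return self.classifier(self.features(x).flatten(1))

        model = CryCNN()
        x = cry_to_logmel("cry_example.wav")   # placeholder recording
        logits = model(x.unsqueeze(0))         # add batch dimension
        print(logits.softmax(dim=-1))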

    Processing of nonverbal vocalisations in dementia

    Nonverbal emotional vocalisations are fundamental communicative signals used to convey a diverse repertoire of social and emotional information. They transcend the boundaries of language and cultural specificity that hamper many neuropsychological tests, making them ideal candidates for understanding impaired socio-emotional signal processing in dementia. Symptoms related to changes in social behaviour and emotional responsiveness are poorly understood yet have significant impact on patients with dementia and those who care for them. In this thesis, I investigated processing of nonverbal emotional vocalisations in patients with Alzheimer’s disease and frontotemporal dementia (FTD), a disease spectrum encompassing three canonical syndromes characterised by marked socio-emotional and communication difficulties: behavioural variant FTD (bvFTD), semantic variant primary progressive aphasia (svPPA) and nonfluent/agrammatic variant primary progressive aphasia (nfvPPA). I demonstrated distinct profiles of impairment in identifying three salient vocalisations (laughter, crying and screaming) and the emotions they convey. All three FTD syndromes showed impairments, with the most marked deficits of emotion categorisation seen in the bvFTD group. Voxel-based morphometry was used to define critical brain substrates for processing vocalisations, identifying correlates of vocal sound processing with auditory perceptual regions (superior temporal sulcus and posterior insula) and emotion identification with limbic and medial frontal regions. The second half of this thesis focused on the more fine-grained distinction of laughter subtypes. I studied cognitive (labelling), affective (valence) and autonomic (pupillometric) processing of laughter subtypes representing dimensions of valence (mirthful versus hostile) and arousal (spontaneous versus posed). Again, FTD groups showed greatest impairment with profiles suggestive of primary perceptual deficits in nfvPPA, cognitive overgeneralisation in svPPA and disordered reward and hedonic valuation in bvFTD. Neuroanatomical correlates of explicit laughter identification included inferior frontal and cingulo-insular cortices whilst implicit processing (indexed as autonomic arousal) was particularly impaired in those conditions associated with insular compromise (nfvPPA and bvFTD). These findings demonstrate the potential of nonverbal emotional vocalisations as a probe of neural mechanisms underpinning socio-emotional dysfunction in neurodegenerative diseases.

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The MAVEBA Workshop is held on a biennial basis; its proceedings collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies. The workshop is sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published collecting selected papers from the conference.

    Caractérisation des cris des nourrissons en vue du diagnostic précoce de différentes pathologies (Characterization of infant cries for the early diagnosis of various pathologies)

    The use of cry signals for diagnosis rests on theories proposed by various researchers in the field, whose main objective was the spectrographic analysis and modelling of cry signals. They showed that the acoustic characteristics of newborn cries are linked to particular medical conditions. This thesis aims to improve the accuracy of pathological cry recognition by combining several acoustic parameters derived from spectrographic analysis with parameters that characterize the vocal folds and the vocal tract. Acoustic features representing the vocal tract have been widely used for cry classification, whereas vocal-fold features for automatic cry recognition, along with efficient techniques for extracting them, have not been exploited. To meet this objective, we first carried out a qualitative characterization of the cries of healthy and sick newborns, using features defined in the literature that describe the behaviour of the vocal folds and the vocal tract during crying. This step allowed us to identify the features that are most important for differentiating the pathological cries under study. To extract the selected features, we implemented efficient measurement methods that avoid over- and under-estimation of the features. The quantification approach proposed and used in this work facilitates automatic cry analysis and enables these features to be used effectively in the diagnostic system. We also carried out experimental tests to validate all the approaches introduced in this thesis; the results are satisfactory and show an improvement in the recognition of cries by pathology. The work is presented in this thesis in the form of three articles published in different journals; two further articles published in peer-reviewed conference proceedings are provided in the appendices.
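
    As an illustration of one such measurement related to vocal-fold behaviour, a minimal sketch of estimating the fundamental frequency contour of a cry recording with librosa's pYIN tracker; the file name and frequency bounds are assumptions, not the thesis's protocol.

        # Hypothetical sketch: estimate the fundamental frequency (F0) contour of
        # a newborn cry. Newborn cries typically sit well above adult speech, so
        # a wide, high search range is used here.
        import librosa
        import numpy as np

        y, sr = librosa.load("newborn_cry.wav", sr=None)   # assumed recording

        f0, voiced_flag, voiced_prob = librosa.pyin(
            y, fmin=150, fmax=1000, sr=sr, frame_length=2048
        )

        voiced_f0 = f0[voiced_flag]                         # keep voiced frames only
        print(f"mean F0: {np.nanmean(voiced_f0):.1f} Hz, "
              f"range: {np.nanmin(voiced_f0):.1f}-{np.nanmax(voiced_f0):.1f} Hz")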