7,817 research outputs found

    A new approach for motherese detection using a semi-supervised algorithm


    SCALa: A blueprint for computational models of language acquisition in social context

    Theories and data on language acquisition suggest that a range of cues is used, from information on structure found in the linguistic signal itself to information gleaned from the environmental context or through social interaction. We propose a blueprint for computational models of the early language learner (SCALa, for Socio-Computational Architecture of Language Acquisition) that makes explicit the connection between the kinds of information available to the social learner and the computational mechanisms required to extract language-relevant information and learn from it. SCALa integrates a range of views on language acquisition, and further allows us to make precise recommendations for future large-scale empirical research.

    Audio diarization for LENA data and its application to computing language behavior statistics for individuals with autism

    The objective of this dissertation is to develop diarization algorithms for LENA data and study their application to computing language behavior statistics for individuals with autism. The LENA device is one of the most commonly used devices for collecting audio data in autism and language development studies. LENA child and adult detector algorithms were evaluated on two datasets: (i) an older-children dataset of children already diagnosed with autism spectrum disorder, and (ii) an infant dataset of infants at risk for autism. I-vector based diarization algorithms were developed for the two datasets to tackle two scenarios: (a) some labeled data is available for every speaker present in the audio recording, and (b) no labeled data is available for the recording to be diarized. The i-vector based diarization methods were then applied to compute objective measures of assessment, and these measures were analyzed to show that they can reveal some aspects of autism severity. Finally, a method was developed to extract a 5-minute, high-child-vocalization audio window from a 16-hour day-long recording, which was then used to compute canonical babbling statistics using human annotation.
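    The final step described in the abstract, selecting the 5-minute window with the most child vocalization from a day-long recording, can be illustrated with a short sketch. The code below is an assumption-laden illustration rather than the dissertation's actual method: it presumes diarization output is already available as (start, end) child-vocalization intervals in seconds, and the function name, step size, and scoring are invented for the example.

```python
# Hypothetical sketch: pick the 5-minute window of a day-long recording that
# contains the most child vocalization, given diarization output as a sorted
# list of (start, end) child-vocalization intervals in seconds.
# Function name, step size, and interval format are illustrative assumptions.

def best_child_window(child_segments, recording_len_s, window_s=300.0, step_s=10.0):
    best_start, best_score = 0.0, -1.0
    t = 0.0
    while t + window_s <= recording_len_s:
        w_end = t + window_s
        # total child-vocalization time overlapping the window [t, w_end]
        score = sum(max(0.0, min(e, w_end) - max(s, t))
                    for s, e in child_segments
                    if e > t and s < w_end)
        if score > best_score:
            best_start, best_score = t, score
        t += step_s
    return best_start, best_score

# Example: a 16-hour recording (57600 s) with a handful of vocalization intervals.
segments = [(100.0, 103.5), (250.0, 260.0), (255.0, 258.0), (40000.0, 40020.0)]
print(best_child_window(segments, recording_len_s=16 * 3600))
```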

    Automatic Recognition of Non-Verbal Acoustic Communication Events With Neural Networks

    Non-verbal acoustic communication is of high importance to humans and animals: infants use the voice as a primary communication tool, and animals of all kinds employ acoustic communication, such as chimpanzees, which use pant-hoot vocalizations for long-distance communication. Many applications require the assessment of such communication for a variety of analysis goals, and computational systems can support these areas by automating the assessment process. This is of particular importance in monitoring scenarios over large spatial and time scales, which are infeasible to cover manually. Algorithms for sound recognition have traditionally been based on conventional machine learning approaches. In recent years, so-called representation learning approaches have gained increasing popularity; these particularly include deep learning approaches that feed raw data to deep neural networks. However, open challenges remain in applying these approaches to the automatic recognition of non-verbal acoustic communication events, such as compensating for small data set sizes.

    The leading question of this thesis is: How can we apply deep learning more effectively to the automatic recognition of non-verbal acoustic communication events? The target communication types were (1) infant vocalizations and (2) chimpanzee long-distance calls. This thesis comprises four studies that investigated aspects of this question. Study (A) investigated the assessment of infant vocalizations by laypersons. The central goal was to derive an infant vocalization classification scheme based on the laypersons' perception. The study method was based on the Nijmegen Protocol, where participants rated vocalization recordings on various items, such as affective ratings and class labels. Results showed a strong association between valence ratings and class labels, which was used to derive a classification scheme. Study (B) was a comparative study of various neural network types for the automatic classification of infant vocalizations. The goal was to determine the best-performing network type among the currently most prevalent ones, while considering the influence of their architectural configuration. Results showed that convolutional neural networks outperformed recurrent neural networks and that the choice of the frequency and time aggregation layer inside the network is the most important architectural choice. Study (C) was a detailed investigation of computer-vision-like convolutional neural networks for infant vocalization classification. The goal was to determine the most important architectural properties for increasing classification performance. Results confirmed the importance of the aggregation layer and additionally identified the input size of the fully connected layers and the accumulated receptive field as being of major importance. Study (D) was an investigation of compensating class imbalance for chimpanzee call detection in naturalistic long-term recordings. The goal was to determine which compensation method, among a selected group, most improved the performance of a deep learning system. Results showed that spectrogram denoising was most effective, while methods for compensating relative imbalance either retained or decreased performance.

    Contents: 1. Introduction; 2. Foundations in Automatic Recognition of Acoustic Communication; 3. State of Research; 4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons; 5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations; 6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations; 7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks; 8. Conclusion and Collected Discussion; 9. Appendix
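    Studies (B) and (C) single out the frequency-and-time aggregation layer as the decisive architectural choice for spectrogram classifiers. The sketch below, written under assumptions (PyTorch, mel-spectrogram input, invented layer sizes and class count), shows one plausible way such an aggregation layer can sit between a convolutional feature extractor and the classifier; it illustrates the idea only and is not the thesis's actual architecture.

```python
# Illustrative sketch (not the thesis code): a small spectrogram CNN whose
# "aggregation layer" pools over the frequency and time axes before the
# fully connected classifier. Layer sizes and class count are assumptions.

import torch
import torch.nn as nn

class VocalizationCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Aggregation over frequency and time: global average pooling
        self.aggregate = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mel_bands, n_frames)
        x = self.features(x)
        x = self.aggregate(x).flatten(1)
        return self.classifier(x)

# Example: a batch of 8 spectrograms with 64 mel bands and 128 time frames.
logits = VocalizationCNN()(torch.randn(8, 1, 64, 128))
print(logits.shape)  # torch.Size([8, 5])
```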

    Integrating Across Conceptual Spaces

    It has been shown that structure is shared across multiple modalities in the real world: if we speak about two items in similar ways, then they are also likely to appear in similar visual contexts. Such similarity relationships are recapitulated across modalities for entire systems of concepts. This provides a signal that can be used to identify the correct mapping between modalities without relying on event-based learning, by a process of systems alignment. Because it depends on relationships within a modality, systems alignment can operate asynchronously, meaning that learning may not require direct labelling events (e.g., seeing a truck and hearing someone say the word ‘truck’). Instead, learning can occur based on linguistic and visual information which is received at different points in time (e.g., having overheard a conversation about trucks, and seeing one on the road the next day). This thesis explores the value of alignment in learning to integrate between conceptual systems. It takes a joint experimental and computational approach, which simultaneously facilitates insights on alignment processes in controlled environments and at scale. The role of alignment in learning is explored from three perspectives, yielding three distinct contributions. In Chapter 2, signatures of alignment are identified in a real-world setting: children’s early concept learning. Moving to a controlled experimental setting, Chapter 3 demonstrates that humans benefit from alignment signals in cross-system learning, and finds that models which attempt the asynchronous alignment of systems best capture human behaviour. Chapter 4 implements these insights in machine learning systems, using alignment to tackle cross-modal learning problems at scale. Alignment processes prove valuable to human learning across conceptual systems, providing a fresh perspective on learning that complements prevailing event-based accounts. This research opens doors for machine learning systems to harness alignment mechanisms for cross-modal learning, thus reducing their reliance on extensive supervision by drawing inspiration from both human learning and the structure of the environment.
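    The core claim, that shared similarity structure across modalities can identify the correct cross-modal mapping without any paired labelling events, can be made concrete with a toy sketch. The code below is a hedged illustration under invented assumptions (random toy embeddings and a brute-force permutation search over a four-item system); it is not the thesis's model, only a minimal demonstration that within-modality similarity structure alone can recover a cross-modal correspondence.

```python
# Toy sketch of systems alignment: recover the mapping between two small
# conceptual systems purely from within-modality similarity structure.
# Data, sizes, and the brute-force search are illustrative assumptions.

from itertools import permutations
import numpy as np

def align_systems(sim_a, sim_b):
    """Return the permutation of system B's items whose similarity structure
    best matches system A's (highest correlation between similarity matrices)."""
    n = sim_a.shape[0]
    iu = np.triu_indices(n, k=1)          # off-diagonal similarity values
    best_perm, best_r = None, -np.inf
    for perm in permutations(range(n)):
        p = list(perm)
        r = np.corrcoef(sim_a[iu], sim_b[np.ix_(p, p)][iu])[0, 1]
        if r > best_r:
            best_perm, best_r = p, r
    return best_perm, best_r

# Toy example: similarities among 4 concepts in a "linguistic" and a "visual" space.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
sim_ling = emb @ emb.T                         # linguistic similarity matrix
shuffle = [2, 0, 3, 1]                         # unknown cross-modal correspondence
sim_vis = sim_ling[np.ix_(shuffle, shuffle)]   # visual similarities, reordered
print(align_systems(sim_ling, sim_vis))        # recovers the inverse of `shuffle`
```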

    Similarities and differences in the functional architecture of mother-infant communication in rhesus macaque and British mother-infant dyads

    Like humans, rhesus macaques engage in mother-infant face-to-face interactions. However, no previous studies have described the naturally occurring structure and development of mother-infant interactions in this population or used a comparative-developmental perspective to compare them directly with those reported in humans. Here, we investigate the development of infant communication and maternal responsiveness in the two groups. We video-recorded mother-infant interactions in both groups in naturalistic settings and analysed them with the same micro-analytic coding scheme. Results show that infant social expressiveness and maternal responsiveness are similarly structured in humans and macaques. Both human and macaque mothers use specific mirroring responses to specific infant social behaviours (modified mirroring to communicative signals, enriched mirroring to affiliative gestures). However, important differences were identified in the development of infant social expressiveness and in forms of maternal responsiveness, with vocal responses and marking behaviours being predominantly human. Results indicate a common functional architecture of mother-infant communication in humans and monkeys, and contribute to theories concerning the evolution of specific traits of human behaviour.