Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thus
improving the system robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their faces. Furthermore, the method does not rely on
external annotations, thus remaining consistent with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker-dependent setting. However, in a
speaker-independent setting the proposed method yields significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions.
Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
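The cross-modal training idea in this abstract, audio supplying labels so that a visual classifier can learn without human annotation, can be sketched with a toy example. Everything below (the mouth-motion feature, the energy threshold, the logistic model, the synthetic data) is an assumption for illustration, not the paper's architecture:

```python
import math
import random

random.seed(0)

def audio_pseudo_label(audio_energy, threshold=0.5):
    """Label a frame as 'active speaker' when audio energy is high."""
    return 1 if audio_energy > threshold else 0

def train_visual_classifier(frames, lr=0.1, epochs=200):
    """Logistic regression on (visual feature, audio pseudo-label) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in frames:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

# Synthetic frames: a mouth-motion feature correlates with speaking,
# and the audio channel provides the (unsupervised) training signal.
data = []
for _ in range(200):
    speaking = random.random() < 0.5
    mouth_motion = random.gauss(1.0 if speaking else -1.0, 0.5)
    audio_energy = 0.8 if speaking else 0.2
    data.append((mouth_motion, audio_pseudo_label(audio_energy)))

w, b = train_visual_classifier(data)

def predict(mouth_motion):
    """After training, detect the active speaker from vision alone."""
    return 1 if w * mouth_motion + b > 0 else 0

print(predict(1.5), predict(-1.5))  # prints: 1 0
```

Once trained this way, the visual classifier works even when the audio channel is too noisy to use, which is the robustness argument the abstract makes.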
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
5 pages
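As a toy illustration of the lexical-discovery task described above (not one of the workshop's actual algorithms), one can scan an unsegmented stream of subword unit tokens for recurring n-grams and treat frequent ones as candidate lexical items; the example stream below is invented for illustration:

```python
from collections import Counter

# Pretend each character is a discovered subword unit; there are no word
# boundaries and no supervision in the stream.
stream = "thedogsawthedogthedog"

# Count all n-grams of plausible word lengths.
counts = Counter()
for n in range(3, 7):
    for i in range(len(stream) - n + 1):
        counts[stream[i:i + n]] += 1

# Candidate lexical items: n-grams that recur often.
candidates = {s for s, c in counts.items() if c >= 3}
print("thedog" in candidates)  # prints: True
```

The Bayesian word segmentation algorithms evaluated at the workshop do this in a principled probabilistic way, but the intuition, recurring token subsequences are probably words, is the same.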
An exploration of sarcasm detection in children with Attention Deficit Hyperactivity Disorder
This document is the Accepted Manuscript version of the following article: Amanda K. Ludlow, Eleanor Chadwick, Alice Morey, Rebecca Edwards, and Roberto Gutierrez, ‘An exploration of sarcasm detection in children with Attention Deficit Hyperactivity Disorder’, Journal of Communication Disorders, Vol. 70: 25-34, November 2017. Under embargo. Embargo end date: 31 October 2019. The Version of Record is available at doi: https://doi.org/10.1016/j.jcomdis.2017.10.003.
The present research explored the ability of children with ADHD to distinguish between sarcasm and sincerity. Twenty-two children with a clinical diagnosis of ADHD were compared with 22 age- and verbal-IQ-matched typically developing children using the Social Inference–Minimal Test from The Awareness of Social Inference Test (TASIT; McDonald, Flanagan, & Rollins, 2002). This test assesses an individual’s ability to interpret naturalistic social interactions containing sincerity, simple sarcasm and paradoxical sarcasm. Children with ADHD demonstrated specific deficits in comprehending paradoxical sarcasm, performing significantly less accurately than the typically developing children. While there were no significant differences between the two groups in their ability to comprehend sarcasm based on the speaker’s intentions and beliefs, the children with ADHD were significantly less accurate when basing their decisions on the feelings of the speaker and on what the speaker had said. Results are discussed in light of these children’s difficulties in understanding the complex cues of social interaction, and of non-literal language deficits being symptomatic of a clinical diagnosis of ADHD. The importance of pragmatic language skills in the ability to detect social and emotional information is highlighted.
Peer reviewed
Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady-State Vowel Categorization
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
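The paper's neural model is not reproduced here, but the effect of speaker normalization can be illustrated with a much simpler classical technique, log-mean formant normalization in the style of Nearey. The formant values below are approximate Peterson-and-Barney-style averages for the vowel /i/, used only for illustration:

```python
import math

def log_mean_normalize(formants):
    """Map formant frequencies (Hz) to log values centered on the
    speaker's own mean log formant, removing per-speaker scale."""
    logs = [math.log(f) for f in formants]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

# Approximate (F1, F2) averages for /i/ from a lower-pitched and a
# higher-pitched speaker: the raw values differ by 40-500 Hz.
male_i = log_mean_normalize([270.0, 2290.0])
female_i = log_mean_normalize([310.0, 2790.0])

# After normalization the two representations nearly coincide.
print(male_i, female_i)
```

This captures the abstract's point that a speaker-independent vowel representation can be obtained from speaker-dependent acoustics, although the paper achieves it with strip maps and asymmetric competition rather than a fixed formula.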
Effects of simultaneous speech and sign on infants’ attention to spoken language
Objectives: To examine the hypothesis that infants receiving a degraded auditory signal have more difficulty segmenting words from fluent speech if familiarized with the words presented in both speech and sign compared to familiarization with the words presented in speech only. Study Design: Experiment utilizing an infant-controlled visual preference procedure. Methods: Twenty 8.5-month-old normal-hearing infants completed testing. Infants were familiarized with repetitions of words in either the speech + sign (n = 10) or the speech only (n = 10) condition. Results: Infants were then presented with four six-sentence passages using an infant-controlled visual preference procedure. Every sentence in two of the passages contained the words presented in the familiarization phase, whereas none of the sentences in the other two passages contained familiar words. Infants exposed to the speech + sign condition looked at familiar word passages for 15.3 seconds and at nonfamiliar word passages for 15.6 seconds, t(9) = -0.130, p = .45. Infants exposed to the speech only condition looked at familiar word passages for 20.9 seconds and at nonfamiliar word passages for 15.9 seconds. This difference was statistically significant, t(9) = 2.076, p = .03. Conclusions: Infants’ ability to segment words from degraded speech is negatively affected when these words are initially presented in simultaneous speech and sign. The current study suggests that a decreased ability to segment words from fluent speech may contribute towards the poorer performance of pediatric cochlear implant recipients in total communication settings on a wide range of spoken language outcome measures.
A Formal Framework for Linguistic Annotation
`Linguistic annotation' covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions --
audio, video and/or physiological recordings -- or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
`named entity' identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats.
Comment: 49 pages
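The annotation-graph idea, labeled arcs spanning nodes on a common timeline, can be sketched as a small data structure. The class name and the node/arc layout below are assumptions for illustration, not the paper's formal definition:

```python
class AnnotationGraph:
    """Nodes carry optional time offsets; each arc spans two nodes and
    carries an annotation type and a label. Multiple annotation layers
    (words, part-of-speech tags, phrases, ...) share the same nodes."""

    def __init__(self):
        self.nodes = {}   # node id -> time offset in seconds (or None)
        self.arcs = []    # (start_node, end_node, type, label)

    def add_node(self, node_id, time=None):
        self.nodes[node_id] = time

    def add_arc(self, start, end, arc_type, label):
        self.arcs.append((start, end, arc_type, label))

    def arcs_of_type(self, arc_type):
        return [a for a in self.arcs if a[2] == arc_type]

# Two words of transcribed speech, annotated on several layers at once.
g = AnnotationGraph()
for node_id, t in enumerate([0.0, 0.30, 0.62]):
    g.add_node(node_id, t)
g.add_arc(0, 1, "word", "she")
g.add_arc(1, 2, "word", "had")
g.add_arc(0, 1, "pos", "PRP")
g.add_arc(1, 2, "pos", "VBD")
g.add_arc(0, 2, "phrase", "S")

print(g.arcs_of_type("word"))
# prints: [(0, 1, 'word', 'she'), (1, 2, 'word', 'had')]
```

Because every layer refers to the same timestamped nodes, queries across layers (e.g. "the part of speech of the word ending at 0.62 s") reduce to simple arc lookups, which is the searchability property the abstract claims for the formalism.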