
    Auditory communication in domestic dogs: vocal signalling in the extended social environment of a companion animal

    Domestic dogs produce a range of vocalisations, including barks, growls, and whimpers, which are shared with other canid species. The source–filter model of vocal production can be used as a theoretical and applied framework to explain how and why the acoustic properties of some vocalisations are constrained by physical characteristics of the caller, whereas others are more dynamic, influenced by transient states such as arousal or motivation. This chapter thus reviews how and why particular call types are produced to transmit specific types of information, and how such information may be perceived by receivers. As domestication is thought to have caused a divergence in the vocal behaviour of dogs as compared to the ancestral wolf, evidence of both dog–human and human–dog communication is considered. Overall, it is clear that domestic dogs have the potential to acoustically broadcast a range of information, which is available to conspecific and human receivers. Moreover, dogs are highly attentive to human speech and are able to extract speaker identity, emotional state, and even some types of semantic information.
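
    A minimal sketch of the source–filter constraint mentioned above: because formant spacing is limited by the caller's vocal tract length, an apparent vocal tract length can be estimated from measured formant frequencies. The uniform-tube assumption and the formant values below are illustrative only and are not taken from the chapter.

    # Minimal sketch: apparent vocal tract length from formant dispersion,
    # assuming a uniform tube closed at one end (illustrative, not from the chapter).
    SPEED_OF_SOUND = 350.0  # m/s, rough value for air in the warm vocal tract

    def formant_dispersion(formants_hz):
        """Mean spacing (Hz) between successive formants."""
        gaps = [f2 - f1 for f1, f2 in zip(formants_hz, formants_hz[1:])]
        return sum(gaps) / len(gaps)

    def apparent_vocal_tract_length_cm(formants_hz):
        """Vocal tract length (cm) implied by the dispersion: VTL = c / (2 * dispersion)."""
        return 100.0 * SPEED_OF_SOUND / (2.0 * formant_dispersion(formants_hz))

    # Hypothetical formant measurements (Hz) from growls of a large and a small dog.
    print(apparent_vocal_tract_length_cm([400, 1400, 2400, 3400]))   # ~17.5 cm
    print(apparent_vocal_tract_length_cm([700, 2300, 3900, 5500]))   # ~10.9 cm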

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than those of the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
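
    The core updating idea described for Part 2 can be caricatured as count-based re-estimation: each time the listener commits to a word hypothesis, the alignment between the sounds actually heard and the phonemes of that word sharpens the sound-to-phoneme mapping. The fragment below is a toy paraphrase of that idea; the class, the 1:1 alignment, and the smoothing are assumptions for illustration, not the paper's model.

    # Toy sketch of accent adaptation: word hypotheses are used to re-estimate the
    # probability that an accented sound token maps to a native phoneme.
    from collections import defaultdict

    class AccentAdapter:
        def __init__(self, smoothing=1.0):
            # counts[sound][phoneme] accumulates evidence from recognized words
            self.counts = defaultdict(lambda: defaultdict(lambda: smoothing))

        def update(self, heard_sounds, hypothesized_phonemes):
            """After committing to a word hypothesis, align heard sounds with the
            word's phonemes (1:1 here for simplicity) and update the counts."""
            for sound, phoneme in zip(heard_sounds, hypothesized_phonemes):
                self.counts[sound][phoneme] += 1.0

        def p_phoneme_given_sound(self, sound, phoneme):
            """Current estimate of P(native phoneme | accented sound)."""
            dist = self.counts[sound]
            total = sum(dist.values()) or 1.0
            return dist[phoneme] / total

    adapter = AccentAdapter()
    # One word where the accented [i:]-like sound really was the phoneme /i:/ ...
    adapter.update(["i:", "t"], ["i:", "t"])
    # ... and two words ("ship", "bit") where the same sound stood for native /I/.
    adapter.update(["i:", "p"], ["I", "p"])
    adapter.update(["i:", "t"], ["I", "t"])
    print(adapter.p_phoneme_given_sound("i:", "I"))   # 0.6: belief shifting toward /I/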

    Language Identification Using Visual Features

    Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments) or impossible (no audio signal is available). Research in this field is also beneficial in the related field of automatic lip-reading. This paper introduces several methods for VLID. They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin-tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error rate of < 10% in discriminating between Arabic and English on 19 speakers, using about 30 s of visual speech.
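
    The phonotactic strategy borrowed from audio LID can be caricatured as scoring a sequence of recognised units with per-language n-gram statistics and choosing the best-scoring language. The sketch below does this with bigram counts over viseme-like tokens; the token inventory, training sequences, and smoothing are invented for illustration and are not the paper's system.

    # Sketch of phonotactic language identification over viseme-like token sequences.
    import math
    from collections import defaultdict

    def train_bigram(sequences, smoothing=0.5):
        """Estimate smoothed bigram log-probabilities from training sequences."""
        counts = defaultdict(lambda: defaultdict(float))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a][b] += 1.0
        model = {}
        for a, nexts in counts.items():
            total = sum(nexts.values()) + smoothing * (len(nexts) + 1)
            model[a] = {b: math.log((c + smoothing) / total) for b, c in nexts.items()}
            model[a][None] = math.log(smoothing / total)  # unseen continuation
        return model

    def score(model, seq):
        """Log-probability of a test sequence under a bigram model."""
        logp, floor = 0.0, math.log(1e-6)
        for a, b in zip(seq, seq[1:]):
            dist = model.get(a)
            logp += floor if dist is None else dist.get(b, dist[None])
        return logp

    models = {
        "english": train_bigram([["V1", "V3", "V2", "V1", "V3"], ["V3", "V2", "V1"]]),
        "arabic":  train_bigram([["V2", "V2", "V4", "V1", "V2"], ["V4", "V1", "V2", "V2"]]),
    }
    test = ["V3", "V2", "V1", "V3"]
    print(max(models, key=lambda lang: score(models[lang], test)))  # -> "english"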

    Acoustics and Resonance in Poetry: The Psychological Reality of Rhyme in Baudelaire’s “Les Chats”

    This article uses the term “psychological reality” in this sense: the extent to which the constructs of linguistic theory can be taken to have a basis in the human mind, i.e., to somehow be reflected in human cognitive structures. This article explores the human cognitive structures in which the constructs of phonetic theory may be reflected. The last section is a critique of the psychological reality of sound patterns in Baudelaire’s “Les Chats”, as discussed in three earlier articles. In physical terms, it defines “resonant” as “tending to reinforce or prolong sounds, especially by synchronous vibration”. In phonetic terms, it defines “resonant” as “where intense precategorical auditory information lingers in short-term memory”. The effect of rhyme in poetry is carried by similar overtones vibrating in the rhyme fellows, resonating like similar overtones on the piano. In either case, we do not compare overtones item by item; we just hear their synchronous vibration. I contrast this conception with three approaches: one that points out similar sounds of “internal rhymes”, irrespective of whether they may be contained within the span of short-term memory (i.e., whether they may have psychological reality); one that claims that syntactic complexity may cancel the psychological reality of “internal rhymes” (whereas I claim that it merely backgrounds rhyme); and one that found, through an eye-tracking experiment, that readers fixate longer on verse-final rhymes than on other words, attributing this to regressive eye movements (I claim that rhyme is an acoustic, not a visual, phenomenon, and that there is a tendency to indicate discontinuation by prolonging the last sounds in ordinary speech and blank verse too, as well as in music, where no rhyme is involved).
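
    The piano analogy invoked above, similar overtones resonating together, can be illustrated with a toy calculation: two tones share many partials when their fundamentals are harmonically related, and few otherwise. The frequencies and tolerance below are invented for illustration and have no connection to the article's analysis of the poem.

    # Toy illustration of "similar overtones": counting coinciding partials of two tones.
    def harmonics(f0, n=10):
        """First n harmonics of a fundamental frequency (Hz)."""
        return [f0 * k for k in range(1, n + 1)]

    def shared_overtones(f0_a, f0_b, n=10, tolerance=5.0):
        """Count harmonic pairs that fall within `tolerance` Hz of one another."""
        ha, hb = harmonics(f0_a, n), harmonics(f0_b, n)
        return sum(1 for a in ha for b in hb if abs(a - b) <= tolerance)

    print(shared_overtones(220.0, 440.0))  # octave: many coinciding partials (5)
    print(shared_overtones(220.0, 311.0))  # distant interval: none coincide (0)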

    Visual units and confusion modelling for automatic lip-reading

    Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy.
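
    The idea of modelling systematic substitutions can be sketched, in miniature, as a confusion matrix applied during decoding: each word's phoneme string is scored by how plausibly it could surface as the recognised visual units. The units, lexicon, and probabilities below are invented, and the toy scorer merely stands in for the weighted finite-state transducer cascade described in the paper (it also ignores deletions).

    # Miniature stand-in for confusion modelling in automatic lip-reading.
    import math

    # P(observed visual unit | intended phoneme); voicing is "lost" visually, so
    # /p/ and /b/ collapse onto the same visual unit with similar probability.
    CONFUSION = {
        "p": {"bilabial": 0.9, "other": 0.1},
        "b": {"bilabial": 0.9, "other": 0.1},
        "a": {"open": 0.8, "other": 0.2},
        "t": {"alveolar": 0.7, "other": 0.3},
        "d": {"alveolar": 0.7, "other": 0.3},
    }

    LEXICON = {
        "pat": ["p", "a", "t"],
        "bad": ["b", "a", "d"],
    }

    def score(word, observed):
        """Log-probability of the observed visual units given the word's phonemes
        (assumes a 1:1 alignment; a real system would also model deletions)."""
        phones = LEXICON[word]
        if len(phones) != len(observed):
            return float("-inf")
        return sum(math.log(CONFUSION[p].get(o, 1e-6)) for p, o in zip(phones, observed))

    observed = ["bilabial", "open", "alveolar"]
    ranked = sorted(LEXICON, key=lambda w: score(w, observed), reverse=True)
    print(ranked)  # "pat" and "bad" tie: exactly the ambiguity the confusion model captures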

    Articulatory features for robust visual speech recognition


    Music in the first days of life

    In adults, specific neural systems with right-hemispheric weighting are necessary to process pitch, melody and harmony, as well as structure and meaning emerging from musical sequences. To what extent does this neural specialization result from exposure to music or from neurobiological predispositions? We used fMRI to measure brain activity in 1- to 3-day-old newborns while they listened to Western tonal music and to the same excerpts altered so as to include tonal violations or dissonance. Music caused predominantly right-hemisphere activations in primary and higher-order auditory cortex. For altered music, activations were seen in the left inferior frontal cortex and limbic structures. Thus, the newborn's brain is fully able to receive music and to detect even small perceptual and structural differences in musical sequences. This neural architecture, present at birth, provides us with the potential to process basic and complex aspects of music, a uniquely human capacity.