2,007 research outputs found

    Importance of spike timing in touch: an analogy with hearing?

    Get PDF
    Touch is often conceived as a spatial sense akin to vision. However, touch also involves the transduction and processing of signals that vary rapidly over time, inviting comparisons with hearing. In both sensory systems, first-order afferents produce spiking responses that are temporally precise, and the timing of their responses carries stimulus information. The precision and informativeness of spike timing in the two systems invite the possibility that both implement similar mechanisms to extract behaviorally relevant information from these precisely timed responses. Here, we explore the putative roles of spike timing in touch and hearing and discuss common mechanisms that may be involved in processing temporal spiking patterns.

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Full text link
    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).
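
    As a rough illustration of the normalize-then-categorize idea described above (not the paper's strip-map or Adaptive Resonance Theory circuitry), the Python sketch below rescales one speaker's formant frequencies into a speaker-independent space and then labels tokens by their nearest stored vowel template; the formant values, vowel labels, and nearest-template classifier are all hypothetical stand-ins.

```python
# Hedged sketch: speaker normalization by per-speaker log/z-scoring of formants,
# followed by a toy nearest-template "categorization" step. This is NOT the
# paper's strip-map / ART circuit, only an illustration of the general idea.
import numpy as np

def normalize_formants(formants_hz):
    """Log-transform and z-score one speaker's formant vectors (rows = tokens)."""
    log_f = np.log(formants_hz)
    return (log_f - log_f.mean(axis=0)) / log_f.std(axis=0)

def classify_nearest_template(token, templates):
    """Return the vowel label of the closest stored template (Euclidean distance)."""
    labels = list(templates)
    dists = [np.linalg.norm(token - templates[v]) for v in labels]
    return labels[int(np.argmin(dists))]

# Hypothetical F1/F2 values (Hz) for three tokens from one speaker.
tokens = np.array([[300.0, 2300.0], [700.0, 1200.0], [400.0, 800.0]])
normed = normalize_formants(tokens)
templates = {"iy": normed[0], "aa": normed[1], "uw": normed[2]}  # toy templates
print(classify_nearest_template(normed[1], templates))           # -> "aa"
```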

    The Role of Prosodic Stress and Speech Perturbation on the Temporal Synchronization of Speech and Deictic Gestures

    Get PDF
    Gestures and speech converge during spoken language production. Although the temporal relationship of gestures and speech is thought to depend upon factors such as prosodic stress and word onset, the effects of controlled alterations in the speech signal upon the degree of synchrony between manual gestures and speech are uncertain. Thus, the precise nature of the interactive mechanism of speech-gesture production, or lack thereof, is not agreed upon or even frequently postulated. In Experiment 1, syllable position and contrastive stress were manipulated during sentence production to investigate the synchronization of speech and pointing gestures. Experiment 2 additionally investigated the temporal relationship of speech and pointing gestures when speech was perturbed with delayed auditory feedback (DAF). Comparisons between the time of gesture apex and vowel midpoint (GA-VM) for each of the conditions were made for both Experiment 1 and Experiment 2. Additional comparisons of the interval between gesture launch midpoint and vowel midpoint (GLM-VM), total gesture time, gesture launch time, and gesture return time were made for Experiment 2. The results of the first experiment indicated that gestures were more synchronized with first-position syllables and neutral syllables, as measured by GA-VM intervals. The first-position syllable effect was also found in the second experiment. However, the results from Experiment 2 supported an effect of contrastive pitch accent: GLM-VM was shorter for first-position targets and accented syllables. In addition, gesture launch times and total gesture times were longer for contrastive pitch-accented syllables, especially when in the second position of words. Contrary to the predictions, significantly longer GA-VM and GLM-VM intervals were observed when individuals responded under DAF. Vowel and sentence durations increased both with DAF and when a contrastive accented syllable was produced. Vowels were longest for accented, second-position syllables. These findings provide evidence that the timing of gesture is adjusted based upon manipulations of the speech stream. A potential mechanism of entrainment of the speech and gesture systems is offered as an explanation for the observed effects.

    Using fundamental frequency of cancer survivors’ speech to investigate emotional distress in out-patient visits

    Get PDF
    Objective: Emotions are in part conveyed by varying levels of fundamental frequency of voice pitch (f0). This study tests the hypothesis that patients display heightened levels of emotional arousal (f0) during Verona Coding Definitions of Emotional Sequences (VR-CoDES) cues and concerns versus during neutral statements. Methods: The audio recordings of sixteen head and neck cancer survivors' follow-up consultations were coded for patients' emotional distress. Pitch (f0) of coded cues and concerns, as well as of neutral statements, was extracted. These were compared using a hierarchical linear model, nested by patient and pitch range, controlling for statement speech length. Utterance content was also explored. Results: Clustering by patient explained 30% of the variance in utterance f0. Cues and concerns were on average 13.07 Hz higher than neutral statements (p = 0.02). Cues and concerns in these consultations contained a high proportion of recurrence fears. Conclusion: The present study highlights the benefits and challenges of adding f0, and potentially other prosodic features, to the toolkit for coding emotional distress in the health communication setting. Practice implications: The assessment of f0 during clinical conversations can provide additional information for research into emotional expression.
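
    For readers unfamiliar with f0 extraction, the toy estimator below computes a frame-wise autocorrelation pitch estimate; the study does not specify this particular method, so treat it as a generic sketch, and the 180 Hz test tone is an arbitrary stand-in for voiced speech.

```python
# Generic autocorrelation-based f0 estimator (illustrative only; not the
# pitch-tracking procedure used in the study above).
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Return an f0 estimate in Hz for one frame of voiced speech."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag

sr = 16000
t = np.arange(0, 0.04, 1 / sr)
frame = np.sin(2 * np.pi * 180 * t)        # synthetic 180 Hz "voiced" frame
print(round(estimate_f0(frame, sr), 1))    # approximately 180 Hz
```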

    Seeing sound: a new way to illustrate auditory objects and their neural correlates

    Full text link
    This thesis develops a new method for time-frequency signal processing and examines the relevance of the new representation in studies of neural coding in songbirds. The method groups together associated regions of the time-frequency plane into objects defined by time-frequency contours. By combining information about structurally stable contour shapes over multiple time-scales and angles, a signal decomposition is produced that distributes resolution adaptively. As a result, distinct signal components are represented in their own most parsimonious forms. Next, through neural recordings in singing birds, it was found that activity in song premotor cortex is significantly correlated with the objects defined by this new representation of sound. In this process, an automated way of finding sub-syllable acoustic transitions in birdsongs was first developed, and then increased spiking probability was found at the boundaries of these acoustic transitions. Finally, a new approach to study auditory cortical sequence processing more generally is proposed. In this approach, songbirds were trained to discriminate Morse-code-like sequences of clicks, and the neural correlates of this behavior were examined in primary and secondary auditory cortex. It was found that a distinct transformation of auditory responses to the sequences of clicks occurs as information is transferred from primary to secondary auditory areas. Neurons in secondary auditory areas respond asynchronously and selectively -- in a manner that depends on the temporal context of the click. This transformation from a temporal to a spatial representation of sound provides a possible basis for the songbird's natural ability to discriminate complex temporal sequences.
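
    A much-simplified way to see what "grouping the time-frequency plane into objects" means is sketched below: threshold a spectrogram and label its connected regions. The thesis's actual method uses structurally stable multi-scale contours, which this toy example does not reproduce; the two-tone test signal and threshold are arbitrary.

```python
# Toy "auditory object" grouping: spectrogram -> threshold -> connected regions.
# This is only an illustration, not the contour-based decomposition of the thesis.
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import label

sr = 22050
t = np.arange(0, 1.0, 1 / sr)
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 3000 * t)  # two steady tones

f, times, S = spectrogram(x, fs=sr, nperseg=512)
mask = S > 0.1 * S.max()           # keep only strong time-frequency cells
objects, n_objects = label(mask)   # connected regions = candidate "objects"
print(n_objects)                   # typically 2, one per tone
```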

    Properties of vocalization- and gesture-combinations in the transition to first words

    Full text link
    This article has been published in a revised form in Journal of Child Language http://dx.doi.org/10.1017/S0305000915000343. This version is free to view and download for private research and study only. Not for re-distribution, re-sale or use in derivative works. © Cambridge University Press 2015. Gestures and vocal elements interact from the early stages of language development, but the role of this interaction in the language learning process is not yet completely understood. The aim of this study is to explore gestural accompaniment's influence on the acoustic properties of vocalizations in the transition to first words. Eleven Spanish children aged 0;9 to 1;3 were observed longitudinally in a semi-structured play situation with an adult. Vocalizations were analyzed using several acoustic parameters based on those described by Oller et al. (2010). Results indicate that declarative vocalizations have fewer protosyllables than imperative ones, but only when they are produced with a gesture. Protosyllable duration and f(0) are more similar to those of mature speech when produced with pointing gestures and declarative function than when produced with reaching gestures and imperative purposes. The proportion of canonical syllables produced increases with age, but only when combined with a gesture.

    The Case of the Missing Pitch Templates: How Harmonic Templates Emerge in the Early Auditory System

    Get PDF
    Periodicity pitch is the most salient and important of all pitch percepts. Psycho-acoustical models of this percept have long postulated the existence of internalized harmonic templates against which incoming resolved spectra can be compared, and pitch determined according to the best matching templates (Goldstein). However, it has been a mystery where and how such harmonic templates can come about. Here we present a biologically plausible model for how such templates can form in the early stages of the auditory system. The model demonstrates that any broadband stimulus, such as noise or random click trains, suffices for generating the templates, and that there is no need for any delay-lines, oscillators, or other neural temporal structures. The model consists of two key stages: cochlear filtering followed by coincidence detection. The cochlear stage provides responses analogous to those seen on the auditory nerve and cochlear nucleus. Specifically, it performs moderately sharp frequency analysis via a filter-bank with tonotopically ordered center frequencies (CFs); the rectified and phase-locked filter responses are further enhanced temporally to resemble the synchronized responses of cells in the cochlear nucleus. The second stage is a matrix of coincidence detectors that compute the average pair-wise instantaneous correlation (or product) between responses from all CFs across the channels. Model simulations show that for any broadband stimulus, high coincidences occur between cochlear channels that are exactly harmonic distances apart. Accumulating coincidences over time results in the formation of harmonic templates for all fundamental frequencies in the phase-locking frequency range. The model explains the critical role played by three subtle but important factors in cochlear function: the nonlinear transformations following the filtering stage; the rapid phase-shifts of the traveling wave near its resonance; and the spectral resolution of the cochlear filters. Finally, we discuss the physiological correlates and location of such a process and its resulting templates.
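
    The two-stage mechanism summarized above (cochlear filtering, rectification, pairwise coincidence detection) can be caricatured in a few lines of Python; the second-order Butterworth bands, channel count, and noise duration below are simplified placeholders rather than the model's actual cochlear front end.

```python
# Hedged sketch of "filterbank + coincidence detection" template formation:
# bandpass channels driven by broadband noise are half-wave rectified, and the
# average pairwise product between channels is accumulated into a matrix.
import numpy as np
from scipy.signal import butter, lfilter

sr = 16000
cfs = np.linspace(200, 3200, 40)           # tonotopically ordered center frequencies
noise = np.random.randn(sr)                # 1 s of broadband noise

channels = []
for cf in cfs:
    b, a = butter(2, [0.9 * cf / (sr / 2), 1.1 * cf / (sr / 2)], btype="band")
    y = lfilter(b, a, noise)
    channels.append(np.maximum(y, 0.0))    # half-wave rectification
channels = np.array(channels)

# Coincidence matrix: average instantaneous product for every channel pair.
coincidence = channels @ channels.T / channels.shape[1]
# Per the abstract, channels at harmonic distances should accumulate elevated
# coincidence over time, which is the proposed origin of harmonic templates.
print(coincidence.shape)                   # (40, 40)
```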

    Multipitch Analysis and Tracking for Automatic Music Transcription

    Get PDF
    Music has always played a large role in human life. The technology behind the art has progressed and grown over time in many areas, for instance the instruments themselves, the recording equipment used in studios, and the reproduction through digital signal processing. One facet of music that has seen very little attention over time is the ability to transcribe audio files into musical notation. In this thesis, a method of multipitch analysis is used to track multiple simultaneous notes through time in an audio music file. The analysis method is based on autocorrelation and a specialized peak pruning method to identify only the fundamental frequencies present at any single moment in the sequence. A sliding Hamming window is used to step through the input sound file and track through time. Results show the tracking of nontrivial musical patterns over two octaves in range and varying tempos.
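
    The frame-by-frame analysis described above (sliding Hamming window, autocorrelation, peak picking) might look roughly like the sketch below. The 0.5 relative threshold is a placeholder rather than the thesis's pruning rule, and a naive picker like this tends to report the shared period of a two-note mixture (about 110 Hz for the 220 Hz + 330 Hz example), which is exactly the kind of ambiguity a specialized pruning step has to resolve.

```python
# Illustrative sliding-window autocorrelation with naive peak picking; the
# thesis's specialized peak-pruning method is not reproduced here.
import numpy as np

def frame_pitch_candidates(x, sr, win=2048, hop=512, fmin=80.0, fmax=1000.0, thresh=0.5):
    window = np.hamming(win)
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    results = []
    for start in range(0, len(x) - win, hop):
        frame = x[start:start + win] * window
        ac = np.correlate(frame, frame, mode="full")[win - 1:]
        ac /= ac[0] + 1e-12                      # normalize so ac[0] = 1
        peaks = [lag for lag in range(lag_min, lag_max)
                 if ac[lag] > thresh and ac[lag] >= ac[lag - 1] and ac[lag] >= ac[lag + 1]]
        results.append([sr / lag for lag in peaks])  # candidate f0s in Hz
    return results

sr = 16000
t = np.arange(0, 0.5, 1 / sr)
mix = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 330 * t)  # two simultaneous notes
print(frame_pitch_candidates(mix, sr)[0])    # shared-period candidate near 110 Hz
```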

    Idealized computational models for auditory receptive fields

    Full text link
    This paper presents a theory by which idealized models of auditory receptive fields can be derived in a principled axiomatic manner, from a set of structural properties to enable invariance of receptive field responses under natural sound transformations and ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters as well as a novel family of generalized Gammatone filters, with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, it is shown that the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or the combination of a time-causal generalized Gammatone filter over the temporal domain and a Gaussian filter over the log-spectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals. Comment: 55 pages, 22 figures, 3 tables.
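
    For reference, the classic gammatone impulse response that the paper generalizes has the closed form g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*f*t); the sketch below simply evaluates it with arbitrary illustrative parameters (f = 1 kHz, bandwidth parameter b = 125 Hz, order n = 4), not values derived from the paper's axioms.

```python
# Evaluate a standard 4th-order gammatone impulse response (unnormalized).
# Parameter values are illustrative, not taken from the paper.
import numpy as np

def gammatone_ir(t, f=1000.0, b=125.0, n=4):
    """g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t) for t >= 0."""
    return t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f * t)

sr = 16000
t = np.arange(0, 0.05, 1 / sr)   # 50 ms of impulse response
ir = gammatone_ir(t)
print(ir[:5])                    # first few samples (the first is 0 since t = 0)
```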