
    A Spectral Network Model of Pitch Perception

    A model of pitch perception, called the Spatial Pitch Network or SPINET model, is developed and analyzed. The model neurally instantiates ideas from the spectral pitch modeling literature and joins them to basic neural network signal processing designs to simulate a broader range of perceptual pitch data than previous spectral models. The components of the model are interpreted as peripheral mechanical and neural processing stages, which are capable of being incorporated into a larger network architecture for separating multiple sound sources in the environment. The core of the new model transforms a spectral representation of an acoustic source into a spatial distribution of pitch strengths. The SPINET model uses a weighted "harmonic sieve" whereby the strength of activation of a given pitch depends upon a weighted sum of narrow regions around the harmonics of the nominal pitch value, and higher harmonics contribute less to a pitch than lower ones. Suitably chosen harmonic weighting functions enable computer simulations of pitch perception data involving mistuned components, shifted harmonics, and various types of continuous spectra including rippled noise. It is shown how the weighting functions produce the dominance region, how they lead to octave shifts of pitch in response to ambiguous stimuli, and how they lead to a pitch region in response to the octave-spaced Shepard tone complexes and Deutsch tritones without the use of attentional mechanisms to limit pitch choices. An on-center off-surround network in the model helps to produce noise suppression, partial masking and edge pitch. Finally, it is shown how peripheral filtering and short-term energy measurements produce a model pitch estimate that is sensitive to certain component phase relationships.

    Air Force Office of Scientific Research (F49620-92-J-0225); American Society for Engineering Education
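    The weighted harmonic sieve described above lends itself to a compact sketch. The Python illustration below scores candidate pitches by summing spectral energy in narrow windows around their harmonics, with geometrically decaying weights; the window width, decay rate, and candidate grid are illustrative assumptions, not SPINET's published parameters.

```python
# A minimal sketch of a weighted "harmonic sieve" (illustrative parameters,
# not the SPINET model's actual weighting functions).
import numpy as np

def pitch_strengths(freqs, mags, candidates, n_harmonics=8,
                    rel_width=0.04, decay=0.85):
    """Map a magnitude spectrum to a spatial distribution of pitch strengths.

    freqs, mags : spectral component frequencies (Hz) and magnitudes
    candidates  : candidate pitch values (Hz) to score
    rel_width   : half-width of each harmonic window, relative to the harmonic
    decay       : per-harmonic weight factor (<1, so higher harmonics count less)
    """
    strengths = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        total = 0.0
        for k in range(1, n_harmonics + 1):
            target = k * f0
            # Sum spectral energy in a narrow region around the k-th harmonic,
            # weighted so that higher harmonics contribute less.
            in_window = np.abs(freqs - target) <= rel_width * target
            total += (decay ** (k - 1)) * mags[in_window].sum()
        strengths[i] = total
    return strengths

# Example: a 200 Hz complex with a mistuned third harmonic still peaks near 200 Hz.
freqs = np.array([200.0, 400.0, 630.0, 800.0])
mags = np.ones(4)
cands = np.linspace(100, 400, 301)
print(cands[np.argmax(pitch_strengths(freqs, mags, cands))])
```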

    The temporal structure of urban soundscapes


    Acoustic Correlates of Word Stress as a Cue to Accent Strength

    Due to the clear interference of their mother-tongue prosody, many Czech learners produce their English with a conspicuous foreign accent. The goal of the present study is to investigate the acoustic cues that differentiate stressed and unstressed syllabic nuclei and to identify individual details concerning their contribution to the specific sound of Czech English. Speech production of sixteen female non-professional Czech and British speakers was analysed, with the sounds segmented on the word and phone level and with both canonical and actual stress positions manually marked. Prior to the analyses, the strength of the foreign accent was assessed in a perception test. Subsequently, stressed and unstressed vowels were measured with respect to their duration, amplitude, fundamental frequency and spectral slope. Our results show that, in general, Czech speakers use much less acoustic marking of stress than the British subjects. The difference is most prominent in the domains of fundamental frequency and amplitude. The Czech speakers also deviate from the canonical placement of stress, frequently shifting it to the first syllable. On the other hand, they seem to approximate the needed durational difference quite successfully. These outcomes support the concept of language interference, since they correspond with the existing linguistic knowledge about Czech and English word stress. The study adds specific details concerning the extent of this interference in four acoustic dimensions.
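    The four acoustic dimensions compared in the study can be approximated with standard signal measurements. The sketch below (Python/NumPy) computes duration, RMS amplitude, a crude autocorrelation-based fundamental frequency, and a dB-per-octave spectral slope for a pre-segmented vowel; these particular estimators are assumptions chosen for illustration, not the study's actual measurement procedures.

```python
# Rough per-vowel measurements of the four cues named in the abstract:
# duration, amplitude, fundamental frequency, spectral slope. Illustrative
# estimators only; the study's exact procedures are not specified here.
import numpy as np

def vowel_cues(samples, sr):
    duration = len(samples) / sr                                       # seconds
    amplitude = 20 * np.log10(np.sqrt(np.mean(samples ** 2)) + 1e-12)  # dB RMS

    # Crude F0: autocorrelation peak, searched over a 75-400 Hz range.
    ac = np.correlate(samples, samples, mode="full")[len(samples) - 1:]
    lo, hi = int(sr / 400), int(sr / 75)
    f0 = sr / (lo + np.argmax(ac[lo:hi]))

    # Spectral slope: dB magnitude regressed on log2 frequency (dB/octave).
    spec = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), 1 / sr)
    keep = (freqs > 50) & (freqs < 5000)
    slope = np.polyfit(np.log2(freqs[keep]),
                       20 * np.log10(spec[keep] + 1e-12), 1)[0]

    return duration, amplitude, f0, slope
```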

    A Neural Network for Synthesizing the Pitch of an Acoustic Source

    This article describes a neural network model capable of generating a spatial representation of the pitch of an acoustic source. Pitch is one of several auditory percepts used by humans to separate multiple sound sources in the environment from each other. The model provides a neural instantiation of a type of "harmonic sieve". It is capable of quantitatively simulating a large body of psychoacoustical data, including new data on octave shift perception.

    Air Force Office of Scientific Research (90-0128, 90-0175); Defense Advanced Research Projects Agency (90-0083); National Science Foundation (IRI 90-24877); American Society for Engineering Education

    Investigating computational models of perceptual attack time

    The perceptual attack time (PAT) is the compensation for the differing attack components of sounds when seeking a perceptually isochronous presentation of sounds. It has applications in scheduling and is related to, but not necessarily the same as, the moment of perceptual onset. This paper describes a computational investigation of PAT over a set of 25 synthesised stimuli, and a larger database of 100 sounds equally divided between synthesised and ecological examples. Ground-truth PATs for modelling were obtained by the alternating presentation paradigm, where subjects adjusted the relative start time of a reference click and the sound to be judged. Whilst fitting experimental data from the 25-sound set was plausible, difficulties with existing models were found in the case of the larger test set. A pragmatic solution was obtained using a neural net architecture. In general, learnt schemata of sound classification may be implicated in resolving the multiple detection cues evoked by complex sounds.
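    As a rough illustration of the kind of computational model under investigation, the sketch below estimates PAT as the first time a smoothed amplitude envelope crosses a fixed fraction of its peak. The smoothing window and threshold are assumptions for illustration, not the paper's fitted model or its neural net solution.

```python
# A minimal envelope-threshold sketch of a PAT estimator (illustrative
# parameters; not the paper's fitted model).
import numpy as np

def envelope_pat(samples, sr, smooth_ms=10.0, threshold=0.5):
    env = np.abs(samples)
    win = max(1, int(sr * smooth_ms / 1000))
    env = np.convolve(env, np.ones(win) / win, mode="same")  # moving average
    above = np.nonzero(env >= threshold * env.max())[0]
    return above[0] / sr  # first threshold crossing, in seconds

# Example: a 440 Hz tone with a 50 ms linear onset ramp yields a PAT
# estimate around 0.025 s (halfway up the ramp).
sr = 16000
t = np.arange(int(0.2 * sr)) / sr
ramp = np.minimum(t / 0.05, 1.0)
print(envelope_pat(ramp * np.sin(2 * np.pi * 440 * t), sr))
```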

    Modulation-frequency acts as a primary cue for auditory stream segregation

    In our surrounding acoustic world, sounds are produced by different sources and interfere with each other before arriving at the ears. A key function of the auditory system is to provide consistent and robust descriptions of the coherent sound groupings and sequences (auditory objects), which likely correspond to the various sound sources in the environment. This function has been termed auditory stream segregation. In the current study we tested the effects of separation in amplitude-modulation frequency on the segregation of concurrent sound sequences in the auditory stream-segregation paradigm (van Noorden 1975). The aim of the study was to assess 1) whether differential amplitude modulation would help in separating concurrent sound sequences and 2) whether this cue would interact with previously studied static cues (carrier frequency and location difference) in segregating concurrent streams of sound. We found that amplitude-modulation difference is utilized as a primary cue for stream segregation and that it interacts with other primary cues such as frequency and location difference.
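    The stimulus manipulation can be sketched directly: interleaved A and B tones that share a carrier but differ in amplitude-modulation rate, arranged in the ABA- triplets of the van Noorden paradigm. The carrier frequency, modulation rates, and timing in the Python sketch below are illustrative values, not the study's parameters.

```python
# Sketch of ABA- stream-segregation stimuli where A and B tones differ only
# in amplitude-modulation rate (illustrative values, not the study's).
import numpy as np

def am_tone(carrier_hz, mod_hz, dur_s, sr=44100, depth=1.0):
    t = np.arange(int(dur_s * sr)) / sr
    # Sinusoidal amplitude modulation, normalized to peak at 1.
    envelope = (1.0 + depth * np.sin(2 * np.pi * mod_hz * t)) / (1.0 + depth)
    return envelope * np.sin(2 * np.pi * carrier_hz * t)

def aba_sequence(mod_a=100.0, mod_b=200.0, carrier=1000.0,
                 tone_s=0.075, gap_s=0.025, n_triplets=10, sr=44100):
    silence = np.zeros(int(gap_s * sr))
    a = am_tone(carrier, mod_a, tone_s, sr)
    b = am_tone(carrier, mod_b, tone_s, sr)
    # ABA- pattern: the trailing extra silence marks the triplet boundary.
    triplet = np.concatenate([a, silence, b, silence, a, silence, silence])
    return np.tile(triplet, n_triplets)

signal = aba_sequence()  # ready to write to a sound file or play back
```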

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and to other speaker normalization models.

    National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
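    The model's strip-map circuits are beyond a few lines of code, but the underlying idea of a speaker-invariant vowel representation can be sketched with a classic stand-in: per-speaker log-mean formant normalization. This is an illustrative substitute for the article's neural mechanism, not an implementation of it.

```python
# Sketch of speaker normalization via log-mean formant normalization
# (a stand-in for the article's strip-map circuits, not the model itself).
import numpy as np

def normalize_formants(formants_hz):
    """formants_hz: (n_vowels, n_formants) array for ONE speaker, in Hz.

    Subtracting the speaker's mean log formant removes a multiplicative
    vocal-tract scale factor, leaving a speaker-independent vowel pattern.
    """
    logf = np.log(formants_hz)
    return logf - logf.mean()  # same shift for every vowel of this speaker

# Example: two speakers whose formants differ by a uniform scale factor
# map onto identical normalized patterns.
speaker1 = np.array([[300.0, 2300.0], [700.0, 1100.0]])  # e.g. /i/ and /a/
speaker2 = 1.25 * speaker1                               # "smaller" vocal tract
print(np.allclose(normalize_formants(speaker1),
                  normalize_formants(speaker2)))         # True
```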