    A Spectral Network Model of Pitch Perception

    A model of pitch perception, called the Spatial Pitch Network or SPINET model, is developed and analyzed. The model neurally instantiates ideas from the spectral pitch modeling literature and joins them to basic neural network signal processing designs to simulate a broader range of perceptual pitch data than previous spectral models. The components of the model are interpreted as peripheral mechanical and neural processing stages, which are capable of being incorporated into a larger network architecture for separating multiple sound sources in the environment. The core of the new model transforms a spectral representation of an acoustic source into a spatial distribution of pitch strengths. The SPINET model uses a weighted "harmonic sieve" whereby the strength of activation of a given pitch depends upon a weighted sum of narrow regions around the harmonics of the nominal pitch value, and higher harmonics contribute less to a pitch than lower ones. Suitably chosen harmonic weighting functions enable computer simulations of pitch perception data involving mistuned components, shifted harmonics, and various types of continuous spectra including rippled noise. It is shown how the weighting functions produce the dominance region, how they lead to octave shifts of pitch in response to ambiguous stimuli, and how they lead to a pitch region in response to the octave-spaced Shepard tone complexes and Deutsch tritones without the use of attentional mechanisms to limit pitch choices. An on-center off-surround network in the model helps to produce noise suppression, partial masking and edge pitch. Finally, it is shown how peripheral filtering and short term energy measurements produce a model pitch estimate that is sensitive to certain component phase relationships. Air Force Office of Scientific Research (F49620-92-J-0225); American Society for Engineering Education
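The weighted harmonic sieve described above can be sketched numerically. This is a minimal illustration and not the SPINET model itself: the Gaussian windows around each harmonic, the geometric weight decay across harmonics, and all parameter values are assumptions standing in for the paper's harmonic weighting functions.

```python
import numpy as np

def pitch_strength(spectrum_freqs, spectrum_amps, f0, n_harmonics=8,
                   window=0.03, decay=0.8):
    """Weighted 'harmonic sieve': sum spectral energy falling in narrow
    regions around each harmonic of the candidate pitch f0, with higher
    harmonics contributing less (geometric decay is an assumption here)."""
    strength = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        # Gaussian window of relative width `window` around the h-th harmonic
        w = np.exp(-0.5 * ((spectrum_freqs - target) / (window * target)) ** 2)
        strength += (decay ** (h - 1)) * np.sum(w * spectrum_amps)
    return strength

# A harmonic complex at 200 Hz should score higher at the candidate pitch
# 200 Hz than at an inharmonic candidate such as 230 Hz.
freqs = np.array([200.0, 400.0, 600.0, 800.0])
amps = np.ones(4)
assert pitch_strength(freqs, amps, 200.0) > pitch_strength(freqs, amps, 230.0)
```

Sweeping `f0` over a candidate range and taking the argmax turns this sieve score into the "spatial distribution of pitch strengths" the abstract describes.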

    Probabilistic models of contextual effects in Auditory Pitch Perception

    Perception was recognised by Helmholtz as an inferential process whereby learned expectations about the environment combine with sensory experience to give rise to percepts. Expectations are flexible, built from past experiences over multiple time-scales. What is the nature of perceptual expectations? How are they learned? How do they affect perception? These are the questions I propose to address in this thesis. I focus on two important yet simple perceptual attributes of sounds whose perception is widely regarded as effortless and automatic: pitch and frequency. In a first study, I aim to propose a definition of pitch as the solution of a computational goal. Pitch is a fundamental and salient perceptual attribute of many behaviourally important sounds including speech and music. The effortless nature of its perception has led to the search for a direct physical correlate of pitch and for mechanisms to extract pitch from peripheral neural responses. I propose instead that pitch is the outcome of a probabilistic inference of an underlying periodicity in sounds given a learned statistical prior over naturally pitch-evoking sounds, explaining in a single model a wide range of psychophysical results. In two other psychophysical studies I examine how and at what time-scales recent sensory history affects the perception of frequency shifts and pitch shifts. (1) When subjects are presented with ambiguous pitch shifts (using octave-ambiguous Shepard tone pairs), I show that sensory history is used to resolve the ambiguity in a way that reflects expectations of spectro-temporal continuity of auditory scenes.
(2) In delayed two-tone frequency discrimination tasks, I explore the contraction bias: when asked to report which of two tones separated by a brief silence is higher, subjects behave as though they hear the earlier tone ’contracted’ in frequency towards a combination of recently presented stimulus frequencies and the mean of the overall distribution of tones used in the experiment. I propose that expectations, i.e. the statistical learning of the sampled stimulus distribution, are built online and combined with sensory evidence in a statistically optimal fashion. Models derived in the thesis embody the concept of perception as unconscious inference. The results support the view that even apparently primitive acoustic percepts may derive from subtle statistical inference, suggesting that such inferential processes operate at all levels across our sensory systems.
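The statistically optimal combination of prior and sensory evidence invoked for the contraction bias can be sketched as a precision-weighted average. The Gaussian prior and likelihood, and the particular variance values, are illustrative assumptions rather than the thesis's fitted model.

```python
def posterior_estimate(observed_f, prior_mean, prior_var, sensory_var):
    """Precision-weighted combination of a noisy frequency observation
    with a prior learned from the stimulus distribution. A tone held in
    memory (larger sensory_var) is pulled, i.e. 'contracted', more
    strongly toward the prior mean."""
    w = prior_var / (prior_var + sensory_var)  # weight on the observation
    return w * observed_f + (1.0 - w) * prior_mean

# Both tones are physically 1200 Hz; the prior mean is 1000 Hz. The
# remembered first tone is noisier (sensory_var = 400) than the
# just-heard second tone (sensory_var = 100), so its estimate is
# contracted further toward 1000 Hz.
first = posterior_estimate(1200.0, 1000.0, prior_var=300.0, sensory_var=400.0)
second = posterior_estimate(1200.0, 1000.0, prior_var=300.0, sensory_var=100.0)
assert first < second < 1200.0
```

The asymmetry between `first` and `second` is exactly what makes subjects misorder two tones that straddle the distribution mean.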

    Examining the Influence of Spectral Envelope Shape on Pitch

    In the tritone paradox, many questions surround how listeners make pitch judgments, since Shepard tones comprise components in all octaves, which makes pitch ambiguous. The current study examined the influences of spectral envelope shape, spectral centroid, chroma, and musical training to identify how timbre and pitch interactions impacted pitch judgments for different tone types, including Shepard tones. Each trial consisted of a standard and a comparison tone differing by spectral envelope shape. Listeners were presented with these tone pairs and asked to judge whether the pairs were going up or down in pitch. For Shepard tones, sensitivity varied across centroid and chroma, while acoustic analyses of the Shepard tones indicated that pitch judgment performance was predicted by an aspect of spectral envelope shape: the relative amplitude of the F0. The current study suggests that listeners first try to use the F0 to make a pitch judgment and, when that component is not resolvable, systematically process the next components until they can judge pitch. This idea of a shared pitch processing mechanism is applicable to all tone types, including Shepard tones, and provides an explanation for the observed pattern of results seen in the tritone paradox. Future research should aim to confirm that the relative amplitude of the F0 predicts pitch judgments using the tritone paradox procedure, to determine how listeners process pitch for Shepard tones.
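A Shepard tone of the kind used here can be sketched as octave-spaced partials under a fixed spectral envelope; the envelope then determines the relative amplitude of each partial, including the F0. The Gaussian log-frequency envelope and the parameter values below are assumptions for illustration, not the study's stimuli.

```python
import numpy as np

def shepard_tone(chroma_hz, sr=22050, dur=0.5, n_octaves=8,
                 center=960.0, width=1.5):
    """Octave-spaced partials of `chroma_hz` scaled by a Gaussian
    envelope over log-frequency centered at `center` Hz. Shifting
    `center` moves the spectral centroid; it also changes how much
    relative amplitude the lowest partial (the nominal F0) receives."""
    t = np.arange(int(sr * dur)) / sr
    tone = np.zeros_like(t)
    for k in range(n_octaves):
        f = chroma_hz * 2 ** k
        if f >= sr / 2:  # stop below the Nyquist frequency
            break
        a = np.exp(-0.5 * ((np.log2(f) - np.log2(center)) / width) ** 2)
        tone += a * np.sin(2 * np.pi * f * t)
    return tone / np.max(np.abs(tone))

tone = shepard_tone(261.63)  # chroma C, under the assumed envelope
```

Because every chroma appears in all octaves, pairs of such tones a tritone apart are octave-ambiguous, which is what makes up/down judgments informative about the listener's strategy.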

    HARMONIC INTONATION AND IMPLICATION (ANALYSES AND COMPOSITIONS): Harmonic perception and intonation in the reception and performance of alternative tuning systems in contemporary composition

    Most composers and theorists will acknowledge that some compromise is necessary when dealing with the limitations of human performance, perception, and the realities of acoustic theory. Identifying the thresholds for pitch discrimination and execution is an important point of departure for defining workable tuning schemes, and for training musicians to realise compositions in just intonation and other alternative tuning systems. The submitted paper 'HARMONIC INTONATION AND IMPLICATION (ANALYSES AND COMPOSITIONS): Harmonic perception and intonation in the reception and performance of alternative tuning systems in contemporary composition' is a phenomenological study of harmonic perception and intonation through the analysis of recordings, scores, theoretical papers, and discussion with practicing musicians. The examined repertoire covers western 'art' music of the late nineteenth to early twenty-first centuries. I approach my research from the composer's point of view, though filtered through the ears and eyes of the performer, who is here considered the 'expert listener'. It is considered that intonation is a dynamic experience subject to influences beyond just intonation or equal temperament (the two poles for intonational reference); the performance is assumed 'correct', rather than the idealised version of the composer. My goal is to relate the performance to the intentions of the composer and to raise questions regarding the choice of notation, the resolution of the tuning systems, the complexity of the harmonic concept, and so on, and perhaps to suggest how to extend a general theory of harmony that embraces both musical practice and psychoacoustics. This is with the understanding that harmonic implication affects intonation, but that intonation is subject to several other forces, making intonation a complex system (and therefore not fully predictable).
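The gap between the "two poles for intonational reference" can be made concrete in cents (1200 cents per octave). The interval selection below is an illustrative sample, not taken from the submitted paper.

```python
import math

def cents(ratio):
    """Interval size in cents: 1200 cents per octave of frequency ratio."""
    return 1200.0 * math.log2(ratio)

# Deviation of some just-intonation ratios from their nearest 12-tone
# equal-temperament interval (100 cents per semitone step).
just = {"major third": 5 / 4, "perfect fifth": 3 / 2, "harmonic seventh": 7 / 4}
equal_steps = {"major third": 4, "perfect fifth": 7, "harmonic seventh": 10}

for name, ratio in just.items():
    dev = cents(ratio) - 100.0 * equal_steps[name]
    print(f"{name}: {dev:+.1f} cents relative to equal temperament")
```

The just major third sits roughly 14 cents below its equal-tempered counterpart, a difference well above typical pitch discrimination thresholds for sustained tones, which is why notation and training for such tuning schemes matter.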

    ARSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

    Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between frequency-specific spectral representations of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple-source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new" heuristic of Bregman. Multiple simultaneously occurring spectral-pitch resonances can hereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations.
The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward-sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided. Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Education
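The matching step, where a top-down expectation keeps harmonically consistent components and releases the rest to other streams, can be sketched schematically. This is a static caricature of the ART-style matching described above, not the network's resonance dynamics; the relative tolerance and the example frequencies are assumptions.

```python
import numpy as np

def match_to_pitch(component_freqs, pitch, tol=0.03):
    """Top-down expectation as a harmonic template: keep components whose
    frequency lies within a relative tolerance of some harmonic of the
    selected pitch; the unmatched remainder is free to be captured by
    another stream (Bregman's 'old-plus-new' heuristic)."""
    harm = np.round(component_freqs / pitch)          # nearest harmonic number
    rel_err = np.abs(component_freqs - harm * pitch) / component_freqs
    return (harm >= 1) & (rel_err < tol)

# Components of a 200 Hz voice plus an interloper at 330 Hz: the 330 Hz
# component fails the harmonic match and is released to a second stream.
freqs = np.array([200.0, 330.0, 400.0, 600.0])
mask = match_to_pitch(freqs, 200.0)
```

Running the same template with `pitch=330.0` over the released components would then let a second spectral-pitch grouping claim the interloper.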

    The effects of timbre on perceptual grouping in a melodic sequence

    The current investigation sought to examine the effects of timbre on perceptual grouping in melodic sequences. While past research has shown that timbre shifts influence listeners’ pitch perception on a note-to-note basis (e.g., see Pitt, 2004; Russo & Thompson, 2005; Creel, Newport, & Aslin, 2004), the current investigation extended this to timbre’s influence on pitch perception in the context of a melodic phrase. In Experiment 1, participants were presented with melodic sequences made of sawtooth-like waves. Sequences consisting of 6 tones were followed by a target tone given a static, dull, or bright timbre through the use of low-pass filters that shifted the spectral centroid. Target tones were presented equally at ascending and descending interval sizes of a minor 3rd, perfect 4th, and minor 6th. These target tones were paired equally with timbre conditions to create timbre shifts that were static, where timbre did not change at all; congruent, where timbre and the pitch of the target moved in the same direction; or incongruent, where timbre and pitch moved in opposite directions. Participants were tasked with rating how well the target tone belonged to the sequence before it. Experiment 2 extended a similar approach to instrumental stimuli. Cello samples were filtered so that the corresponding impact on spectral centroids was similar to the timbre manipulation in Experiment 1. Contrary to hypotheses, participants rated target tones as less likely to continue the initial melody if any form of timbre shift was present, regardless of interval size. This effect of timbre suggests that it was not subsumed by higher-order processes of melody perception. As hypothesized, this effect was negatively related to musical training. Additionally, as expected, interval size influenced ratings regardless of timbre shifts, with larger intervals less likely to be perceived as belonging to the initial melody.
Thus, participants also appear to have used expectations about pitch intervals to make judgments. Finally, the direction of the initial interval within the sequence also influenced target judgments when the target tone constituted a shift in timbre, indicating that participants used directional information to create expectations for the target pitch. Taken together, the findings from the current investigation minimally indicate that, at least under conditions reflecting a single change in instrument source, timbre has the capacity to drastically impact the perception of melodic phrase structure.
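The low-pass manipulation used to make a tone "duller" can be illustrated on an idealized sawtooth spectrum: attenuating high partials necessarily pulls the spectral centroid down. The 1/n harmonic amplitudes, first-order filter response, and cutoff value are assumptions for illustration, not the study's exact stimuli.

```python
import numpy as np

def spectral_centroid(freqs, amps):
    """Amplitude-weighted mean frequency of a line spectrum."""
    return np.sum(freqs * amps) / np.sum(amps)

# Idealized sawtooth-like spectrum: harmonics of 220 Hz with 1/n amplitudes.
n = np.arange(1, 21)
freqs = 220.0 * n
amps = 1.0 / n

# A first-order low-pass at cutoff fc attenuates each partial by
# 1/sqrt(1 + (f/fc)^2), lowering the centroid and dulling the timbre.
fc = 600.0
lp_amps = amps / np.sqrt(1.0 + (freqs / fc) ** 2)

assert spectral_centroid(freqs, lp_amps) < spectral_centroid(freqs, amps)
```

Raising `fc` instead of lowering it weakens the attenuation of high partials, which is the "bright" direction of the manipulation.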

    Human response to aircraft noise

    The human auditory system and the perception of sound are discussed. The major concentration is on the annoyance response and on methods for relating the physical characteristics of sound to the psychosociological attributes associated with human response. Results selected from the extensive laboratory and field research conducted on human response to aircraft noise over the past several decades are presented, along with discussions of the methodology commonly used in conducting that research. Finally, some of the more common criteria, regulations, and recommended practices for the control or limitation of aircraft noise are examined in light of the research findings on human response.