865 research outputs found

    Modelling the Emergence and Dynamics of Perceptual Organisation in Auditory Streaming

    Many sound sources can only be recognised from the pattern of sounds they emit, and not from the individual sound events that make up their emission sequences. Auditory scene analysis addresses the difficult task of interpreting the sound world in terms of an unknown number of discrete sound sources (causes) with possibly overlapping signals, and therefore of associating each event with the appropriate source. There are potentially many different ways in which incoming events can be assigned to different causes, which means that the auditory system has to choose between them. This problem has been studied for many years using the auditory streaming paradigm, and recently it has become apparent that instead of making one fixed perceptual decision, given sufficient time, auditory perception switches back and forth between the alternatives, a phenomenon known as perceptual bi- or multi-stability. We propose a new model of auditory scene analysis at the core of which is a process that seeks to discover predictable patterns in the ongoing sound sequence. Representations of predictable fragments are created on the fly, and are maintained, strengthened, or weakened on the basis of their predictive success and their conflict with other representations. Auditory perceptual organisation emerges spontaneously from the competition between these representations. We present detailed comparisons between the model simulations and data from an auditory streaming experiment, and show that the model accounts for many important findings, including: the emergence of, and switching between, alternative organisations; the influence of stimulus parameters on perceptual dominance, switching rate and perceptual phase durations; and the build-up of auditory streaming. The principal contribution of the model is to show that a two-stage process of pattern discovery and competition between incompatible patterns can account for both the contents (perceptual organisations) and the dynamics of human perception in auditory streaming.
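
    The abstract's two-stage architecture (pattern discovery followed by competition) can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the count-based scoring, and all parameter values below are illustrative assumptions.

        # Minimal sketch of a two-stage pattern-discovery-plus-competition model.
        # All names and parameter values are illustrative assumptions.
        import random
        from collections import defaultdict

        def discover_patterns(events, max_len=4):
            """Stage 1: collect recurring subsequences as candidate representations."""
            counts = defaultdict(int)
            for n in range(2, max_len + 1):
                for i in range(len(events) - n + 1):
                    counts[tuple(events[i:i + n])] += 1
            return {p: c for p, c in counts.items() if c > 1}  # keep repeating patterns

        def compete(patterns, noise=0.5):
            """Stage 2: noisy competition; the strongest pattern sets the percept."""
            strengths = {p: c + random.gauss(0, noise) for p, c in patterns.items()}
            return max(strengths, key=strengths.get)

        events = ['A', 'B', 'A', '-'] * 10          # the classic ABA- streaming sequence
        candidates = discover_patterns(events)
        print(compete(candidates))                  # currently dominant organisation

    Rerunning the competition step with fresh noise can return a different winner, which is this sketch's stand-in for perceptual switching.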

    Predictive coding in auditory perception: challenges and unresolved questions

    Predictive coding is arguably the currently dominant theoretical framework for the study of perception. It has been employed to explain important auditory perceptual phenomena, and it has inspired theoretical, experimental and computational modelling efforts aimed at describing how the auditory system parses the complex sound input into meaningful units (auditory scene analysis). These efforts have uncovered some vital questions whose resolution could help to further specify predictive coding and clarify some of its basic assumptions. The goal of the current review is to motivate these questions and show how unresolved issues in explaining some auditory phenomena lead to general questions about the theoretical framework. We focus on experimental and computational modelling issues related to sequential grouping in auditory scene analysis (auditory pattern detection and bistable perception), as we believe that this is the research topic where predictive coding has the highest potential for advancing our understanding. In addition to specific questions, our analysis led us to identify three more general questions that require further clarification: (1) what exactly is meant by prediction in predictive coding? (2) what governs which generative models make the predictions? and (3) what (if it exists) is the correlate of perceptual experience within the predictive coding framework?
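
    As a concrete anchor for question (1), the sketch below shows one common textbook reading of "prediction": a latent estimate is updated by prediction errors from below (sensory input) and above (a prior). The single-level setup, the learning rate and the stationary prior are my assumptions, not claims from the review.

        # A minimal, single-level predictive-coding update (illustrative assumptions).
        def predictive_coding_step(mu, observation, prior_mu, lr=0.1):
            """Move the estimate mu to reduce both prediction errors."""
            sensory_error = observation - mu   # bottom-up: input vs. prediction
            prior_error = mu - prior_mu        # top-down: estimate vs. prior
            return mu + lr * (sensory_error - prior_error)

        mu = 0.0
        for obs in [1.0, 1.1, 0.9, 1.0]:       # a stable sound feature being tracked
            mu = predictive_coding_step(mu, obs, prior_mu=0.0)
        print(round(mu, 3))                    # estimate settles between prior and input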

    Computational models of auditory perception from feature extraction to stream segregation and behavior

    This is the final version, available on open access from Elsevier via the DOI in this record. Data availability: this is a review study and, as such, did not generate any new data.
    Audition is by nature dynamic, from brainstem processing on sub-millisecond time scales, to segregating and tracking sound sources with changing features, to the pleasure of listening to music and the satisfaction of getting the beat. We review recent advances from computational models of sound localization, of auditory stream segregation, and of beat perception/generation. A wealth of behavioral, electrophysiological and imaging studies shed light on these processes, typically with synthesized sounds having regular temporal structure. Computational models integrate knowledge from different experimental fields and at different levels of description. We advocate a neuromechanistic modeling approach that incorporates knowledge of the auditory system from various fields, that utilizes plausible neural mechanisms, and that bridges our understanding across disciplines. Funded by the Engineering and Physical Sciences Research Council (EPSRC).

    Similar but separate systems underlie perceptual bistability in vision and audition

    The dynamics of perceptual bistability, the phenomenon in which perception switches between different interpretations of an unchanging stimulus, are characterised by very similar properties across a wide range of qualitatively different paradigms. This suggests that perceptual switching may be triggered by some common source. However, it is also possible that perceptual switching may arise from a distributed system, whose components vary according to the specifics of the perceptual experiences involved. Here we used a visual and an auditory task to determine whether individuals show cross-modal commonalities in perceptual switching. We found that individual perceptual switching rates were significantly correlated across modalities. We then asked whether perceptual switching arises from some central, modality- and task-independent process or from a more distributed, task-specific system. We found that a log-normal distribution best explained the distribution of perceptual phases in both modalities, suggestive of a combined set of independent processes causing perceptual switching. Modality- and/or task-dependent differences in these distributions, and the lack of correlation with the modality-independent central factors tested (ego-resiliency, creativity, and executive function), also point towards perceptual switching arising from a distributed system of similar but independent processes.
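
    The two analyses named in the abstract, fitting a log-normal distribution to perceptual phase durations and correlating individual switching rates across modalities, can be sketched as follows. The simulated numbers stand in for real reports and are assumptions; only the analysis logic is the point.

        # Hedged sketch: log-normal fit to phase durations, cross-modal rate correlation.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        visual_phases = rng.lognormal(mean=1.5, sigma=0.6, size=200)    # phase durations (s)
        shape, loc, scale = stats.lognorm.fit(visual_phases, floc=0)    # log-normal fit
        print(f"fitted sigma={shape:.2f}, median phase={scale:.2f}s")

        # Per-participant switching rates in the two tasks (illustrative numbers):
        visual_rates = rng.gamma(4.0, 0.05, size=30)
        auditory_rates = 0.8 * visual_rates + rng.normal(0, 0.02, size=30)  # shared factor
        r, p = stats.spearmanr(visual_rates, auditory_rates)
        print(f"cross-modal correlation: rho={r:.2f}, p={p:.4f}")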

    Inhibition-excitation balance in the parietal cortex modulates volitional control for auditory and visual multistability

    Perceptual organisation must select one interpretation from several alternatives to guide behaviour. Computational models suggest that this could be achieved through an interplay between inhibition and excitation across competing neural populations, each coding for one interpretation. Here, to test such models, we used magnetic resonance spectroscopy to measure non-invasively the concentrations of inhibitory γ-aminobutyric acid (GABA) and excitatory glutamate-glutamine (Glx) in several brain regions. Human participants first performed auditory and visual multistability tasks that produced spontaneous switching between percepts. We observed that longer percept durations during behaviour were associated with higher GABA/Glx ratios in the sensory area coding for each modality. When participants were asked to voluntarily modulate their perception, a common factor across modalities emerged: the GABA/Glx ratio in the posterior parietal cortex tended to be positively correlated with the amount of effective volitional control. Our results provide direct evidence that the balance between neural inhibition and excitation within sensory regions resolves perceptual competition. This powerful computational principle appears to be leveraged by both audition and vision, implemented independently across modalities, but modulated by an integrated control process.

    Perceptual multistability describes an intriguing situation whereby an observer reports random changes in conscious perception for a physically unchanging stimulus [1,2]. Multistability is a powerful tool with which to probe perceptual organisation, as it highlights perhaps the most fundamental issue faced by perception for any reasonably complex natural scene: because the information encoded by sensory receptors is never sufficient to fully specify the state of the outside world [3], at each instant perception must choose between a number of competing alternatives. In realistic situations, this process produces a stable and useful representation of the world. In situations with intrinsically ambiguous information, the same process is revealed as multistable perception. A number of theoretical models have converged to pinpoint the generic computational principles likely to be required to explain multistability, and hence perceptual organisation [4-9]. All of these models consider three core ingredients: inhibition between competing neural populations, adaptation within these populations, and neuronal noise. The precise role of each ingredient and their respective importance are still being debated. Noise is introduced to induce fluctuations in each population and initiate stochastic perceptual switching in some models [7-9], whereas switching dynamics are solely determined by inhibition in others [5,6]. Functional brain imaging in humans has provided results qualitatively compatible with these computational principles at several levels of the visual processing hierarchy [10].
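
    The three core ingredients listed above (mutual inhibition, adaptation, noise) define a generic firing-rate model of bistability. The sketch below follows common textbook formulations of such models; the sigmoid nonlinearity, time constants and weights are illustrative assumptions, not this paper's implementation.

        # Two populations with mutual inhibition, slow adaptation and noise.
        import numpy as np

        def simulate(T=60.0, dt=1e-3, beta=1.1, phi=0.6, tau_r=0.1, tau_a=2.0, sigma=0.05):
            n = int(T / dt)
            r = np.array([0.6, 0.4])            # firing rates of the two populations
            a = np.zeros(2)                     # adaptation variables
            dominant = np.empty(n, dtype=int)
            for t in range(n):
                drive = 1.0 - beta * r[::-1] - phi * a          # input - inhibition - adaptation
                f = 1.0 / (1.0 + np.exp(-8.0 * (drive - 0.5)))  # sigmoid rate function
                r += dt / tau_r * (f - r) + sigma * np.sqrt(dt) * np.random.randn(2)
                a += dt / tau_a * (r - a)
                dominant[t] = int(r[1] > r[0])   # which percept currently wins
            return dominant

        dom = simulate()
        print(f"{np.count_nonzero(np.diff(dom))} perceptual switches in 60 s")

    In models of this family, raising the inhibition parameter beta tends to lengthen dominance durations, which is the qualitative relationship the GABA/Glx spectroscopy results probe.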

    Computational Models of Auditory Scene Analysis: A Review

    Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements, whether they implement some form of competition between alternative interpretations of the sound input, to what extent they include predictive processes (as important current theories suggest that perception is inherently predictive), and how they have been evaluated. We conclude that current computational models of ASA are fragmentary: rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility of integrating complementary aspects of the models into a more comprehensive theory of ASA.

    Stable individual characteristics in the perception of multiple embedded patterns in multistable auditory stimuli

    The ability of the auditory system to parse complex scenes into component objects in order to extract information from the environment is very robust, yet the processing principles underlying this ability are still not well understood. This study was designed to investigate the proposal that the auditory system constructs multiple interpretations of the acoustic scene in parallel, based on the finding that when listening to a long repetitive sequence, listeners report switching between different perceptual organizations. Using the ‘ABA-’ auditory streaming paradigm, we trained listeners until they could reliably recognise all possible embedded patterns of length four that could in principle be extracted from the sequence, and in a series of test sessions investigated their spontaneous reports of those patterns. With the training allowing them to identify and mark a wider variety of possible patterns, participants spontaneously reported many more patterns than the two traditionally assumed (Integrated vs. Segregated). Despite the consistent training and the apparent randomness of perceptual switching, we found that individual switching patterns were idiosyncratic; that is, the perceptual switching patterns of each participant were more similar to their own switching patterns in different sessions than to those of other participants. These individual differences were preserved even between test sessions held a year after the initial experiment. Our results support the idea that the auditory system attempts to extract an exhaustive set of embedded patterns, which can be used to generate expectations of future events and which, by competing for dominance, give rise to (changing) perceptual awareness, with the characteristics of pattern discovery and perceptual competition having a strong idiosyncratic component. Perceptual multistability thus provides a means for characterizing both general mechanisms and individual differences in human perception.
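
    The idiosyncrasy claim amounts to a within- versus between-participant similarity comparison. A sketch of that analysis, with randomly generated pattern-report profiles standing in for the real data (an assumption), could look like this:

        # Compare each participant's pattern-report profile across sessions
        # with the profiles of other participants.
        import numpy as np

        rng = np.random.default_rng(1)
        n_subj, n_patterns = 10, 8
        trait = rng.dirichlet(np.ones(n_patterns), size=n_subj)   # stable per-subject profile
        session1 = trait + rng.normal(0, 0.02, size=trait.shape)  # noisy session estimates
        session2 = trait + rng.normal(0, 0.02, size=trait.shape)

        within = [np.corrcoef(session1[i], session2[i])[0, 1] for i in range(n_subj)]
        between = [np.corrcoef(session1[i], session2[j])[0, 1]
                   for i in range(n_subj) for j in range(n_subj) if i != j]
        print(f"within-subject r = {np.mean(within):.2f}, "
              f"between-subject r = {np.mean(between):.2f}")

    A clearly higher within-subject than between-subject correlation is the signature of stable individual switching characteristics.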

    Stimulus pauses and perturbations differentially delay or promote the segregation of auditory objects: psychoacoustics and modeling

    This Provisional PDF corresponds to the article as it appeared upon acceptance, after peer review. The final publisher-formatted version is available from the publisher via the DOI in this record.
    Segregating distinct sound sources is fundamental for auditory perception, as in the cocktail party problem. In a process called the build-up of stream segregation, distinct sound sources that are perceptually integrated initially can be segregated into separate streams after several seconds. Previous research concluded that abrupt changes in the incoming sounds during build-up (for example, a step change in location, loudness or timing) reset the percept to integrated. Following this reset, the multisecond build-up process begins again. Neurophysiological recordings in auditory cortex (A1) show fast (subsecond) adaptation, but unified mechanistic explanations for the bias toward integration, multisecond build-up and resets remain elusive. Combining psychoacoustics and modeling, we show that initial unadapted A1 responses bias perception toward integration, that the slowness of build-up arises naturally from competition downstream, and that recovery from adaptation can explain resets. An early bias toward integrated perceptual interpretations, arising from primary cortical stages that encode low-level features and feed into competition downstream, could also explain similar phenomena in vision. Further, we report a previously overlooked class of perturbations that promote segregation rather than integration. Our results challenge the current understanding of perturbation effects on the emergence of sound source segregation, leading to a new hypothesis of differential processing downstream of A1: transient perturbations can momentarily redirect A1 responses as input to downstream competition units that favor segregation.
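
    The reset account hinges on a fast A1 adaptation variable that recovers during a stimulus pause. A minimal sketch of that single mechanism, with illustrative time constants rather than the paper's fitted values:

        # Adaptation builds while the stimulus is on and recovers during pauses;
        # low adaptation after a pause re-biases competition toward integration.
        import numpy as np

        def adaptation_trace(stim_on, dt=1e-3, tau_adapt=0.3, tau_recover=1.0):
            a = np.zeros(len(stim_on))
            for t in range(1, len(stim_on)):
                tau = tau_adapt if stim_on[t] else tau_recover
                target = 1.0 if stim_on[t] else 0.0
                a[t] = a[t - 1] + dt / tau * (target - a[t - 1])
            return a

        stim = np.ones(8000, dtype=bool)
        stim[4000:5000] = False                  # a 1-second pause mid-sequence
        a = adaptation_trace(stim)
        print(f"adaptation before pause: {a[3999]:.2f}, just after pause: {a[4999]:.2f}")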

    Feature predictability flexibly supports auditory stream segregation or integration

    Many sound sources emit series of discrete sounds. Auditory perception must bind these sounds together (stream integration) while separating them from sounds emitted by other sources (stream segregation). One cue for identifying successive sounds that belong together is the predictability between their feature values. Previous studies have demonstrated that independent predictable patterns appearing separately in two interleaved sound sequences support perceptual segregation. The converse case, whether a joint predictable pattern in a mixture of interleaved sequences supports perceptual integration, has not yet been put to a rigorous empirical test, mainly because of the difficulty of manipulating the predictability of the full sequence independently of the predictability of the interleaved subsequences. The present study implemented such an independent manipulation. Listeners continuously indicated whether they perceived a tone sequence as integrated or segregated, while predictable patterns set up to support one or the other percept were manipulated without the participants’ knowledge. Perceptual reports demonstrate that predictability supports stream segregation or integration depending on the type of predictable pattern present in the sequence. The effects of predictability were so pronounced as to qualitatively flip perception from predominantly integrated (62%) to predominantly segregated (73%). These results suggest that auditory perception flexibly responds to encountered regular patterns, favoring predictable perceptual organizations over unpredictable ones. Besides underlining the role of predictability as a cue within auditory scene analysis, the present design also provides a general framework that accommodates previous investigations focusing on sub-comparisons within the present set of experimental manipulations. Results of intermediate conditions shed light on why some previous studies have obtained little or no effect of predictability on auditory scene analysis.
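
    The crux of the design is making the predictability of the full sequence independent of the predictability of the interleaved subsequences. One way to illustrate the idea (my construction, not the paper's exact stimuli): give each subsequence its own repeating pitch pattern to support segregation, or make each subsequence random while a cross-stream rule makes only the joint sequence predictable, supporting integration.

        # Illustrative stimulus construction under the assumptions stated above.
        import random
        random.seed(0)

        def build_sequence(condition, n=12):
            if condition == "segregated":    # each subsequence repeats its own pattern
                lows = [[400, 440, 480][i % 3] for i in range(n)]
                highs = [[800, 880, 760][i % 3] for i in range(n)]
            else:                            # "integrated": subsequences are random,
                lows = [random.choice([400, 440, 480]) for _ in range(n)]
                highs = [f + 400 for f in lows]   # but a joint rule links every pair
            return [tone for pair in zip(lows, highs) for tone in pair]

        print(build_sequence("segregated")[:6])
        print(build_sequence("integrated")[:6])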