    Representations of specific acoustic patterns in the auditory cortex and hippocampus

    Previous behavioural studies have shown that repeated presentation of a randomly chosen acoustic pattern leads to the unsupervised learning of some of its specific acoustic features. The objective of our study was to determine the neural substrate for the representation of freshly learnt acoustic patterns. Subjects first performed a behavioural task that resulted in the incidental learning of three different noise-like acoustic patterns. During subsequent high-resolution functional magnetic resonance imaging scanning, subjects were exposed again to these three learnt patterns and to others that had not been learned. Multi-voxel pattern analysis was used to test whether the learnt acoustic patterns could be 'decoded' from the patterns of activity in the auditory cortex and medial temporal lobe. We found that activity in planum temporale and the hippocampus reliably distinguished between the learnt acoustic patterns. Our results demonstrate that these structures are involved in the neural representation of specific acoustic patterns after they have been learnt.
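
    The multi-voxel pattern analysis used here is, at its core, a cross-validated classification of voxel activity vectors. Below is a minimal sketch of that idea using scikit-learn on synthetic data; the region size, trial counts and classifier choice are illustrative assumptions, not the study's actual pipeline.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        # Synthetic stand-in for fMRI data: 60 trials x 500 voxels in a
        # region of interest (e.g. planum temporale), 3 learnt patterns.
        n_trials, n_voxels, n_classes = 60, 500, 3
        labels = np.repeat(np.arange(n_classes), n_trials // n_classes)

        # Each class gets a weak, distributed activity signature plus noise.
        signatures = rng.normal(0, 0.3, size=(n_classes, n_voxels))
        X = signatures[labels] + rng.normal(0, 1.0, size=(n_trials, n_voxels))

        # 'Decode' pattern identity with a linear classifier; reliably
        # above-chance cross-validated accuracy is the evidence that the
        # region's activity distinguishes the learnt patterns.
        scores = cross_val_score(SVC(kernel="linear"), X, labels, cv=5)
        print(f"decoding accuracy: {scores.mean():.2f} (chance = {1/n_classes:.2f})")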

    Inhibition-excitation balance in the parietal cortex modulates volitional control for auditory and visual multistability

    Perceptual organisation must select one interpretation from several alternatives to guide behaviour. Computational models suggest that this could be achieved through an interplay between inhibition and excitation across the competing neural populations coding for each interpretation. Here, to test such models, we used magnetic resonance spectroscopy to measure non-invasively the concentrations of inhibitory γ-aminobutyric acid (GABA) and excitatory glutamate-glutamine (Glx) in several brain regions. Human participants first performed auditory and visual multistability tasks that produced spontaneous switching between percepts. We then observed that longer percept durations during behaviour were associated with higher GABA/Glx ratios in the sensory area coding for each modality. When participants were asked to voluntarily modulate their perception, a common factor across modalities emerged: the GABA/Glx ratio in the posterior parietal cortex tended to be positively correlated with the amount of effective volitional control. Our results provide direct evidence that the balance between neural inhibition and excitation within sensory regions resolves perceptual competition. This powerful computational principle appears to be leveraged by both audition and vision, implemented independently across modalities, but modulated by an integrated control process.

    Perceptual multistability describes an intriguing situation whereby an observer reports random changes in conscious perception for a physically unchanging stimulus 1,2. Multistability is a powerful tool with which to probe perceptual organisation, as it highlights perhaps the most fundamental issue faced by perception for any reasonably complex natural scene: because the information encoded by sensory receptors is never sufficient to fully specify the state of the outside world 3, at each instant perception must choose between a number of competing alternatives. In realistic situations, this process produces a stable and useful representation of the world. In situations with intrinsically ambiguous information, the same process is revealed as multistable perception. A number of theoretical models have converged to pinpoint the generic computational principles likely to be required to explain multistability, and hence perceptual organisation 4-9. All of these models consider three core ingredients: inhibition between competing neural populations, adaptation within these populations, and neuronal noise. The precise role of each ingredient and their respective importance is still debated. In some models, noise induces fluctuations in each population and initiates the stochastic perceptual switching 7-9, whereas in others the switching dynamics are determined solely by inhibition 5,6. Functional brain imaging in humans has provided results qualitatively compatible with these computational principles at several levels of the visual processing hierarchy 10. But, for most functional imaging techniques in humans such as fMRI or MEG/EEG, change
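
    The three model ingredients listed above (mutual inhibition, adaptation, noise) can be made concrete with a minimal two-population rate model in the spirit of the cited work; this is a generic sketch with arbitrary parameter values, not the authors' implementation. Increasing the inhibition strength beta lengthens mean percept durations, which is the qualitative pattern the GABA/Glx results point to.

        import numpy as np

        rng = np.random.default_rng(1)

        # Two competing populations with mutual inhibition (beta), slow
        # self-adaptation, and additive noise; the dominant population
        # defines the current percept.
        dt, T = 1e-3, 60.0            # time step (s), total duration (s)
        tau_r, tau_a = 0.02, 2.0      # rate and adaptation time constants
        beta, phi, sigma = 3.0, 1.5, 0.1
        steps = int(T / dt)

        r = np.array([0.6, 0.4])      # population rates
        a = np.zeros(2)               # adaptation variables
        percept = np.empty(steps, dtype=int)

        for i in range(steps):
            drive = np.clip(1.0 - beta * r[::-1] - phi * a, 0.0, None)
            r += dt / tau_r * (-r + drive) + sigma * np.sqrt(dt) * rng.normal(size=2)
            r = np.clip(r, 0.0, None)
            a += dt / tau_a * (-a + r)    # adaptation tracks activity
            percept[i] = int(r[1] > r[0])

        # Dominance durations: the quantity compared with GABA/Glx ratios.
        switches = np.flatnonzero(np.diff(percept)) * dt
        durations = np.diff(np.concatenate(([0.0], switches, [T])))
        print(f"{len(switches)} switches, mean duration {durations.mean():.2f} s")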

    Prior context in audition informs binding and shapes simple features

    A perceptual phenomenon is reported whereby prior acoustic context has a large, rapid and long-lasting effect on a basic auditory judgement. Pairs of tones were devised to include ambiguous transitions between frequency components, such that listeners were equally likely to report an upward or downward ‘pitch’ shift between tones. We show that presenting context tones before the ambiguous pair almost fully determines the perceived direction of shift. The context effect generalizes to a wide range of temporal and spectral scales, encompassing the characteristics of most realistic auditory scenes. Magnetoencephalographic recordings show that a relative reduction in neural responsivity is correlated with the behavioural effect. Finally, a computational model reproduces the behavioural results by implementing a simple constraint of continuity for binding successive sounds in a probabilistic manner. Contextual processing, mediated by ubiquitous neural mechanisms such as adaptation, may be crucial to track complex sound sources over time.
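
    The paper's model is not reproduced here, but the core idea, that successive sounds are bound under a continuity constraint, can be sketched as follows. This illustrative reconstruction assumes an exponentially decaying trace of recent context frequencies and binds the ambiguous tone to whichever candidate component lies closer to that trace; all names and parameters are hypothetical.

        import numpy as np

        def perceived_shift(context_freqs, f_up, f_down, tau=3.0):
            """Return 'up' or 'down' for an ambiguous pair, given context.

            context_freqs : frequencies (Hz) of the preceding context tones
            f_up, f_down  : the two candidate components of the ambiguous tone
            tau           : decay constant (in tones) of the context trace
            """
            # Recency-weighted average of the context, on a log-frequency axis.
            weights = np.exp(-np.arange(len(context_freqs))[::-1] / tau)
            trace = np.average(np.log2(context_freqs), weights=weights)
            # Continuity constraint: the smaller log-frequency jump wins.
            d_up = abs(np.log2(f_up) - trace)
            d_down = abs(np.log2(f_down) - trace)
            return "up" if d_up < d_down else "down"

        # Context placed near one component biases the reported shift direction.
        print(perceived_shift([780, 800, 820], f_up=800, f_down=400))  # 'up'
        print(perceived_shift([390, 400, 410], f_up=800, f_down=400))  # 'down'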

    Acoustic, psychophysical, and neuroimaging measurements of the effectiveness of active cancellation during auditory functional magnetic resonance imaging

    Functional magnetic resonance imaging (fMRI) is one of the principal neuroimaging techniques for studying human audition, but it generates an intense background sound which hinders listening performance and confounds measures of the auditory response. This paper reports the perceptual effects of an active noise control (ANC) system that operates in the electromagnetically hostile and physically compact neuroimaging environment to provide significant noise reduction without interfering with image quality. Cancellation was first evaluated at 600 Hz, corresponding to the dominant peak in the power spectrum of the background sound, where cancellation is maximally effective. Microphone measurements at the ear demonstrated 35 dB of acoustic attenuation [from 93 to 58 dB sound pressure level (SPL)], while masked detection thresholds improved by 20 dB (from 74 to 54 dB SPL). Considerable perceptual benefits were also obtained across other frequencies, including those corresponding to dips in the spectrum of the background sound. Cancellation also improved the statistical detection of sound-related cortical activation, especially for sounds presented at low intensities. These results confirm that ANC offers substantial benefits for fMRI research.
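
    Active noise control of this kind is conventionally built around adaptive filtering. The following is a generic least-mean-squares (LMS) sketch of the principle on a synthetic 600 Hz tone, not the authors' MRI-compatible hardware; the filter length, step size and acoustic path are all assumed values.

        import numpy as np

        fs, dur = 8000, 2.0
        n = int(fs * dur)
        t = np.arange(n) / fs

        reference = np.sin(2 * np.pi * 600 * t)        # dominant 600 Hz peak
        path = np.array([0.6, 0.3, 0.1])               # unknown acoustic path
        at_ear = np.convolve(reference, path)[:n]      # noise as heard at the ear

        taps, mu = 8, 0.01
        w = np.zeros(taps)                             # adaptive filter weights
        buf = np.zeros(taps)                           # reference delay line
        residual = np.empty(n)

        for i in range(n):
            buf = np.roll(buf, 1)
            buf[0] = reference[i]
            y = w @ buf                                # anti-noise estimate
            e = at_ear[i] - y                          # residual at the ear
            w += mu * e * buf                          # LMS weight update
            residual[i] = e

        def db(x):
            return 10 * np.log10(np.mean(x ** 2))

        # Steady-state attenuation, to compare with the reported 35 dB.
        print(f"attenuation: {db(at_ear[n//2:]) - db(residual[n//2:]):.1f} dB")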

    The human 'pitch center' responds differently to iterated noise and Huggins pitch

    A magnetoencephalographic marker for pitch analysis (the pitch onset response) has been reported for different types of pitch-evoking stimuli, irrespective of whether the acoustic cues for pitch are monaurally or binaurally produced. It is claimed that the pitch onset response reflects a common cortical representation for pitch, putatively in lateral Heschl's gyrus. The results of this functional MRI study cast doubt on this assertion. We report a direct comparison between iterated ripple noise and Huggins pitch in which we reveal a different pattern of auditory cortical activation associated with each pitch stimulus, even when individual variability in structure-function relations is accounted for. Our results suggest it may be premature to assume that lateral Heschl's gyrus is a universal pitch center.
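
    Both stimulus types have simple constructions, which is what makes the contrast between them attractive: iterated ripple noise carries a monaural pitch cue, whereas Huggins pitch exists only binaurally. The sketch below is an illustrative construction; the 200 Hz target, iteration count and transition-band width are assumptions, not the study's exact stimulus parameters.

        import numpy as np

        rng = np.random.default_rng(3)
        fs, dur, f0 = 44100, 1.0, 200.0    # sample rate, duration, target pitch
        n = int(fs * dur)

        def iterated_ripple_noise(iterations=8):
            """Delay-and-add noise: produces a pitch at 1/delay (monaural cue)."""
            delay = int(round(fs / f0))
            x = rng.normal(size=n)
            for _ in range(iterations):
                x = x + np.concatenate((np.zeros(delay), x[:-delay]))
            return x / np.abs(x).max()

        def huggins_pitch(bandwidth=0.06):
            """Identical noise to both ears, except a narrow band around f0 is
            phase-inverted in one ear; neither ear alone carries a pitch cue."""
            x = rng.normal(size=n)
            spec = np.fft.rfft(x)
            freqs = np.fft.rfftfreq(n, 1 / fs)
            band = np.abs(freqs - f0) < bandwidth * f0 / 2
            spec_flipped = spec.copy()
            spec_flipped[band] *= -1       # pi phase shift in the band
            return np.fft.irfft(spec, n), np.fft.irfft(spec_flipped, n)

        irn = iterated_ripple_noise()
        left, right = huggins_pitch()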

    Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

    Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.
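
    The 'ideal' binary mask that such a network is trained to estimate has a simple definition: in each time-frequency cell of the spectrogram, the mask is 1 where the vocal source dominates the accompaniment. A minimal sketch of the training target follows; the window, hop and 0 dB threshold are standard choices assumed here, not taken from the paper.

        import numpy as np

        def stft_mag(x, win=1024, hop=256):
            """Magnitude STFT via a sliding Hann window."""
            frames = np.lib.stride_tricks.sliding_window_view(x, win)[::hop]
            return np.abs(np.fft.rfft(frames * np.hanning(win), axis=-1))

        def ideal_binary_mask(vocals, accompaniment, threshold_db=0.0):
            v, a = stft_mag(vocals), stft_mag(accompaniment)
            ratio_db = 20 * np.log10((v + 1e-12) / (a + 1e-12))
            return (ratio_db > threshold_db).astype(np.float32)

        # Training pairs: mixture spectrogram in, IBM out; at test time the
        # estimated (probabilistic) mask is applied to the mixture STFT.
        rng = np.random.default_rng(4)
        vocals = rng.normal(size=22050)
        accompaniment = rng.normal(size=22050)
        mask = ideal_binary_mask(vocals, accompaniment)
        print(mask.shape, mask.mean())   # fraction of vocal-dominated cells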

    The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement

    Supervised speech enhancement models are trained using artificially generated mixtures of clean speech and noise signals, which may not match real-world recording conditions at test time. This mismatch can lead to poor performance if the test domain significantly differs from the synthetic training domain. In this paper, we introduce the unsupervised domain adaptation for conversational speech enhancement (UDASE) task of the 7th CHiME challenge. This task aims to leverage real-world noisy speech recordings from the target test domain for unsupervised domain adaptation of speech enhancement models. The target test domain corresponds to the multi-speaker reverberant conversational speech recordings of the CHiME-5 dataset, for which the ground-truth clean speech reference is not available. Given a CHiME-5 recording, the task is to estimate the clean, potentially multi-speaker, reverberant speech, removing the additive background noise. We discuss the motivation for the CHiME-7 UDASE task and describe the data, the task, and the baseline system.

    On the Emergence and Awareness of Auditory Objects

    How do humans successfully navigate the sounds of music and the voice of a friend in the midst of a noisy cocktail party? Two recent articles in PLoS Biology provide psychoacoustic and neuronal clues about where to search for the answers.

    Event-related potential correlates of sound organization: Early sensory and late cognitive effects

    We tested whether incoming sounds are processed differently depending on how the preceding sound sequence has been interpreted by the brain. Sequences of a regularly repeating three-tone pattern, the perceived organization of which spontaneously switched back and forth between two alternative interpretations, were delivered to listeners. Occasionally, a regular tone was exchanged for a slightly or moderately lower one (deviants). The electroencephalogram (EEG) was recorded while listeners continuously marked their perception of the sound sequence. We found that for both the regular and the deviant tones, the early exogenous P1 and N1 amplitudes varied together with the perceived sound organization. Percept-dependent effects on the late endogenous N2 and P3a amplitudes were only found for deviant tones. These results suggest that the perceived sound organization affects sound processing both by modulating what information is extracted from incoming sounds and by influencing how deviant sound events are evaluated for further processing.
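
    Component amplitudes like the P1, N1, N2 and P3a reported here are conventionally obtained by epoching the continuous EEG around tone onsets, averaging within a condition, and taking the mean amplitude in a component-specific latency window. A minimal sketch on synthetic data; the sampling rate and latency windows below are typical values assumed for illustration, not the study's analysis parameters.

        import numpy as np

        fs = 500                                    # samples per second
        windows_ms = {"P1": (40, 80), "N1": (80, 130),
                      "N2": (180, 280), "P3a": (280, 400)}

        def erp_amplitudes(eeg, onsets, pre_ms=100, post_ms=500):
            pre = int(pre_ms * fs / 1000)
            post = int(post_ms * fs / 1000)
            epochs = np.stack([eeg[i - pre:i + post] for i in onsets])
            epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)  # baseline
            erp = epochs.mean(axis=0)               # average across trials
            ms_to_idx = lambda ms: pre + int(ms * fs / 1000)
            return {name: erp[ms_to_idx(lo):ms_to_idx(hi)].mean()
                    for name, (lo, hi) in windows_ms.items()}

        rng = np.random.default_rng(5)
        eeg = rng.normal(size=60 * fs)              # 60 s synthetic recording
        onsets = np.arange(2 * fs, 58 * fs, fs)     # one tone per second
        print(erp_amplitudes(eeg, onsets))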

    Sound Synthesis with Auditory Distortion Products

    This article describes methods of sound synthesis based on auditory distortion products, often called combination tones. In 1856, Helmholtz was the first to identify sum and difference tones as products of auditory distortion. Today this phenomenon is well studied in the context of otoacoustic emissions, and the “distortion” is understood as a product of what is termed the cochlear amplifier. These tones have had a rich history in the music of improvisers and drone artists. Until now, the use of distortion tones in technological music has largely been rudimentary and dependent on very high amplitudes in order for the distortion products to be heard by audiences. Discussed here are synthesis methods to render these tones more easily audible and lend them the dynamic properties of traditional acoustic sound, thus making auditory distortion a practical domain for sound synthesis. An adaptation of single-sideband synthesis is particularly effective for capturing the dynamic properties of audio inputs in real time. Also presented is an analytic solution for matching up to four harmonics of a target spectrum. Most interestingly, the spatial imagery produced by these techniques is very distinctive, and over loudspeakers the normal assumptions of spatial hearing do not apply. Audio examples are provided that illustrate the discussion.
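
    The basic recipe behind difference-tone synthesis is compact: present two primaries f1 and f2 = f1 + d, and the cochlear nonlinearity generates an audible distortion product at d (plus a cubic one near 2*f1 - f2). The sketch below shows only this static case; the carrier frequency and levels are assumptions, and as the article notes, high playback levels are needed for the product to be heard. The article's single-sideband method extends this idea by modulating the primaries dynamically.

        import numpy as np

        fs, dur = 44100, 2.0
        t = np.arange(int(fs * dur)) / fs

        def distortion_pair(d, f1=2500.0):
            """Two-tone stimulus whose audible difference tone lies at d Hz."""
            f2 = f1 + d
            x = 0.5 * np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)
            return x / np.abs(x).max()

        stimulus = distortion_pair(d=220.0)   # perceived pitch near 220 Hz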