
    Acoustic, psychophysical, and neuroimaging measurements of the effectiveness of active cancellation during auditory functional magnetic resonance imaging

    Functional magnetic resonance imaging (fMRI) is one of the principal neuroimaging techniques for studying human audition, but it generates an intense background sound which hinders listening performance and confounds measures of the auditory response. This paper reports the perceptual effects of an active noise control (ANC) system that operates in the electromagnetically hostile and physically compact neuroimaging environment to provide significant noise reduction, without interfering with image quality. Cancellation was first evaluated at 600 Hz, corresponding to the dominant peak in the power spectrum of the background sound and at which cancellation is maximally effective. Microphone measurements at the ear demonstrated 35 dB of acoustic attenuation [from 93 to 58 dB sound pressure level (SPL)], while masked detection thresholds improved by 20 dB (from 74 to 54 dB SPL). Considerable perceptual benefits were also obtained across other frequencies, including those corresponding to dips in the spectrum of the background sound. Cancellation also improved the statistical detection of sound-related cortical activation, especially for sounds presented at low intensities. These results confirm that ANC offers substantial benefits for fMRI research.
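The narrowband cancellation principle behind attenuating a dominant spectral peak such as the 600 Hz scanner tone can be illustrated with a two-weight LMS adaptive canceller. This is a minimal sketch, not the authors' ANC system: the sampling rate, step size, and tone parameters are illustrative assumptions, and a real system must work acoustically in real time.

```python
import math

def lms_tone_canceller(fs=8000.0, f=600.0, amp=1.0, mu=0.01, n=8000):
    """Two-weight LMS canceller for a single tone.

    In-phase and quadrature references at 600 Hz adapt so that the
    anti-noise estimate y tracks and cancels the 'scanner' tone d.
    """
    w1 = w2 = 0.0
    residual = []
    for i in range(n):
        t = i / fs
        d = amp * math.sin(2 * math.pi * f * t + 0.7)  # tone at the ear (unknown phase)
        x1 = math.sin(2 * math.pi * f * t)             # reference, in phase
        x2 = math.cos(2 * math.pi * f * t)             # reference, quadrature
        y = w1 * x1 + w2 * x2                          # anti-noise estimate
        e = d - y                                      # residual after cancellation
        w1 += mu * e * x1                              # LMS weight updates
        w2 += mu * e * x2
        residual.append(e)
    return residual

res = lms_tone_canceller()
tail = res[-2000:]                                     # after convergence
p_in = 0.5                                             # power of the unit-amplitude tone
p_out = sum(v * v for v in tail) / len(tail)
attenuation_db = 10 * math.log10(p_in / p_out)
```

In this idealized digital setting the attenuation far exceeds the 35 dB measured acoustically, since there is no secondary-path error or broadband noise.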

    Prior context in audition informs binding and shapes simple features

    A perceptual phenomenon is reported, whereby prior acoustic context has a large, rapid and long-lasting effect on a basic auditory judgement. Pairs of tones were devised to include ambiguous transitions between frequency components, such that listeners were equally likely to report an upward or downward ‘pitch’ shift between tones. We show that presenting context tones before the ambiguous pair almost fully determines the perceived direction of shift. The context effect generalizes to a wide range of temporal and spectral scales, encompassing the characteristics of most realistic auditory scenes. Magnetoencephalographic recordings show that a relative reduction in neural responsivity is correlated with the behavioural effect. Finally, a computational model reproduces behavioural results by implementing a simple constraint of continuity for binding successive sounds in a probabilistic manner. Contextual processing, mediated by ubiquitous neural mechanisms such as adaptation, may be crucial to track complex sound sources over time.
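The adaptation-based account can be caricatured in a few lines. This is a toy sketch, not the published model: the 6-semitone ambiguous spacing, the exponential adaptation profile, and all parameter values are assumptions chosen only to show how context can disambiguate an otherwise 50/50 shift judgement.

```python
import math

def p_upward(f1, context, sigma=3.0, k=0.8):
    """Toy probability of reporting an upward pitch shift.

    The second tone has components 6 semitones above and below f1
    (all values in semitones), so the shift direction is ambiguous.
    Context tones adapt nearby frequency channels; f1 then binds
    preferentially to the less-adapted (more responsive) component.
    """
    up, down = f1 + 6, f1 - 6
    def salience(f):
        adaptation = sum(math.exp(-abs(f - c) / sigma) for c in context)
        return math.exp(-k * adaptation)
    s_up, s_down = salience(up), salience(down)
    return s_up / (s_up + s_down)

no_context = p_upward(0, [])            # ambiguous: exactly 0.5
above = p_upward(0, [5, 6, 7])          # context near the upper component
below = p_upward(0, [-7, -6, -5])       # context near the lower component
```

With no context the judgement is at chance; context near one component pushes binding toward the other, mirroring the qualitative role of adaptation (the sign and size of the bias here are illustrative, not fitted to the behavioural data).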

    Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

    Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNNs) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.
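The training target here, the ideal binary mask (IBM), marks each time-frequency cell as vocal- or accompaniment-dominated and can be computed directly when isolated stems are available. A minimal sketch with toy magnitude spectrograms (not the paper's pipeline; rows are frames, columns frequency bins):

```python
def ideal_binary_mask(vocal_mag, accomp_mag):
    """IBM: 1 where vocal magnitude dominates in a time-frequency cell."""
    return [[1.0 if v > a else 0.0 for v, a in zip(vrow, arow)]
            for vrow, arow in zip(vocal_mag, accomp_mag)]

def apply_mask(mix_mag, mask):
    """Keep only the cells the mask marks as vocal-dominated."""
    return [[b * m for b, m in zip(brow, mrow)]
            for brow, mrow in zip(mix_mag, mask)]

# toy 2x3 magnitude spectrograms of the isolated stems
vocal  = [[0.9, 0.1, 0.5], [0.0, 0.7, 0.2]]
accomp = [[0.2, 0.8, 0.1], [0.6, 0.1, 0.9]]

mask = ideal_binary_mask(vocal, accomp)
# mask == [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

mix = [[v + a for v, a in zip(vr, ar)] for vr, ar in zip(vocal, accomp)]
vocal_estimate = apply_mask(mix, mask)
```

A DNN trained on such targets outputs a probability per cell, which is then thresholded to recover an estimated mask at test time.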

    Heart-Kidney Interaction: Epidemiology of Cardiorenal Syndromes

    Cardiac and kidney diseases are common, increasingly encountered, and often coexist. Recently, the Acute Dialysis Quality Initiative (ADQI) Working Group convened a consensus conference to develop a classification scheme for the cardiorenal syndrome (CRS) and for five discrete subtypes. These CRS subtypes likely share pathophysiologic mechanisms; however, they also have distinguishing clinical features, in terms of precipitating events, risk identification, natural history, and outcomes. Knowledge of the epidemiology of heart-kidney interaction stratified by the proposed CRS subtypes is increasingly important for understanding the overall burden of disease for each CRS subtype, along with associated morbidity, mortality, and health resource utilization. Likewise, an understanding of the epidemiology of CRS is necessary for determining whether important knowledge gaps exist and to aid in the design of clinical studies. This paper provides a summary of the epidemiology of the cardiorenal syndrome and its subtypes.

    Sound Synthesis with Auditory Distortion Products

    This article describes methods of sound synthesis based on auditory distortion products, often called combination tones. In 1856, Helmholtz was the first to identify sum and difference tones as products of auditory distortion. Today this phenomenon is well studied in the context of otoacoustic emissions, and the “distortion” is understood as a product of what is termed the cochlear amplifier. These tones have had a rich history in the music of improvisers and drone artists. Until now, the use of distortion tones in technological music has largely been rudimentary and dependent on very high amplitudes in order for the distortion products to be heard by audiences. Discussed here are synthesis methods to render these tones more easily audible and lend them the dynamic properties of traditional acoustic sound, thus making auditory distortion a practical domain for sound synthesis. An adaptation of single-sideband synthesis is particularly effective for capturing the dynamic properties of audio inputs in real time. Also presented is an analytic solution for matching up to four harmonics of a target spectrum. Most interestingly, the spatial imagery produced by these techniques is very distinctive, and over loudspeakers the normal assumptions of spatial hearing do not apply. Audio examples are provided that illustrate the discussion.
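The underlying mechanism (not the article's single-sideband technique itself) can be verified numerically: a quadratic nonlinearity applied to two primary tones at f1 and f2 generates combination tones at f2 - f1 and f1 + f2. The primary frequencies and the distortion coefficient below are illustrative assumptions.

```python
import math

fs, n = 16000, 16000                 # 1 s at 16 kHz -> integer-Hz frequencies are exact bins
f1, f2 = 1900.0, 2300.0              # primaries; difference tone expected at 400 Hz

def tone(f):
    return [math.sin(2 * math.pi * f * i / fs) for i in range(n)]

x = [a + b for a, b in zip(tone(f1), tone(f2))]   # linear sum of the primaries
y = [v + 0.3 * v * v for v in x]                  # memoryless quadratic 'cochlear' nonlinearity

def power_at(sig, f):
    """Power of the component at frequency f (single-bin DFT)."""
    c = sum(v * math.cos(2 * math.pi * f * i / fs) for i, v in enumerate(sig))
    s = sum(v * math.sin(2 * math.pi * f * i / fs) for i, v in enumerate(sig))
    return (c * c + s * s) / len(sig) ** 2

# the linear mixture has no 400 Hz energy; after the quadratic term,
# a difference tone at 400 Hz and a sum tone at 4200 Hz appear
p_diff_linear = power_at(x, 400.0)
p_diff_nonlin = power_at(y, 400.0)
p_sum_nonlin = power_at(y, 4200.0)
```

In the ear the quadratic (and higher-order) terms arise from the cochlear amplifier rather than from the signal chain, which is why the synthesis methods can place audible tones at frequencies absent from the loudspeaker signal.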

    The Sound Sensation of Apical Electric Stimulation in Cochlear Implant Recipients with Contralateral Residual Hearing

    BACKGROUND: Studies using vocoders as acoustic simulators of cochlear implants have generally focused on simulation of speech understanding, gender recognition, or music appreciation. The aim of the present experiment was to study the auditory sensation perceived by cochlear implant (CI) recipients with steady electrical stimulation on the most-apical electrode. METHODOLOGY/PRINCIPAL FINDINGS: Five unilateral CI users with contralateral residual hearing were asked to vary the parameters of an acoustic signal played to the non-implanted ear, in order to match its sensation to that of the electric stimulus. They also provided a rating of similarity between each acoustic sound they selected and the electric stimulus. On average across subjects, the sound rated as most similar was a complex signal with a concentration of energy around 523 Hz. This sound was inharmonic in 3 out of 5 subjects, with a moderate, progressive increase in the spacing between the frequency components. CONCLUSIONS/SIGNIFICANCE: For these subjects, the sound sensation created by steady electric stimulation on the most-apical electrode was neither a white noise nor a pure tone, but a complex signal with a progressive increase in the spacing between the frequency components in 3 out of 5 subjects. Whether the inharmonic nature of the sound was related to the impairment of the non-implanted ear remains to be explored in single-sided deafened patients with a contralateral CI. These results may be used in the future to better understand peripheral and central auditory processing in relation to cochlear implants.
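A signal of the kind described, energy concentrated around 523 Hz with progressively widening component spacing, can be synthesized as follows. The stretch factor, component count, and duration are illustrative assumptions, not the subjects' matched values.

```python
import math

def partial_frequencies(f0=523.0, n_partials=6, stretch=1.08):
    """Inharmonic series: successive component spacings grow by 'stretch'."""
    freqs = [f0]
    gap = f0
    for _ in range(n_partials - 1):
        freqs.append(freqs[-1] + gap)
        gap *= stretch           # each spacing is slightly wider than the last
    return freqs

def synthesize(freqs, dur=0.2, fs=16000):
    """Sum of equal-amplitude sinusoids at the given frequencies."""
    n = int(dur * fs)
    return [sum(math.sin(2 * math.pi * f * i / fs) for f in freqs) / len(freqs)
            for i in range(n)]

freqs = partial_frequencies()
sig = synthesize(freqs)
```

With stretch = 1.0 the series collapses to an ordinary harmonic complex on 523 Hz, which makes the contrast with the inharmonic percept easy to audition.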

    Effect of stimulus type and pitch salience on pitch-sequence processing

    Using a same-different discrimination task, it has been shown that discrimination performance for sequences of complex tones varying just detectably in pitch is less dependent on sequence length (1, 2, or 4 elements) when the tones contain resolved harmonics than when they do not [Cousineau, Demany, and Pressnitzer (2009). J. Acoust. Soc. Am. 126, 3179-3187]. This effect had been attributed to the activation of automatic frequency-shift detectors (FSDs) by the shifts in resolved harmonics. The present study provides evidence against this hypothesis by showing that the sequence-processing advantage found for complex tones with resolved harmonics is not found for pure tones or other sounds supposed to activate FSDs (narrow bands of noise and wide-band noises eliciting pitch sensations due to interaural phase shifts). The present results also indicate that for pitch sequences, processing performance is largely unrelated to pitch salience per se: for a fixed level of discriminability between sequence elements, sequences of elements with salient pitches are not necessarily better processed than sequences of elements with less salient pitches. An ideal-observer model for the same-different binary-sequence discrimination task is also developed in the present study. The model allows the computation of d' for this task using numerical methods.
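The flavour of such an ideal-observer computation can be shown on a deliberately reduced case: a one-element same-different trial with unit-variance internal noise and a known shift direction, where the optimal differencing rule has a closed form that a Monte Carlo simulation should reproduce. The paper's model handles binary sequences of up to four elements numerically; everything below is a simplification for illustration.

```python
import math
import random

def pc_same_different(d, n_trials=20000, seed=1):
    """Monte Carlo percent correct for an optimal differencing observer.

    Two observations with unit-variance internal noise; on 'different'
    trials the second is shifted by d. The optimal rule thresholds the
    difference (b - a), which has variance 2, at d/2.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        different = rng.random() < 0.5
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(d if different else 0.0, 1.0)
        say_different = (b - a) > d / 2
        correct += say_different == different
    return correct / n_trials

def pc_analytic(d):
    """Closed form for the same rule: Phi(d / (2*sqrt(2))) = 0.5*(1 + erf(d/4))."""
    return 0.5 * (1.0 + math.erf(d / 4.0))
```

For d = 2 both routes give roughly 76% correct; the numerical approach is what generalizes to the sequence lengths where no closed form is available.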

    Active Learning for Auditory Hierarchy

    Much audio content today is rendered as a static stereo mix: fundamentally a fixed single entity. Object-based audio envisages the delivery of sound content using a collection of individual sound ‘objects’ controlled by accompanying metadata. This offers the potential for audio to be delivered in a dynamic manner, providing enhanced audio for consumers. One example of such treatment is the concept of applying varying levels of data compression to sound objects, thereby reducing the volume of data to be transmitted in limited-bandwidth situations. This application motivates the ability to accurately classify objects in terms of their ‘hierarchy’: that is, whether an object is a foreground sound, which should be reproduced at full quality where possible, or a background sound, which can be heavily compressed without degrading the listening experience. Lack of suitably labelled data is an acknowledged problem in the domain. Active Learning is a method that can greatly reduce the manual effort required to label a large corpus by identifying the most effective instances to train a model to high accuracy levels. This paper compares a number of Active Learning methods to investigate which is most effective in the context of a hierarchical labelling task on an audio dataset. Results show that the number of manual labels required can be reduced to 1.7% of the total dataset while still retaining high prediction accuracy.
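The core Active Learning loop can be sketched with uncertainty sampling, one of the standard strategies of the kind the paper compares. This is a toy: a 1-D feature stands in for audio features, a threshold learner stands in for the classifier, and the oracle plays the human labeller; it assumes the two seed labels cover both classes.

```python
def fit_threshold(labeled):
    """Tiny classifier: decision boundary halfway between the class means."""
    fg = [x for x, y in labeled if y == 1]
    bg = [x for x, y in labeled if y == 0]
    return (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2

def active_learn(pool, oracle, budget):
    """Pool-based uncertainty sampling around a threshold learner."""
    # seed with the two extremes (assumed to fall in different classes)
    labeled = [(pool[0], oracle(pool[0])), (pool[-1], oracle(pool[-1]))]
    for _ in range(budget):
        t = fit_threshold(labeled)
        seen = {x for x, _ in labeled}
        # query the unlabeled point the model is least certain about,
        # i.e. the one closest to the current decision boundary
        query = min((p for p in pool if p not in seen), key=lambda p: abs(p - t))
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled)

pool = [i / 200 for i in range(200)]           # 200 unlabeled 'objects'
oracle = lambda x: 1 if x > 0.63 else 0        # human labeller: foreground or not
t = active_learn(pool, oracle, budget=10)      # only 12 labels in total
```

Because every query lands near the current boundary, a handful of labels localizes the true class boundary, mirroring the paper's finding that a small fraction of the corpus suffices.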

    Insights on the Neuromagnetic Representation of Temporal Asymmetry in Human Auditory Cortex.

    Communication sounds are typically asymmetric in time, and human listeners are highly sensitive to this short-term temporal asymmetry. Nevertheless, the neurophysiological correlates of auditory perceptual asymmetry remain largely elusive to current analyses and models. Auditory modelling and animal electrophysiological recordings suggest that perceptual asymmetry results from multiple time scales of temporal integration, beginning at the auditory periphery. To test this hypothesis, we recorded auditory evoked fields (AEFs) elicited by asymmetric sounds in humans. We found a strong correlation between the perceived tonal salience of ramped and damped sinusoids and the AEFs, as quantified by the amplitude of the N100m dynamics. The N100m amplitude increased with stimulus half-life time, showing a maximum difference between the ramped and damped stimuli at a modulation half-life time of 4 ms; the difference was greatly reduced at 0.5 ms and 32 ms. This behaviour of the N100m closely parallels psychophysical data, in that: i) longer half-life times are associated with a stronger tonal percept, and ii) perceptual differences between damped and ramped stimuli are maximal at a 4 ms half-life time. Interestingly, differences in evoked fields were significantly stronger in the right hemisphere, indicating some degree of hemispheric specialisation. Furthermore, the N100m magnitude was successfully explained by a pitch perception model using multiple scales of temporal integration of auditory nerve activity patterns. This striking correlation between AEFs, perception, and model predictions suggests that the physiological mechanisms involved in the processing of pitch evoked by temporally asymmetric sounds are reflected in the N100m.
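The ramped/damped stimulus family is easy to reproduce: a sinusoid under an exponential envelope with a given half-life, and its time reversal. The carrier frequency, duration, and sampling rate below are illustrative assumptions, not the study's exact stimulus parameters.

```python
import math

def damped(f=1000.0, half_life_ms=4.0, dur_ms=50.0, fs=48000):
    """Sinusoid with an exponentially decaying envelope of given half-life."""
    tau = half_life_ms / 1000.0 / math.log(2.0)    # decay time constant
    return [math.exp(-(i / fs) / tau) * math.sin(2 * math.pi * f * i / fs)
            for i in range(int(fs * dur_ms / 1000.0))]

def ramped(**kwargs):
    """Time-reversed damped sinusoid: same magnitude spectrum, opposite asymmetry."""
    return damped(**kwargs)[::-1]

s = damped()
# the envelope halves every 4 ms: compare peaks near 0.25 ms and 4.25 ms
peak_early = max(abs(v) for v in s[:96])       # within the first 2 ms
peak_late = max(abs(v) for v in s[192:288])    # within 4-6 ms
```

Because the two stimuli share a magnitude spectrum and differ only in temporal envelope direction, any difference in percept or in the N100m isolates sensitivity to temporal asymmetry itself.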