    Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds

    In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence to 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods. Comment: 19 pages, 9 figures, 3 tables

    Kolmogorov turbulence, Anderson localization and KAM integrability

    The conditions for the emergence of Kolmogorov turbulence, and related weak wave turbulence, in finite-size systems are analyzed by analytical methods and numerical simulations of simple models. An analogy between Kolmogorov energy flow from large to small spatial scales and conductivity in disordered solid-state systems is proposed. It is argued that Anderson localization can stop such an energy flow. The effects of nonlinear wave interactions on such a localization are analyzed. The results obtained for finite-size system models show the existence of an effective chaos border between the Kolmogorov-Arnold-Moser (KAM) integrability at weak nonlinearity, when energy does not flow to small scales, and the developed chaos regime emerging above this border with the Kolmogorov turbulent energy flow from large to small scales. Comment: 8 pages, 6 figs, EPJB style

    Effects of feedback, mobility and index of difficulty on deictic spatial audio target acquisition in the horizontal plane

    We present the results of an empirical study investigating the effect of feedback, mobility and index of difficulty on a deictic spatial audio target acquisition task in the horizontal plane in front of a user. With audio feedback, spatial audio display elements are found to enable usable deictic interaction that can be described using Fitts' law. Feedback does not affect perceived workload or preferred walking speed compared to interaction without feedback. Mobility is found to degrade interaction speed and accuracy by 20%. Participants were able to perform deictic spatial audio target acquisition when mobile while walking at 73% of their preferred walking speed. The proposed feedback design is examined in detail and the effects of variable target widths are quantified. Deictic interaction with a spatial audio display is found to be a feasible solution for future interface designs.
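    For readers unfamiliar with the index of difficulty mentioned in this abstract, the following is a minimal sketch of how it is commonly computed. It assumes the Shannon formulation of Fitts' law, ID = log2(D / W + 1), with movement time modeled as MT = a + b * ID; the abstract does not state which formulation the study used. The function names, the use of degrees for distance and width, and the coefficients a and b are hypothetical and chosen only for illustration.

```python
import math


def index_of_difficulty(distance, width):
    """Index of difficulty in bits, Shannon formulation: log2(D / W + 1).

    distance: movement amplitude to the target centre (assumed here to be
              an angle in degrees in the horizontal plane).
    width:    target width, in the same unit as distance.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1.0)


def predicted_movement_time(distance, width, a, b):
    """Movement time MT = a + b * ID.

    a (seconds) and b (seconds per bit) are regression coefficients that
    would normally be fitted to measured data; any values passed here are
    placeholders, not results from the study.
    """
    return a + b * index_of_difficulty(distance, width)


if __name__ == "__main__":
    # Hypothetical target: 30 degrees away, 10 degrees wide.
    print(index_of_difficulty(30, 10))
    # Hypothetical coefficients a = 0.5 s, b = 0.25 s/bit.
    print(predicted_movement_time(30, 10, a=0.5, b=0.25))
```

    Under this formulation, narrowing the target at a fixed distance raises the index of difficulty, which is one way the effect of variable target widths could be expressed.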

    2D to 3D ambience upmixing based on perceptual band allocation

    3D multichannel audio systems employ additional elevated loudspeakers in order to provide listeners with a vertical dimension to their auditory experience. Listening tests were conducted to evaluate the feasibility of a novel vertical upmixing technique called “perceptual band allocation (PBA),” which is based on a psychoacoustic principle of vertical sound localization, the “pitch height” effect. The practical feasibility of the method was investigated using 4-channel ambience signals recorded in a reverberant concert hall using the Hamasaki-Square microphone technique. Results showed that the PBA-upmixed 3D stimuli were significantly stronger than or similar to 9-channel 3D stimuli in 3D listener-envelopment (LEV), depending on the sound source and the crossover frequency of PBA. They also produced significantly greater 3D LEV than the 7-channel 3D stimuli. For the preference tests, the PBA stimuli were significantly preferred over the original 9-channel stimuli.