    Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping

    Complex Neural Networks for Audio

    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., waveform) and the complex-valued frequency domain (i.e., spectrum). The frequency-domain representation has several advantages: the human auditory system is known to process sound in the frequency domain, and linear time-invariant systems, which must be convolved with sources in the time domain, may instead be factorized in the frequency domain. Neural networks have become highly effective for audio tasks such as machine listening and audio synthesis, both of which depend on high-quality acoustic models. Such models should ideally capture fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex-valued audio. This manuscript addresses that shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables, and that the choice of input and output representation has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening; its performance exceeds that of equivalent real-valued models, as well as real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state-of-the-art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement and note that the complex Adam optimizer improves training convergence over the naive implementation.
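
    The remark about Adam and complex random variables can be made concrete. Below is a minimal NumPy sketch of one Adam step for a complex-valued parameter, in which the second-moment estimate is built from |g|^2 = g * conj(g) rather than g^2: the naive port of real-valued Adam squares the complex gradient and so tracks a complex, sign-indefinite "variance". The function name, hyperparameters, and NumPy framing are illustrative assumptions, not the dissertation's implementation.

        import numpy as np

        def complex_adam_step(param, grad, m, v, t,
                              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
            # First moment: a complex exponential moving average of the gradient.
            m = beta1 * m + (1 - beta1) * grad
            # Second moment: use |grad|^2 (real and non-negative), not grad**2
            # (complex), so the per-parameter step size stays well defined.
            v = beta2 * v + (1 - beta2) * (grad * np.conj(grad)).real
            m_hat = m / (1 - beta1 ** t)   # bias correction, step count t >= 1
            v_hat = v / (1 - beta2 ** t)
            return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v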

    Informed Sound Source Localization for Hearing Aid Applications

    A robotic framework for semantic concept learning

    Perceptually Driven Interactive Sound Propagation for Virtual Environments

    Sound simulation and rendering can significantly augment a user's sense of presence in virtual environments. Many techniques for sound propagation have been proposed that predict the behavior of sound as it interacts with the environment and is received by the user. At a broad level, the propagation algorithms can be classified into reverberation filters, geometric methods, and wave-based methods. In practice, heuristic methods based on reverberation filters are simple to implement and have a low computational overhead, while wave-based algorithms are limited to static scenes and involve extensive precomputation. However, relatively little work has been done on the psychoacoustic characterization of different propagation algorithms, or on evaluating the relationship between scientific accuracy and perceptual benefit. In this dissertation, we present perceptual evaluations of sound propagation methods and their ability to model complex acoustic effects for virtual environments. Our results indicate that scientifically accurate methods for reverberation and diffraction do result in increased perceptual differentiation. Based on these evaluations, we present two novel hybrid sound propagation methods that combine the accuracy of wave-based methods with the speed of geometric methods for interactive sound propagation in dynamic scenes. Our first algorithm couples modal sound synthesis with geometric sound propagation using wave-based sound radiation to perform mode-aware sound propagation. We introduce diffraction kernels of rigid objects, which encapsulate the sound diffraction behaviors of individual objects in free space and are then used to simulate plausible diffraction effects using an interactive path-tracing algorithm. Finally, we present a novel perceptually driven metric that can be used to accelerate the computation of late reverberation, enabling plausible simulation of reverberation with a low runtime overhead. We highlight the benefits of our novel propagation algorithms in different scenarios.
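
    To make the first class of algorithms concrete, here is a minimal sketch of a heuristic reverberation filter: a bank of parallel feedback comb filters in the style of Schroeder. The delay lengths and feedback gain are illustrative assumptions, not values from the dissertation, and the hybrid wave-geometric methods it proposes are far more elaborate than this.

        import numpy as np

        def comb_reverb(x, delays=(1116, 1188, 1277, 1356), gain=0.8):
            # Parallel feedback comb filters; mutually prime delay lengths
            # (in samples) give a denser, less metallic pattern of echoes.
            y = np.zeros(len(x))
            for d in delays:
                buf = np.zeros(len(x))
                for n in range(len(x)):
                    # y[n] = x[n] + gain * y[n - d]: one recirculating echo path.
                    buf[n] = x[n] + (gain * buf[n - d] if n >= d else 0.0)
                y += buf
            return y / len(delays)

        # Usage: wet = comb_reverb(dry); mix wet and dry signals to taste.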

    Spatial hearing rendering in wireless microphone systems for binaural hearing aids

    In 2015, 360 million people worldwide, including 32 million children, suffered from hearing impairment, making hearing disability a major global issue. In the US, the prevalence of hearing loss has increased by 160% over the past generations, yet 72% of the 34 million hearing-impaired Americans (11% of the population) still have an untreated hearing loss. Among the current solutions for alleviating hearing disability, hearing aids are the only non-invasive and the most widespread medical devices. Combined with hearing aids, assistive listening devices are a powerful answer to the degraded speech understanding observed in hearing-impaired subjects, especially in noisy and reverberant environments. Unfortunately, conventional devices do not accurately render the spatial hearing property of the human auditory system, weakening their benefits. Spatial hearing is an attribute of the auditory system that relies on binaural hearing: with two ears, human beings are able to localize sounds in space, gather information about the acoustic surroundings, and feel immersed in their environment. Furthermore, it strongly contributes to speech intelligibility. It is hypothesized that recreating an artificial spatial perception through the hearing aids of impaired people might allow these subjects to recover part of their hearing performance. This thesis investigates and supports this hypothesis with both technological and clinical approaches. It shows how certain well-established signal processing methods, related to sound localization and spatialization, can be integrated into assistive listening devices. Taking into consideration the technical constraints of current hearing aids, as well as the characteristics of the impaired auditory system, the thesis proposes a novel solution to restore spatial perception for users of certain types of assistive listening devices. The achieved results demonstrate the feasibility and possible implementation of such a functionality on conventional systems. Additionally, this thesis examines the relevance and efficiency of the proposed spatialization feature for enhancing speech perception. In a clinical trial involving a large number of patients, the artificial spatial hearing was well appreciated by hearing-impaired persons while improving or preserving their existing hearing abilities. This is a prominent contribution to the current scientific and technological knowledge in the domain of hearing impairment.
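
    As an illustration of the kind of spatialization such a system can apply, the sketch below pans a mono wireless-microphone signal to a given azimuth using only interaural time and level differences, the two binaural cues that underlie spatial hearing. The Woodworth ITD model, the crude 6 dB level law, and all names are assumptions made for illustration; this is not the algorithm developed in the thesis.

        import numpy as np

        def spatialize(mono, azimuth_deg, sr=16000, head_radius=0.0875, c=343.0):
            # Positive azimuth places the source to the listener's right.
            theta = np.deg2rad(azimuth_deg)
            # Woodworth's spherical-head approximation of the interaural
            # time difference, in seconds.
            itd = (head_radius / c) * (theta + np.sin(theta))
            shift = int(round(abs(itd) * sr))
            delayed = np.concatenate([np.zeros(shift), mono[:len(mono) - shift]])
            # Far ear: delayed by the ITD and attenuated by up to 6 dB (ILD).
            far = delayed * 10.0 ** (-6.0 * abs(np.sin(theta)) / 20.0)
            left, right = (far, mono) if theta >= 0 else (mono, far)
            return np.stack([left, right])   # shape (2, N): left and right channels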

    Sensors and Systems for Indoor Positioning

    This reprint collects the articles that appeared in the Sensors (MDPI) Special Issue on “Sensors and Systems for Indoor Positioning”. The published original contributions focus on systems and technologies that enable indoor applications.

    Perceptual evaluation of personal, location-aware spatial audio

    This thesis entails an analysis, synthesis and evaluation of the medium of personal, location-aware spatial audio (PLASA). The PLASA medium is a specialisation of locative audio, the presentation of audio in relation to the listener's position, and it intersects with audio augmented reality, the presentation of a virtual audio reality superimposed on the real world. A PLASA system delivers binaural (personal) spatial audio to mobile listeners, with body-position and head-orientation interactivity, so that simulated sound source positions seem fixed in the world reference frame. PLASA technical requirements were analysed and three system architectures identified, employing mobile, remote or distributed rendering. Knowledge of human spatial hearing was reviewed to ascertain likely perceptual effects of the unique factors of PLASA compared to static spatial audio. The human factors identified were multimodal perception of body-motion interaction and coincident visual stimuli; the technical limitations identified were rendering method, individual binaural rendering, and accuracy and latency of position- and orientation-tracking. An experimental PLASA system was built and evaluated technically, then four perceptual experiments were conducted to investigate task-related perceptual performance. These experiments tested the identified human factors and technical limitations against performance measures related to localisation and navigation tasks, under conditions designed to be ecologically valid for PLASA application scenarios. A final experiment assessed navigation task performance with real sound sources and unmediated spatial hearing, for comparison with virtual-source performance. Results found that body-motion interaction facilitated correction of front-back confusions. Body motion and the multimodal stimuli of virtual-audible and real-visible objects supported lower azimuth errors than stationary, mono-modal localisation of the same audio-only stimuli. PLASA users navigated efficiently to stationary virtual sources, despite varied rendering quality and head-turn latencies between 176 ms and 976 ms. The factors of rendering method, individualisation and head-turn latency showed interaction effects, such as greater sensitivity to latency for some rendering methods than others. In general, PLASA task performance levels agreed with expectations from static or technical performance tests, and some results demonstrated performance levels similar to those achieved in the real-source baseline test.
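
    The defining computation in such a system, keeping a virtual source fixed in the world frame while the listener moves and turns, reduces to re-expressing the source direction in head-relative coordinates on every tracker update. A minimal 2-D sketch follows; the names, the counter-clockwise yaw convention, and the angle wrapping are illustrative assumptions, not the thesis's renderer.

        import numpy as np

        def head_relative_azimuth(listener_pos, head_yaw_deg, source_pos):
            # Bearing from listener to source in the world frame,
            # measured counter-clockwise from the +x axis.
            dx, dy = np.subtract(source_pos, listener_pos)
            world_bearing = np.degrees(np.arctan2(dy, dx))
            # Subtract the head orientation, then wrap to [-180, 180):
            # 0 deg = straight ahead, positive = to the listener's left.
            az = world_bearing - head_yaw_deg
            return (az + 180.0) % 360.0 - 180.0

        # A listener at the origin facing +y (yaw 90) hears a source on the
        # +x axis at -90 deg, i.e. directly to the right; as the head turns,
        # the rendered direction updates so the source stays world-fixed.
        # head_relative_azimuth((0, 0), 90.0, (1, 0))  ->  -90.0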

    Single channel signal separation using pseudo-stereo model and time-frequency masking

    In many practical applications, only one sensor is available to record a mixture of a number of signals. Single-channel blind signal separation (SCBSS) is the research topic that addresses the problem of recovering the original signals from the observed mixture with little or no prior knowledge of the signals. Given a single mixture, a new pseudo-stereo mixing model is developed. A “pseudo-stereo” mixture is formulated by weighting and time-shifting the original single-channel mixture. This creates an artificial resemblance of a stereo signal from one location, which results in the same time delay but different attenuation of the source signals. The pseudo-stereo mixing model relaxes the underdetermined ill-conditions associated with monaural source separation and exploits the relationship between the signals of the readily observed mixture and the pseudo-stereo mixture. This research proposes three novel algorithms based on the pseudo-stereo mixing model and the binary time-frequency (TF) mask. Firstly, the proposed SCBSS algorithm estimates the signals' weighting coefficients from a ratio of the pseudo-stereo mixing model and then constructs a binary maximum-likelihood TF mask for separating the observed mixture. Secondly, a mixture in a noisy background environment is considered; a mixture enhancement algorithm has therefore been developed and the proposed SCBSS algorithm reformulated using an adaptive coefficient estimator, which computes the signal characteristics for each time frame. This property is desirable for both speech and audio signals, as they are aptly characterized as non-stationary AR processes. Finally, a multiple-time-delay (MTD) pseudo-stereo mixture is developed. The MTD mixture enhances the flexibility as well as the separability of the originally proposed pseudo-stereo mixing model, and the separation algorithm for the MTD mixture has also been derived. Additionally, a comparative analysis between the MTD mixture and the pseudo-stereo mixture is presented. All algorithms have been demonstrated on synthesized and real audio signals. Separation performance has been assessed by measuring the distortion between each original source and its estimate according to the signal-to-distortion ratio (SDR). Results show that all proposed SCBSS algorithms yield significantly better separation performance, with an average SDR improvement ranging from 2.4 dB to 5 dB per source, and that they are computationally faster than the benchmarked algorithms.
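
    Two pieces of this abstract translate directly into code: the pseudo-stereo channel, formed by weighting and time-shifting the observed mixture, and the SDR metric used to score the estimated sources. The sketch below covers those two steps only; the weight, delay, and function names are illustrative assumptions, and the thesis's maximum-likelihood mask derivation and MTD extension are not reproduced here.

        import numpy as np

        def pseudo_stereo(x, alpha=0.6, delay=8):
            # Weighted, time-shifted copy of the single-channel mixture:
            # x2[n] = alpha * x[n - delay], the artificial second channel.
            return alpha * np.concatenate([np.zeros(delay), x[:len(x) - delay]])

        def sdr(reference, estimate):
            # Signal-to-distortion ratio in dB: project the estimate onto
            # the reference (allowing for a gain ambiguity), then compare
            # target energy with the residual-distortion energy.
            reference = np.asarray(reference, dtype=float)
            estimate = np.asarray(estimate, dtype=float)
            scale = np.dot(estimate, reference) / np.dot(reference, reference)
            target = scale * reference
            distortion = estimate - target
            return 10.0 * np.log10(np.dot(target, target)
                                   / np.dot(distortion, distortion))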