
    A Wearable Platform for Research in Augmented Hearing.

    We have previously reported a real-time, open-source speech-processing platform (OSP) for hearing aid (HA) research. In this contribution, we describe a wearable version of this platform that facilitates audiological studies in the lab and in the field. The system is based on smartphone chipsets to leverage their power efficiency in terms of FLOPS per watt and their economies of scale. We present the system architecture and discuss salient design elements in support of HA research. The ear-level assemblies support up to 4 microphones on each ear, with 96 kHz, 24-bit codecs. The wearable unit runs OSP Release 2018c on top of 64-bit Debian Linux and provides binaural HA processing with an overall latency of 5.6 ms. The wearable unit also hosts an embedded web server (EWS) to monitor and control the HA state in real time. We describe three example web apps and the typical audiological studies they enable. Finally, we describe a baseline speech enhancement module included with Release 2018c and outline extensions to its algorithms as future work.
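
    As a rough illustration of what a 5.6 ms end-to-end budget implies for block-based binaural processing, the sketch below works out the buffering arithmetic. The candidate block sizes and internal sample rates are assumptions for illustration only; the abstract reports the 5.6 ms latency and the 96 kHz codecs but not the internal frame size or processing rate.

```python
# Hypothetical latency-budget arithmetic for a block-based binaural HA pipeline.
# The 5.6 ms overall latency comes from the abstract; block sizes and sample
# rates below are illustrative assumptions, not reported values.

OVERALL_LATENCY_MS = 5.6  # reported end-to-end latency of the wearable unit

def block_latency_ms(block_size: int, sample_rate_hz: int) -> float:
    """Time spanned by one audio block, i.e. the minimum buffering delay it adds."""
    return 1000.0 * block_size / sample_rate_hz

if __name__ == "__main__":
    for sample_rate_hz in (32_000, 48_000, 96_000):
        for block_size in (32, 48, 64, 96, 128):
            ms = block_latency_ms(block_size, sample_rate_hz)
            headroom = OVERALL_LATENCY_MS - ms
            print(f"{sample_rate_hz:6d} Hz, block {block_size:4d}: "
                  f"{ms:4.2f} ms buffering, {headroom:5.2f} ms left for "
                  f"codec, transport and processing")
```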

    Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement

    This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement within the framework of the classical spectral-domain speech estimator. By examining the intermediate steps of the enhancement algorithm, i.e., the SNR estimation and the gain function, we determine where the network is best used to learn a robust instance of the Wiener filter estimator. Experiments show that data-driven learning of the SNR estimator adds robustness to the statistical speech estimator and achieves performance on par with the state of the art. Several objective quality metrics quantify the enhancement performance, and examples of noisy versus enhanced speech are available for listening to demonstrate the method on simulated and real audio.
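
    For context, the classical spectral-domain Wiener gain is G(k) = xi(k) / (1 + xi(k)), where xi(k) is the a priori SNR in frequency bin k. The sketch below shows, under stated assumptions, how a learned SNR estimate could feed this gain; the estimator passed in is a placeholder for the paper's deep network, whose architecture is not described above.

```python
import numpy as np

def wiener_gain(xi: np.ndarray) -> np.ndarray:
    """Classical Wiener gain from the a priori SNR xi (per time-frequency bin)."""
    return xi / (1.0 + xi)

def enhance_frame(noisy_mag: np.ndarray, estimate_snr) -> np.ndarray:
    """Apply a Wiener gain whose SNR is predicted by a data-driven estimator.

    noisy_mag    : magnitude spectrum of one noisy STFT frame.
    estimate_snr : callable returning the a priori SNR per bin; in the paper
                   this role is played by a trained deep network (this is an
                   illustrative stand-in, not the paper's model).
    """
    xi = np.maximum(estimate_snr(noisy_mag), 1e-8)  # keep the gain well defined
    return wiener_gain(xi) * noisy_mag

# Toy usage with a crude, non-learned SNR guess (spectral-floor heuristic),
# purely to show the plumbing:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_mag = np.abs(rng.normal(size=257))
    noise_floor = np.full_like(noisy_mag, noisy_mag.mean())
    heuristic_snr = lambda mag: np.maximum(mag**2 - noise_floor**2, 0) / noise_floor**2
    enhanced_mag = enhance_frame(noisy_mag, heuristic_snr)
```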

    A Two-Step Adaptive Noise Cancellation System for Dental-Drill Noise Reduction

    This paper introduces a two-step dental-drill Noise Reduction (NR) technique based on an Adaptive Noise Cancellation (ANC) system. The technique is designed for an NR headphone worn by patients during dental treatment. In the first step, a tone-frequency extraction algorithm estimates the main sinusoidal frequency of the dental-drill noise, and the estimated sinusoid is then largely removed from the drill noise by the ANC system. In the second step, another ANC system combined with a high-pass filter eliminates the residual high-frequency components of the dental-drill noise. Computer simulations based on recorded dental-drill sounds and real speech signals demonstrate the efficiency of the proposed two-step ANC system, both in terms of noise attenuation and the quality of the enhanced speech, compared with a conventional two-microphone ANC system under ideal conditions. Moreover, a subjective listening test with 15 listeners confirms satisfactory speech quality for the enhanced signal produced by the proposed two-step dental-drill NR technique.
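
    For reference, a textbook single-stage normalized-LMS adaptive noise canceller of the kind such two-microphone ANC systems build on is sketched below; it is a generic baseline under stated assumptions, not the paper's specific two-step, tone-extraction design.

```python
import numpy as np

def nlms_anc(primary: np.ndarray, reference: np.ndarray,
             taps: int = 32, mu: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Normalized-LMS adaptive noise cancellation.

    primary   : speech + noise picked up at the patient's ear.
    reference : noise-only reference (e.g. a microphone near the drill).
    Returns the error signal, i.e. the primary input with the correlated
    noise component adaptively subtracted.
    """
    w = np.zeros(taps)                 # adaptive filter weights
    buf = np.zeros(taps)               # delay line of recent reference samples
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        y = w @ buf                    # estimate of the noise in the primary signal
        e = primary[n] - y             # enhanced sample = primary minus noise estimate
        w += (mu / (eps + buf @ buf)) * e * buf   # NLMS weight update
        out[n] = e
    return out
```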

    Efficient Noise Suppression for Robust Speech Recognition

    This thesis addresses single-microphone noise estimation for speech recognition in noisy environments. Much research has been devoted to environmental noise estimation, but most methods require a voice activity detector (VAD) to estimate the noise characteristics accurately. Two approaches are proposed for efficient noise estimation without a VAD. The first improves conventional quantile-based noise estimation (QBNE) by adjusting the quantile level (QL) according to the relative amount of noise added to the target speech: two different QLs, i.e., binary levels, are assigned according to the measured statistical moment of the log-scale power spectrum at each frequency. The second approach applies dual mixture parametric models, a Gaussian mixture model (GMM) and a Rayleigh mixture model (RMM), to compute the likelihoods of the speech and non-speech classes. Under the assumption that speech is generally uncorrelated with environmental noise, the noise power spectrum can be estimated from the mixture-model parameters of the speech-absence class. The proposed methods are compared with conventional QBNE and a minimum-statistics-based method on a simple speech recognition task at various signal-to-noise ratio (SNR) levels, and the experimental results show them to be superior to the conventional methods.
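
    A minimal sketch of plain quantile-based noise estimation, the baseline the thesis modifies, is shown below. The second function illustrates the binary quantile-level idea with assumed quantile levels and threshold, since the exact moment and thresholds used in the thesis are not given above.

```python
import numpy as np

def qbne(power_spec: np.ndarray, quantile: float = 0.5) -> np.ndarray:
    """Quantile-based noise estimation without a VAD.

    power_spec : |STFT|^2 array of shape (frames, frequency_bins).
    For each frequency bin, the noise power is taken as a quantile of the
    power values over time, assuming speech is absent (or weak) in that bin
    often enough for the quantile to track the noise floor.
    """
    return np.quantile(power_spec, quantile, axis=0)

def binary_ql_qbne(power_spec: np.ndarray,
                   low_ql: float = 0.25, high_ql: float = 0.55,
                   moment_threshold: float = 1.0) -> np.ndarray:
    """Illustrative two-level variant: choose one of two quantile levels per
    bin from a statistical moment of the log power spectrum (here its
    variance), loosely following the thesis' idea; thresholds are assumptions."""
    log_power = np.log(power_spec + 1e-12)
    moment = log_power.var(axis=0)                 # per-bin spread over time
    ql = np.where(moment > moment_threshold, low_ql, high_ql)
    return np.array([np.quantile(power_spec[:, k], ql[k])
                     for k in range(power_spec.shape[1])])
```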

    Data utility modelling for mismatch reduction

    In the "missing data" (MD) approach to noise robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition sections of spectral data dominated by noise are detected and treated as "missing". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilistic nature of missing data detection. Recent work has shown greatly improved performance by the "soft missing data" (SMD) approach, in which the "missing" status of each data value is represented by a continuous probability rather than a 0/1 value. This probability is then used to weight between the different likelihood contributions which the MD model normally assigns to each spectral observation according to its "missing" status. This article presents an analysis which shows that the SMD approach effectively implements a Maximum A-Posteriori (MAP) decoding strategy with missing or uncertain data, subject to the interpretation that the missing/not-missing probabilities are weights for a mixture pdf which models the pdf for each hidden clean data input, after conditioning by the noisy data input, a local noise estimate, and any information which may be available. An important feature of this "soft data" model is that control over the "evidence pdf" can provide a principled framework not only for ignoring unreliable data, but also for focusing attention on more discriminative features, and for data enhancement
