4,950 research outputs found

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.

    Codebook-based Bayesian speech enhancement for nonstationary environments

    In this paper, we propose a Bayesian minimum mean squared error approach for the joint estimation of the short-term predictor parameters of speech and noise from the noisy observation. We use trained codebooks of speech and noise linear predictive coefficients to model the a priori information required by the Bayesian scheme. In contrast to current Bayesian estimation approaches that consider the excitation variances as part of the a priori information, in the proposed method they are computed online for each short-time segment, based on the observation at hand. Consequently, the method performs well in nonstationary noise conditions. The resulting estimates of the speech and noise spectra can be used in a Wiener filter or any state-of-the-art speech enhancement system. We develop both memoryless (using information from the current frame alone) and memory-based (using information from the current and previous frames) estimators. Estimation of functions of the short-term predictor parameters is also addressed, in particular one that leads to the minimum mean squared error estimate of the clean speech signal. Experiments indicate that the scheme proposed in this paper performs significantly better than competing methods.
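
    The abstract notes that the estimated speech and noise spectra can be used in a Wiener filter. A minimal sketch of that final step follows, assuming per-bin power spectra are already available. The function names, the `floor` parameter, and the toy values are illustrative assumptions, not taken from the paper, and the codebook-based estimation itself is not shown.

```python
def wiener_gain(speech_psd, noise_psd, floor=1e-12):
    """Per-bin Wiener gain H = S / (S + N) from power spectral estimates.

    `floor` (an assumption of this sketch) guards against division by zero
    when both estimates are zero in a bin.
    """
    if len(speech_psd) != len(noise_psd):
        raise ValueError("speech and noise spectra must have the same length")
    return [s / max(s + n, floor) for s, n in zip(speech_psd, noise_psd)]


def apply_gain(noisy_spectrum, gain):
    """Scale each bin of the noisy spectrum by its Wiener gain."""
    return [g * y for g, y in zip(gain, noisy_spectrum)]


# Toy three-bin example with made-up values.
speech_psd = [4.0, 1.0, 0.0]
noise_psd = [1.0, 1.0, 2.0]
noisy_spectrum = [2.0 + 1.0j, 1.0 - 1.0j, 0.5 + 0.0j]

gain = wiener_gain(speech_psd, noise_psd)
enhanced = apply_gain(noisy_spectrum, gain)
```

    Bins dominated by speech keep most of their energy, while a bin with no estimated speech power is suppressed entirely.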

    Military Radio Communications Research in Australia

    An overview of recent research by the Australian Defence Science and Technology Organisation in the field of military radio communications is presented. A philosophy for improving digital radio system performance over complex, variable channels is outlined. A key breakthrough, called PDF-directed adaptive radio, which can provide substantially greater throughput over HF channels whilst minimising bit-error rate and delay, is described. Simulation results for fast adaptive schemes applied to both serial-tone and parallel-tone HF modems are presented and shown to significantly outperform fixed-rate modems and modems employing hybrid automatic-repeat-request schemes. A new detector scheme is discussed which has superior performance to conventional detectors for digital traffic in the presence of inter-symbol interference and impulsive noise.

    Variability and coding efficiency of noisy neural spike encoders

    Encoding synaptic inputs as a train of action potentials is a fundamental function of nerve cells. Although spike trains recorded in vivo have been shown to be highly variable, it is unclear whether variability in spike timing represents faithful encoding of temporally varying synaptic inputs or noise inherent in the spike encoding mechanism. It has been reported that spike timing variability is more pronounced for constant, unvarying inputs than for inputs with rich temporal structure. This could have significant implications for the nature of neural coding, particularly if precise timing of spikes and temporal synchrony between neurons are used to represent information in the nervous system. To study the potential functional role of spike timing variability, we estimate the fraction of spike timing variability which conveys information about the input for two types of noisy spike encoders: an integrate-and-fire model with randomly chosen thresholds and a model of a patch of neuronal membrane containing stochastic Na+ and K+ channels obeying Hodgkin–Huxley kinetics. The quality of signal encoding is assessed by reconstructing the input stimuli from the output spike trains using optimal linear mean square estimation. A comparison of the estimation performance of noisy neuronal models of spike generation enables us to assess the impact of neuronal noise on the efficacy of neural coding. The results for both models suggest that spike timing variability reduces the ability of spike trains to encode rapid time-varying stimuli. Moreover, contrary to expectations based on earlier studies, we find that the noisy spike encoding models encode slowly varying stimuli more effectively than rapidly varying ones.
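
    The first encoder named in the abstract, an integrate-and-fire model with randomly chosen thresholds, can be sketched roughly as below. This is an illustration under stated assumptions rather than the authors' model: the leak-free integrator, the uniform threshold jitter, and all parameter names and values are hypothetical choices, and the abstract does not specify the threshold distribution.

```python
import random


def integrate_and_fire(stimulus, dt, mean_threshold, jitter, seed=0):
    """Leak-free integrate-and-fire encoder with a random threshold.

    After every spike the threshold is redrawn uniformly from
    mean_threshold * [1 - jitter, 1 + jitter] (an assumed distribution).
    Returns the list of spike times in seconds.
    """
    rng = random.Random(seed)

    def draw_threshold():
        return mean_threshold * (1.0 + rng.uniform(-jitter, jitter))

    threshold = draw_threshold()
    voltage = 0.0
    spike_times = []
    for i, sample in enumerate(stimulus):
        voltage += sample * dt          # integrate the input
        if voltage >= threshold:        # fire, reset, and redraw the threshold
            spike_times.append(i * dt)
            voltage = 0.0
            threshold = draw_threshold()
    return spike_times


dt = 1.0 / 1024.0
constant_input = [1.0] * 1024           # one second of constant drive

# With no jitter the encoder fires at perfectly regular intervals.
regular = integrate_and_fire(constant_input, dt, mean_threshold=0.125, jitter=0.0)
# With jitter, the same constant input produces variable inter-spike intervals.
jittered = integrate_and_fire(constant_input, dt, mean_threshold=0.125, jitter=0.5)
```

    Comparing `regular` with `jittered` shows how threshold noise alone introduces spike timing variability for a constant input, which is the kind of variability the abstract sets out to quantify.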

    LMS Based Adaptive Channel Estimation for LTE Uplink

    In this paper, a variable step size based least mean squares (LMS) channel estimation (CE) algorithm is presented for a single carrier frequency division multiple access (SC-FDMA) system under the umbrella of the long term evolution (LTE). This unbiased CE method can automatically adapt the weighting coefficients to the channel conditions. Therefore, it does not require knowledge of channel and noise statistics. Furthermore, it uses a phase weighting scheme to eliminate the signal fluctuations due to noise and decision errors. Such approaches can guarantee convergence towards the true channel coefficient. The mean and mean square behaviors of the proposed CE algorithm are also analyzed. With the help of theoretical analysis and simulation results, we prove that the proposed algorithm outperforms the existing algorithms in terms of mean square error (MSE) and bit error rate (BER) by more than approximately 2.5 dB.
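
    To make the adaptive update concrete, here is a minimal sketch of LMS-style channel estimation from known pilot symbols. It uses normalized LMS (NLMS), a standard variant whose effective step size scales with the input power, as a stand-in: it is not the paper's variable step size rule and omits the phase weighting scheme. The function names, the two-tap real-valued channel, and the parameters `mu` and `eps` are assumptions for illustration.

```python
import random


def simulate_channel(pilots, taps):
    """Convolve the pilot sequence with the channel taps (noise-free)."""
    return [
        sum(taps[k] * pilots[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(pilots))
    ]


def nlms_channel_estimate(pilots, received, num_taps, mu=0.5, eps=1e-8):
    """Estimate channel taps with normalized LMS.

    The step is mu divided by the instantaneous input power, so no prior
    knowledge of channel or noise statistics is needed.
    """
    weights = [0.0] * num_taps
    for n in range(len(pilots)):
        # Most recent pilot samples, newest first, zero-padded at the start.
        x = [pilots[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        predicted = sum(w * xk for w, xk in zip(weights, x))
        error = received[n] - predicted
        step = mu / (eps + sum(xk * xk for xk in x))
        weights = [w + step * error * xk for w, xk in zip(weights, x)]
    return weights


rng = random.Random(0)
pilots = [rng.choice((-1.0, 1.0)) for _ in range(500)]   # BPSK-like pilots
true_taps = [0.5, -0.3]                                   # hypothetical channel
received = simulate_channel(pilots, true_taps)

estimate = nlms_channel_estimate(pilots, received, num_taps=2)
```

    In this noise-free toy setting the estimate settles on the true taps; with noise, a smaller `mu` trades convergence speed for lower steady-state error, which is the trade-off variable step size schemes aim to manage.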

    Model-Based Speech Enhancement

    A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate accurately the acoustic features (voicing, fundamental frequency, spectral envelope and phase) from noisy speech. This is achieved using maximum a posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features that are adapted to speaker-dependent models using MAP adaptation and for noise using the Unscented Transform. Objective results are presented to optimise the proposed system and a set of subjective tests compare the approach with traditional enhancement methods. Three-way listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional methods of enhancement in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter.