239 research outputs found

    Pre-processing of Speech Signals for Robust Parameter Estimation

    Analysis of very low quality speech for mask-based enhancement

    The complexity of the speech enhancement problem has motivated many different solutions. However, most techniques address situations in which the target speech is fully intelligible and the background noise energy is low in comparison with that of the speech. Thus while current enhancement algorithms can improve the perceived quality, the intelligibility of the speech is not increased significantly and may even be reduced. Recent research shows that intelligibility of very noisy speech can be improved by the use of a binary mask, in which a binary weight is applied to each time-frequency bin of the input spectrogram. There are several alternative goals for the binary mask estimator, based either on the Signal-to-Noise Ratio (SNR) of each time-frequency bin or on the speech signal characteristics alone. Our approach to the binary mask estimation problem aims to preserve the important speech cues independently of the noise present by identifying time-frequency regions that contain significant speech energy. The speech power spectrum varies greatly for different types of speech sound. The energy of voiced speech sounds is concentrated in the harmonics of the fundamental frequency while that of unvoiced sounds is, in contrast, distributed across a broad range of frequencies. To identify the presence of speech energy in a noisy speech signal we have therefore developed two detection algorithms. The first is a robust algorithm that identifies voiced speech segments and estimates their fundamental frequency. The second detects the presence of sibilants and estimates their energy distribution. In addition, we have developed a robust algorithm to estimate the active level of the speech. The outputs of these algorithms are combined with other features estimated from the noisy speech to form the input to a classifier which estimates a mask that accurately reflects the time-frequency distribution of speech energy even at low SNR levels. 
We evaluate a mask-based speech enhancer on a range of speech and noise signals and demonstrate a consistent increase in an objective intelligibility measure with respect to noisy speech.
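To make the idea of a binary time-frequency mask concrete, here is a minimal sketch of the SNR-based variant the abstract mentions as one possible estimation goal. It is not the classifier-based estimator described above: it assumes oracle access to separate speech and noise power per bin, and the function names and the `threshold_db` parameter are hypothetical choices made for illustration.

```python
import math


def snr_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR (dB) exceeds threshold_db.

    Both inputs are lists of rows (frames) of per-bin power values.
    Oracle speech and noise powers are an assumption of this sketch.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            # A small floor avoids log(0) and division by zero.
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(noisy_power, mask):
    """Apply the binary weight to each time-frequency bin of the noisy input."""
    return [[p * m for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(noisy_power, mask)]
```

In practice the clean speech and noise powers are unknown, which is why the abstract describes estimating the mask from features of the noisy signal instead.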

    Model-based speech enhancement for hearing aids

    Statistical Properties and Applications of Empirical Mode Decomposition

    Signal analysis is key to extracting information buried in noise. Signal decomposition is a data analysis tool for determining the underlying physical components of a processed data set. However, conventional signal decomposition approaches such as wavelet analysis, the Wigner-Ville distribution, and various short-time Fourier spectrograms are inadequate for processing real-world signals. Moreover, most of these techniques require a priori knowledge of the processed signal in order to select the proper decomposition basis, which makes them unsuitable for a wide range of practical applications. Empirical Mode Decomposition (EMD) is a non-parametric, adaptive decomposition capable of breaking down non-linear, non-stationary signals into a finite set of components called Intrinsic Mode Functions (IMFs). In addition, EMD approximates a dyadic filter that isolates high frequency components, e.g. noise, in higher index IMFs. Despite being widely used in different applications, EMD is an ad hoc solution. The adaptive performance of EMD comes at the expense of a theoretical basis. Therefore, numerical analysis is usually adopted in the literature to interpret its behavior. This dissertation investigates the statistical properties of EMD and uses the outcome to enhance the performance of signal de-noising and spectrum sensing systems. The novel contributions can be broadly summarized in three categories: a statistical analysis of the probability distributions of the IMFs and a suggestion of the Generalized Gaussian distribution (GGD) as a best-fit distribution; a de-noising scheme based on a null hypothesis of IMFs that utilizes the unique filter behavior of EMD; and a novel noise estimation approach, based on the first IMF, that is used to shift semi-blind spectrum sensing techniques into fully-blind ones. These contributions are justified statistically and analytically and include comparison with other state-of-the-art techniques.
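The abstract proposes the Generalized Gaussian distribution as a fit for IMF samples. As a rough illustration only, the sketch below evaluates the GGD density in one common parametrization; the parameter names `alpha` (scale), `beta` (shape) and `mu` (location) are assumptions and may not match the notation used in the dissertation.

```python
import math


def ggd_pdf(x, alpha, beta, mu=0.0):
    """Generalized Gaussian density in a common (assumed) parametrization:

        f(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)

    beta = 2 gives a Gaussian with standard deviation alpha / sqrt(2);
    beta = 1 gives a Laplace distribution with scale alpha.
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))
```

Fitting `alpha` and `beta` to the samples of each IMF is a separate estimation step that this sketch does not cover.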

    Some New Results on the Estimation of Sinusoids in Noise

    Codebook-based Bayesian speech enhancement for nonstationary environments

    In this paper, we propose a Bayesian minimum mean squared error approach for the joint estimation of the short-term predictor parameters of speech and noise, from the noisy observation. We use trained codebooks of speech and noise linear predictive coefficients to model the a priori information required by the Bayesian scheme. In contrast to current Bayesian estimation approaches that consider the excitation variances as part of the a priori information, in the proposed method they are computed online for each short-time segment, based on the observation at hand. Consequently, the method performs well in nonstationary noise conditions. The resulting estimates of the speech and noise spectra can be used in a Wiener filter or any state-of-the-art speech enhancement system. We develop both memoryless (using information from the current frame alone) and memory-based (using information from the current and previous frames) estimators. Estimation of functions of the short-term predictor parameters is also addressed, in particular one that leads to the minimum mean squared error estimate of the clean speech signal. Experiments indicate that the scheme proposed in this paper performs significantly better than competing methods.
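The abstract notes that the estimated speech and noise spectra can drive a Wiener filter. The sketch below shows that last step only, under stated assumptions: each spectrum is modeled as an all-pole (linear predictive) envelope with a given excitation variance, and the function names, the sign convention for the coefficients, and the `num_bins` frequency grid are hypothetical. The codebook search and Bayesian estimation from the paper are not reproduced here.

```python
import cmath
import math


def ar_power_spectrum(lpc, excitation_var, num_bins=8):
    """All-pole power spectrum excitation_var / |A(e^{jw})|^2 on [0, pi].

    Assumes A(z) = 1 + sum_k lpc[k-1] * z^-k and num_bins >= 2.
    """
    spectrum = []
    for i in range(num_bins):
        w = math.pi * i / (num_bins - 1)
        a = 1.0 + sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(lpc))
        spectrum.append(excitation_var / abs(a) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain: speech power over total (speech + noise) power."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]
```

A usage sketch: compute `ar_power_spectrum` once for the speech parameters and once for the noise parameters of a frame, then multiply the noisy spectrum by `wiener_gain` bin by bin.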