12,556 research outputs found

    Bandwidth extension of narrowband speech

    Recently, 4G mobile phone systems have been designed to process wideband speech signals sampled at 16 kHz. However, most of the mobile and classical phone network, as well as current 3G mobile phones, still processes narrowband speech signals sampled at 8 kHz. In the near future, all these systems must coexist. Therefore, a wideband speech signal (with a bandwidth up to 7.2 kHz) must sometimes be estimated from an available narrowband one (whose frequency band is 300-3400 Hz). In this work, different techniques of audio bandwidth extension have been implemented and evaluated. First, a simple non-model-based algorithm (an interpolation algorithm) has been implemented. Second, a model-based algorithm (linear mapping) has been designed and evaluated against the first. Several CMOS (Comparison Mean Opinion Score) [6] listening tests show that the linear mapping algorithm clearly outperforms the interpolation algorithm. The results of these tests are very close to those obtained for the original wideband speech signal.

    Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

    This paper presents a speech enhancement method based on tracking and denoising the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and the mitigation of the processing artefact known as ‘musical noise’ or ‘musical tones’. The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on spectral amplitude estimation, (b) formant tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames. The HNM parameters for the excitation signal comprise the voiced/unvoiced decision, the fundamental frequency, the harmonics’ amplitudes and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters. The proposed method is used to deconstruct noisy speech, denoise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.
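As a rough illustration of stage (c), the sketch below runs a scalar Kalman filter over one noisy per-frame frequency track. It is not the paper's FTLP estimator: the random-walk state model, the variances, the synthetic track and all function names are assumptions made for the example.

```python
import random


def kalman_filter_track(observations, process_var=25.0, obs_var=2500.0):
    """Filter a noisy scalar track (e.g. one formant frequency per frame, in Hz).

    Assumed model (hypothetical, not from the paper):
        state:       x[t] = x[t-1] + w,  w ~ N(0, process_var)
        observation: z[t] = x[t] + v,    v ~ N(0, obs_var)
    """
    estimate = observations[0]  # initialise from the first observation
    error_var = obs_var         # its uncertainty equals the observation noise
    filtered = [estimate]
    for z in observations[1:]:
        # Predict: the random walk keeps the estimate and inflates its variance.
        error_var += process_var
        # Update: blend the prediction with the new observation.
        gain = error_var / (error_var + obs_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def mean_abs_error(track, reference):
    """Average absolute deviation of a track from a reference track."""
    return sum(abs(a - b) for a, b in zip(track, reference)) / len(reference)


# Synthetic example: a slowly rising frequency track with additive noise.
random.seed(0)
true_track = [500.0 + 1.0 * t for t in range(100)]
noisy_track = [f + random.gauss(0.0, 50.0) for f in true_track]
filtered_track = kalman_filter_track(noisy_track)
```

Comparing `mean_abs_error(filtered_track, true_track)` with `mean_abs_error(noisy_track, true_track)` shows how much of the frame-to-frame jitter the filter removes. A random-walk model lags behind a steadily rising track, so a larger `process_var` trades smoothness for faster tracking.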

    The spectral analysis of nonstationary categorical time series using local spectral envelope

    Most classical methods for spectral analysis are based on the assumption that the time series is stationary. However, many time series in practical problems show nonstationary behavior. Data sets in some fields are very large and have a variance and spectrum that change over time. Sometimes we are interested in the cyclic behavior of a categorical-valued time series, such as EEG sleep state data or a DNA sequence. The general method is to scale the data, that is, to assign numerical values to the categories, and then use the periodogram to find the cyclic behavior. But there are numerous possible scalings. If we arbitrarily assign numerical values to the categories and proceed with a spectral analysis, then the results will depend on the particular assignment. We would like to find all possible scalings that bring out the interesting features in the data. Many approaches to spectral analysis have been proposed to overcome these problems. Our goal is to develop a statistical methodology for analyzing nonstationary categorical time series in the frequency domain. In this dissertation, the spectral envelope methodology is introduced for the spectral analysis of categorical time series. It provides a general framework for the spectral analysis of categorical time series and summarizes information from the spectrum matrix. To apply this method to nonstationary processes, I used TBAS (Tree-Based Adaptive Segmentation) and the local spectral envelope, based on a piecewise stationary process. In this dissertation, TBAS with a distance function based on the Kullback-Leibler divergence is proposed to find the best segmentation.
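The dependence on the scaling can be seen in a small, self-contained sketch. It does not implement the spectral envelope or TBAS; it only shows the motivating problem, using a made-up sequence, two hypothetical scalings and a plain DFT-based periodogram.

```python
import cmath


def periodogram(values):
    """Periodogram of a mean-centred series at frequencies k/n, k = 1..n//2."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2 / n)
    return powers  # powers[i] belongs to frequency (i + 1) / n


def peak_frequency(categories, scaling):
    """Scale a categorical series and return the frequency of its largest peak."""
    values = [scaling[c] for c in categories]
    powers = periodogram(values)
    best = max(range(len(powers)), key=lambda i: powers[i])
    return (best + 1) / len(values)


# Toy categorical series (illustrative only): "A" recurs every 2 positions,
# while "C" and "G" each recur every 4 positions.
sequence = list("ACAG" * 8)

# Two arbitrary scalings of the same categories.
scaling_a_only = {"A": 1.0, "C": 0.0, "G": 0.0}
scaling_c_vs_g = {"A": 0.0, "C": 1.0, "G": -1.0}

peak_a_only = peak_frequency(sequence, scaling_a_only)
peak_c_vs_g = peak_frequency(sequence, scaling_c_vs_g)
```

Here the first scaling puts the dominant peak at 0.5 cycles per observation (the period-2 recurrence of "A") and the second at 0.25 (the period-4 alternation of "C" and "G"), although the underlying sequence is identical. This is the ambiguity that motivates looking across scalings rather than fixing one arbitrarily.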

    Estimation of Severity of Speech Disability through Speech Envelope

    In this paper, envelope detection of speech is discussed as a way to distinguish the pathological cases of children with speech disabilities. Speech signal samples from children aged between five and eight years are considered for the present study. These speech signals are digitized and used to determine the speech envelope. The envelope is subjected to ratio mean analysis to estimate the disability. This analysis is conducted on ten speech signal samples, which relate to both place of articulation and manner of articulation. The overall speech disability of a pathological subject is estimated from the results of the above analysis. Comment: 8 pages, 4 figures, Signal & Image Processing Journal AIRC
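The abstract does not say how the envelope is computed. The sketch below shows one common approach, full-wave rectification followed by a moving average, purely as an assumption; the function name, window length and synthetic test signal are illustrative, and the ratio mean analysis is not reproduced.

```python
import math


def moving_average_envelope(signal, window):
    """Estimate an amplitude envelope by rectifying and then smoothing.

    `window` is the approximate smoothing length in samples; the average is
    taken over a centred span that is truncated at the signal edges.
    """
    rectified = [abs(s) for s in signal]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


# Synthetic test signal (not speech): a 200 Hz tone sampled at 8 kHz whose
# amplitude rises linearly from 0 towards 1 over 800 samples.
sample_rate = 8000
carrier_hz = 200.0
num_samples = 800
signal = [
    (t / num_samples) * math.sin(2 * math.pi * carrier_hz * t / sample_rate)
    for t in range(num_samples)
]

# One carrier period is 40 samples, so this window spans about two periods.
envelope = moving_average_envelope(signal, window=80)
```

The resulting `envelope` follows the rising amplitude while the 200 Hz oscillation is averaged out. For real speech the window would need to be long relative to the pitch period and short relative to the syllable rate.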