Bandwidth extension of narrowband speech
Recently, 4G mobile phone systems have been
designed to process wideband speech signals whose
sampling frequency is 16 kHz. However, most of the
mobile and classical phone network, and current 3G
mobile phones, still process narrowband speech signals
whose sampling frequency is 8 kHz. In the near future,
all these systems will have to coexist. Therefore,
a wideband speech signal (with a bandwidth up
to 7.2 kHz) must sometimes be estimated from an available
narrowband one (whose frequency band is 300-3400 Hz).
In this work, different techniques of audio bandwidth
extension have been implemented and evaluated. First, a
simple non-model-based algorithm (interpolation
algorithm) has been implemented. Second, a model-based
algorithm (linear mapping) has been designed and
evaluated in comparison to the previous one. Several CMOS
(Comparison Mean Opinion Score) [6] listening tests show
that the Linear Mapping algorithm clearly
outperforms the other one. The results of these tests are very
close to those corresponding to the original wideband speech
signal.
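The interpolation idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the algorithm itself, so this example assumes plain linear interpolation from 8 kHz to 16 kHz; the function name `upsample_linear` is hypothetical. Linear interpolation is an imperfect low-pass filter, so some attenuated mirrored energy remains above 4 kHz in the output.

```python
def upsample_linear(narrowband):
    """Double the sampling rate by inserting the midpoint between neighbours.

    Assumption: this stands in for the (unspecified) interpolation algorithm
    of the abstract; it is not the authors' implementation.
    """
    if not narrowband:
        return []
    wideband = []
    for current, following in zip(narrowband, narrowband[1:]):
        wideband.append(current)                      # original sample
        wideband.append((current + following) / 2.0)  # interpolated sample
    # The last sample has no right neighbour, so it is simply held.
    wideband.append(narrowband[-1])
    wideband.append(narrowband[-1])
    return wideband


if __name__ == "__main__":
    print(upsample_linear([0.0, 2.0, 4.0]))
```

The output always has twice as many samples as the input, which matches the change from an 8 kHz to a 16 kHz sampling frequency.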
Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement
This paper presents a speech enhancement method based on the tracking and denoising of the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and a mitigation of the processing artefact known as "musical noise" or "musical tones". The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on a spectral amplitude estimation, (b) formant-tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames. The HNM parameters for the excitation signal comprise: the voiced/unvoiced decision, the fundamental frequency, the harmonics' amplitudes and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters. The proposed method is used to deconstruct noisy speech, de-noise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.
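Stage (c), Kalman filtering of a formant trajectory across frames, can be sketched as follows. This is only an illustration under stated assumptions: a scalar random-walk state model, and variance values chosen arbitrarily. The function name `kalman_filter_track` and its parameters are hypothetical and are not taken from the paper.

```python
def kalman_filter_track(observations, process_var=100.0, measurement_var=2500.0):
    """Filter a noisy per-frame formant frequency track (in Hz).

    Assumed model (not from the paper): the true formant follows a random
    walk with variance `process_var`, observed with noise `measurement_var`.
    """
    estimate = observations[0]   # initialise from the first frame
    error_var = measurement_var  # initial uncertainty of that estimate
    track = [estimate]
    for observed in observations[1:]:
        # Predict: a random walk keeps the estimate, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (observed - estimate)
        error_var *= 1.0 - gain
        track.append(estimate)
    return track


if __name__ == "__main__":
    # The 800 Hz outlier in the third frame is pulled back toward 500 Hz.
    print(kalman_filter_track([500.0, 500.0, 800.0, 500.0]))
```

A larger `measurement_var` relative to `process_var` makes the filtered trajectory smoother, at the cost of reacting more slowly to genuine formant movement.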
The spectral analysis of nonstationary categorical time series using local spectral envelope
Most classical methods for spectral analysis are based on the assumption that the time
series is stationary. However, many time series in practical problems show nonstationary
behavior. The data sets from some fields are huge and have a variance and spectrum that change
over time. Sometimes we are interested in the cyclic behavior of categorical-valued time
series such as EEG sleep state data or DNA sequences. The general method is to scale the
data, that is, assign numerical values to the categories and then use the periodogram to find
the cyclic behavior. But there exist numerous possible scalings. If we arbitrarily assign
numerical values to the categories and proceed with a spectral analysis, then the results will
depend on the particular assignment. We would like to find all possible scalings that
bring out all of the interesting features in the data. Many approaches to spectral analysis
have been proposed to overcome these problems.
Our goal is to develop a statistical methodology for analyzing nonstationary categorical
time series in the frequency domain. In this dissertation, the spectral envelope methodology
is introduced for spectral analysis of categorical time series. This provides the general
framework for the spectral analysis of the categorical time series and summarizes information
from the spectrum matrix. To apply this method to nonstationary processes, I used
TBAS (Tree-Based Adaptive Segmentation) and the local spectral envelope based on piecewise
stationary processes. In this dissertation, TBAS
using a distance function based on the Kullback-Leibler divergence was proposed to find the
best segmentation
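The dependence of the periodogram on the chosen scaling, described in the first paragraph of this abstract, can be seen in a small sketch. The toy sequence, the two scalings `ramp` and `purine`, and all function names are illustrative assumptions; the code does not implement the spectral envelope or TBAS.

```python
import cmath
import math


def periodogram(values):
    """Return {frequency: power} for frequencies k/n, k = 1 .. n//2."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    powers = {}
    for k in range(1, n // 2 + 1):
        # Plain discrete Fourier transform coefficient at frequency k/n.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        powers[k / n] = abs(coeff) ** 2 / n
    return powers


def dominant_frequency(categories, scaling):
    """Scale the categories to numbers, then locate the periodogram peak."""
    powers = periodogram([scaling[c] for c in categories])
    return max(powers, key=powers.get)


# Toy categorical series and two arbitrary scalings (both hypothetical).
sequence = "ACGT" * 4
ramp = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
purine = {"A": 1.0, "C": 0.0, "G": 1.0, "T": 0.0}

if __name__ == "__main__":
    print(dominant_frequency(sequence, ramp))
    print(dominant_frequency(sequence, purine))
```

For the same sequence, the `ramp` scaling puts the peak at frequency 0.25 (one cycle every four symbols), while the `purine` scaling puts it at 0.5, so the apparent cyclic behavior depends on the assignment.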
Estimation of Severity of Speech Disability through Speech Envelope
In this paper, envelope detection of speech is discussed as a way to distinguish the
pathological cases of speech-disabled children. Speech signal samples from
children aged five to eight years are considered for the present
study. These speech signals are digitized and used to determine the speech
envelope. The envelope is subjected to ratio mean analysis to estimate the
disability. This analysis is conducted on ten speech signal samples, which are
related to both place of articulation and manner of articulation. The overall
speech disability of a pathological subject is estimated based on the results
of the above analysis.
Comment: 8 pages, 4 figures, Signal & Image Processing Journal AIRC
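One common way to obtain a speech envelope from digitized samples is full-wave rectification followed by smoothing. The abstract does not say which detector the authors used, so the sketch below is an assumption; the function name `moving_average_envelope` and the `window` parameter are hypothetical, and the ratio mean analysis is not reproduced here.

```python
def moving_average_envelope(samples, window=5):
    """Estimate an amplitude envelope by rectifying and smoothing.

    Assumption: a centred moving average over `window` samples stands in
    for whatever envelope detector the paper actually uses.
    """
    rectified = [abs(s) for s in samples]  # full-wave rectification
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        # Shrink the window at the edges instead of padding with zeros.
        start = max(0, i - half)
        stop = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[start:stop]) / (stop - start))
    return envelope


if __name__ == "__main__":
    # An alternating signal of amplitude 1 has a flat envelope of 1.
    print(moving_average_envelope([1.0, -1.0] * 4, window=3))
```

A longer window gives a smoother envelope but blurs short events such as stop bursts.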