Search CORE

12,556 research outputs found

Bandwidth extension of narrowband speech

Author: Expósito Pérez Miquel
Salavedra Molí Josep
Publication venue: Universidad Politécnica de Valencia
Publication date: 01/01/2014
Field of study

Recently, 4G mobile phone systems have been designed to process wideband speech signals whose sampling frequency is 16 kHz. However, most part of mobile and classical phone network, and current 3G mobile phones, still process narrowband speech signals whose sampling frequency is 8 kHz. During next future, all these systems must be living together. Therefore, sometimes a wideband speech signal (with a bandwidth up to 7,2 kHz) should be estimated from an available narrowband one (whose frequency band is 300-3400 Hz). In this work, different techniques of audio bandwidth extension have been implemented and evaluated. First, a simple non-model-based algorithm (interpolation algorithm) has been implemented. Second, a model-based algorithm (linear mapping) have been designed and evaluated in comparison to previous one. Several CMOS (Comparison Mean Opinion Score) [6] listening tests show that performance of Linear Mapping algorithm clearly overcomes the other one. Results of these tests are very close to those corresponding to original wideband speech signal.Postprint (published version

UPCommons. Portal del coneixement obert de la UPC

Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

Author: Ben Milner
Boll
Chen
Deller
Ephraim
Ephraim
Ephraim
Ephraim
Esfandiar Zavarehei
Friedman
Griffin
Hansen
Ioannis Andrianakis
Jonathan Darch
Kalman
Lim
Lim
Paul White
Qin Yan
Rentzos
Saeed Vaseghi
Sameti
Secrest
Seltzer
Stylianou
Stylianou
Tucker
Turunen
Vaseghi
Weber
Yan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008
Field of study

This paper presents a speech enhancement method based on the tracking and denoising of the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and a mitigation of processing artefact known as the ‘musical noise’ or ‘musical tones’.The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on a spectral amplitude estimation, (b) formant-tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames.The HNM parameters for the excitation signal comprise; voiced/unvoiced decision, the fundamental frequency, the harmonics’ amplitudes and the variance of the noise component of excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal to noise ratios (SNRs) at the harmonics. For each speech frame several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters.The proposed method is used to deconstruct noisy speech, de-noise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages

Crossref

Southampton (e-Prints Soton)

University of East Anglia digital repository

The spectral analysis of nonstationary categorical time series using local spectral envelope

Author: Jeong Hyewook
Jeong Hyewook
Publication venue
Publication date: 27/09/2012
Field of study

Most classical methods for the spectral analysis are based on the assumption that the time series is stationary. However, many time series in practical problems shows nonstationary behaviors. The data from some fields are huge and have variance and spectrum which changes over time. Sometimes,we are interested in the cyclic behavior of the categorical-valued time series such as EEG sleep state data or DNA sequence, the general method is to scale the data, that is, assign numerical values to the categories and then use the periodogram to find the cyclic behavior. But there exists numerous possible scaling. If we arbitrarily assign the numerical values to the categories and proceed with a spectral analysis, then the results will depend on the particular assignment. We would like to find the all possible scaling that bring out all of the interesting features in the data. To overcome these problems, there have been many approaches in the spectral analysis. Our goal is to develop a statistical methodology for analyzing nonstationary categorical time series in the frequency domain. In this dissertation, the spectral envelope methodology is introduced for spectral analysis of categorical time series. This provides the general framework for the spectral analysis of the categorical time series and summarizes information from the spectrum matrix. To apply this method to nonstationary process, I used the TBAS(Tree-Based Adaptive Segmentation) and local spectral envelope based on the piecewise stationary process. In this dissertation,the TBAS(Tree-Based Adpative Segmentation) using distance function based on the Kullback-Leibler divergence was proposed to find the best segmentation

D-Scholarship@Pitt

Estimation of Severity of Speech Disability through Speech Envelope

Author: Gudi Anandthirtha B.
Nagaraj H. C.
Shreedhar H. K.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 21/07/2011
Field of study

In this paper, envelope detection of speech is discussed to distinguish the pathological cases of speech disabled children. The speech signal samples of children of age between five to eight years are considered for the present study. These speech signals are digitized and are used to determine the speech envelope. The envelope is subjected to ratio mean analysis to estimate the disability. This analysis is conducted on ten speech signal samples which are related to both place of articulation and manner of articulation. Overall speech disability of a pathological subject is estimated based on the results of above analysis.Comment: 8 pages,4 Figures,Signal & Image Processing Journal AIRC

arXiv.org e-Print Archive

Crossref