28,974 research outputs found
AN EFFICIENT SPEECH GENERATIVE MODEL BASED ON DETERMINISTIC/STOCHASTIC SEPARATION OF SPECTRAL ENVELOPES
The paper presents a speech generative model that provides an efficient way of generating speech waveform from its amplitude spectral envelopes. The model is based on hybrid speech representation that includes deterministic (harmonic) and stochastic (noise) components. The main idea behind the approach originates from the fact that speech signal has a determined spectral structure that is statistically bound with deterministic/stochastic energy distribution in the spectrum. The performance of the model is evaluated using an experimental low-bitrate wide-band speech coder. The quality of reconstructed speech is evaluated using objective and subjective methods. Two objective quality characteristics were calculated: Modified Bark Spectral Distortion (MBSD) and Perceptual Evaluation of Speech Quality (PESQ). Narrow-band and wide-band versions of the proposed solution were compared with MELP (Mixed Excitation Linear Prediction) speech coder and AMR (Adaptive Multi-Rate) speech coder, respectively. The speech base of two female and two male speakers were used for testing. The performed tests show that overall performance of the proposed approach is speaker-dependent and it is better for male voices. Supposedly, this difference indicates the influence of pitch highness on separation accuracy. In that way, using the proposed approach in experimental speech compression system provides decent MBSD values and comparable PESQ values with AMR speech coder at 6,6 kbit/s. Additional subjective listening testsdemonstrate that the implemented coding system retains phonetic content and speaker’s identity. It proves consistency of the proposed approach.The paper presents a speech generative model that provides an efficient way of generating speech waveform from its amplitude spectral envelopes. The model is based on hybrid speech representation that includes deterministic (harmonic) and stochastic (noise) components. The main idea behind the approach originates from the fact that speech signal has a determined spectral structure that is statistically bound with deterministic/stochastic energy distribution in the spectrum. The performance of the model is evaluated using an experimental low-bitrate wide-band speech coder. The quality of reconstructed speech is evaluated using objective and subjective methods. Two objective quality characteristics were calculated: Modified Bark Spectral Distortion (MBSD) and Perceptual Evaluation of Speech Quality (PESQ). Narrow-band and wide-band versions of the proposed solution were compared with MELP (Mixed Excitation Linear Prediction) speech coder and AMR (Adaptive Multi-Rate) speech coder, respectively. The speech base of two female and two male speakers were used for testing. The performed tests show that overall performance of the proposed approach is speaker-dependent and it is better for male voices. Supposedly, this difference indicates the influence of pitch highness on separation accuracy. In that way, using the proposed approach in experimental speech compression system provides decent MBSD values and comparable PESQ values with AMR speech coder at 6,6 kbit/s. Additional subjective listening testsdemonstrate that the implemented coding system retains phonetic content and speaker’s identity. It proves consistency of the proposed approach
A unified wavelet-based modelling framework for non-linear system identification: the WANARX model structure
A new unified modelling framework based on the superposition of additive submodels, functional components, and
wavelet decompositions is proposed for non-linear system identification. A non-linear model, which is often represented
using a multivariate non-linear function, is initially decomposed into a number of functional components via the wellknown
analysis of variance (ANOVA) expression, which can be viewed as a special form of the NARX (non-linear
autoregressive with exogenous inputs) model for representing dynamic input–output systems. By expanding each functional
component using wavelet decompositions including the regular lattice frame decomposition, wavelet series and
multiresolution wavelet decompositions, the multivariate non-linear model can then be converted into a linear-in-theparameters
problem, which can be solved using least-squares type methods. An efficient model structure determination
approach based upon a forward orthogonal least squares (OLS) algorithm, which involves a stepwise orthogonalization
of the regressors and a forward selection of the relevant model terms based on the error reduction ratio (ERR), is
employed to solve the linear-in-the-parameters problem in the present study. The new modelling structure is referred to
as a wavelet-based ANOVA decomposition of the NARX model or simply WANARX model, and can be applied to
represent high-order and high dimensional non-linear systems
A new class of wavelet networks for nonlinear system identification
A new class of wavelet networks (WNs) is proposed for nonlinear system identification. In the new networks, the model structure for a high-dimensional system is chosen to be a superimposition of a number of functions with fewer variables. By expanding each function using truncated wavelet decompositions, the multivariate nonlinear networks can be converted into linear-in-the-parameter regressions, which can be solved using least-squares type methods. An efficient model term selection approach based upon a forward orthogonal least squares (OLS) algorithm and the error reduction ratio (ERR) is applied to solve the linear-in-the-parameters problem in the present study. The main advantage of the new WN is that it exploits the attractive features of multiscale wavelet decompositions and the capability of traditional neural networks. By adopting the analysis of variance (ANOVA) expansion, WNs can now handle nonlinear identification problems in high dimensions
- …