    AN EFFICIENT SPEECH GENERATIVE MODEL BASED ON DETERMINISTIC/STOCHASTIC SEPARATION OF SPECTRAL ENVELOPES

    The paper presents a speech generative model that provides an efficient way of generating a speech waveform from its amplitude spectral envelopes. The model is based on a hybrid speech representation that includes deterministic (harmonic) and stochastic (noise) components. The main idea behind the approach originates from the fact that a speech signal has a definite spectral structure that is statistically related to the distribution of deterministic and stochastic energy across the spectrum. The performance of the model is evaluated using an experimental low-bitrate wide-band speech coder. The quality of the reconstructed speech is evaluated using objective and subjective methods. Two objective quality measures were calculated: Modified Bark Spectral Distortion (MBSD) and Perceptual Evaluation of Speech Quality (PESQ). Narrow-band and wide-band versions of the proposed solution were compared with the MELP (Mixed Excitation Linear Prediction) speech coder and the AMR (Adaptive Multi-Rate) speech coder, respectively. A speech database of two female and two male speakers was used for testing. The tests show that the overall performance of the proposed approach is speaker-dependent and is better for male voices. This difference presumably reflects the influence of pitch height on separation accuracy. Used in an experimental speech compression system, the proposed approach provides decent MBSD values and PESQ values comparable with those of the AMR speech coder at 6.6 kbit/s. Additional subjective listening tests demonstrate that the implemented coding system retains phonetic content and the speaker's identity, which demonstrates the consistency of the proposed approach.
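
The deterministic/stochastic split described in this abstract can be illustrated with a generic harmonic-plus-noise synthesis of one frame. This is only a sketch under stated assumptions, not the authors' model: the function name `synth_frame`, the two envelope callables, the sampling rate and the frame length are all hypothetical choices made for illustration.

```python
import numpy as np

def synth_frame(f0, det_env, sto_env, fs=16000, n=512, seed=0):
    """Synthesize one frame as a harmonic part plus a shaped-noise part.

    det_env and sto_env are hypothetical callables mapping frequency in Hz
    to a non-negative amplitude (the two spectral envelopes).
    """
    t = np.arange(n) / fs

    # Deterministic part: sinusoids at multiples of f0 below the Nyquist
    # frequency, with amplitudes sampled from the deterministic envelope.
    harm_freqs = f0 * np.arange(1, int((fs / 2) // f0) + 1)
    harm_freqs = harm_freqs[harm_freqs < fs / 2]
    amps = det_env(harm_freqs)
    harmonic = (amps[:, None] * np.cos(2 * np.pi * harm_freqs[:, None] * t)).sum(axis=0)

    # Stochastic part: white noise whose spectrum is multiplied by the
    # stochastic envelope, then transformed back to the time domain.
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    bin_freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    noise = np.fft.irfft(spectrum * sto_env(bin_freqs), n=n)

    return harmonic, noise

# Example with made-up envelopes: harmonics dominate at low frequencies,
# noise grows towards high frequencies.
harmonic, noise = synth_frame(
    f0=125.0,
    det_env=lambda f: 1.0 / (1.0 + f / 500.0),
    sto_env=lambda f: 0.05 * f / 8000.0,
)
frame = harmonic + noise
```

In a coder along these lines, only the envelopes and the pitch would need to be transmitted per frame; how the paper actually derives the two envelopes from a single amplitude envelope is not specified in the abstract and is not modelled here.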

    A unified wavelet-based modelling framework for non-linear system identification: the WANARX model structure

    A new unified modelling framework based on the superposition of additive submodels, functional components, and wavelet decompositions is proposed for non-linear system identification. A non-linear model, which is often represented using a multivariate non-linear function, is initially decomposed into a number of functional components via the well-known analysis of variance (ANOVA) expression, which can be viewed as a special form of the NARX (non-linear autoregressive with exogenous inputs) model for representing dynamic input–output systems. By expanding each functional component using wavelet decompositions, including the regular lattice frame decomposition, wavelet series and multiresolution wavelet decompositions, the multivariate non-linear model can then be converted into a linear-in-the-parameters problem, which can be solved using least-squares type methods. An efficient model structure determination approach based upon a forward orthogonal least squares (OLS) algorithm, which involves a stepwise orthogonalization of the regressors and a forward selection of the relevant model terms based on the error reduction ratio (ERR), is employed to solve the linear-in-the-parameters problem in the present study. The new modelling structure is referred to as a wavelet-based ANOVA decomposition of the NARX model, or simply the WANARX model, and can be applied to represent high-order and high-dimensional non-linear systems.
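
The forward OLS selection with the error reduction ratio mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming classical Gram-Schmidt orthogonalization and a fixed number of terms as the stopping rule; the function name `forward_ols_err` and its parameters are hypothetical, and the paper's own implementation may differ.

```python
import numpy as np

def forward_ols_err(P, y, n_terms):
    """Greedy forward selection of columns of P by error reduction ratio.

    P: (N, M) matrix of candidate regressors; y: (N,) output vector.
    Returns the selected column indices, their ERR values, and the
    least-squares parameters of the selected terms.
    """
    P = np.asarray(P, dtype=float)
    y = np.asarray(y, dtype=float)
    yy = y @ y
    selected, errs, basis = [], [], []

    for _ in range(n_terms):
        best_err, best_j, best_w = -1.0, None, None
        for j in range(P.shape[1]):
            if j in selected:
                continue
            # Orthogonalize candidate j against the already chosen terms.
            w = P[:, j].copy()
            for q in basis:
                w -= (q @ P[:, j]) / (q @ q) * q
            ww = w @ w
            if ww < 1e-12:  # candidate is (numerically) redundant
                continue
            # ERR: fraction of output energy explained by this term.
            err = (w @ y) ** 2 / (ww * yy)
            if err > best_err:
                best_err, best_j, best_w = err, j, w
        if best_j is None:
            break
        selected.append(best_j)
        errs.append(best_err)
        basis.append(best_w)

    theta, *_ = np.linalg.lstsq(P[:, selected], y, rcond=None)
    return selected, errs, theta
```

Because the selected regressors are mutually orthogonal, the ERR values add up to the fraction of output energy explained so far, so `1 - sum(errs)` could serve as a stopping criterion in place of the fixed `n_terms` assumed here.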

    A new class of wavelet networks for nonlinear system identification

    A new class of wavelet networks (WNs) is proposed for nonlinear system identification. In the new networks, the model structure for a high-dimensional system is chosen to be a superimposition of a number of functions with fewer variables. By expanding each function using truncated wavelet decompositions, the multivariate nonlinear networks can be converted into linear-in-the-parameters regressions, which can be solved using least-squares type methods. An efficient model term selection approach based upon a forward orthogonal least squares (OLS) algorithm and the error reduction ratio (ERR) is applied to solve the linear-in-the-parameters problem in the present study. The main advantage of the new WN is that it exploits the attractive features of multiscale wavelet decompositions and the capability of traditional neural networks. By adopting the analysis of variance (ANOVA) expansion, WNs can now handle nonlinear identification problems in high dimensions.
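
How a truncated wavelet expansion turns an additive model into a linear-in-the-parameters regression can be shown with a small example. It is a sketch only: the Mexican hat mother wavelet, the scale and shift ranges, the restriction to univariate terms, and the names `wavelet_features` and `design_matrix` are assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np

def mexican_hat(x):
    """Mexican hat mother wavelet (unnormalized), an assumed choice."""
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

def wavelet_features(x, scales=(0, 1, 2)):
    """Dilated and shifted wavelets 2^(j/2) * psi(2^j * x - k) for x in [0, 1]."""
    cols = []
    for j in scales:
        # Shifts chosen so the wavelets cover [0, 1] with some overlap
        # at the boundaries (an illustrative truncation).
        for k in range(-2, 2**j + 3):
            cols.append(2.0 ** (j / 2) * mexican_hat(2.0**j * x - k))
    return np.column_stack(cols)

def design_matrix(X):
    """Additive model f(x1, ..., xd) = c + f1(x1) + ... + fd(xd)."""
    blocks = [np.ones((X.shape[0], 1))]
    blocks += [wavelet_features(X[:, i]) for i in range(X.shape[1])]
    return np.hstack(blocks)

# Fit a two-variable additive function by ordinary least squares.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2

Phi = design_matrix(X)                      # every column is a fixed basis function
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
rmse = np.sqrt(np.mean((Phi @ theta - y) ** 2))
```

Once the basis functions are fixed, the unknowns enter only through `theta`, which is why least-squares methods apply. A full ANOVA expansion would also include bivariate and higher-order terms, and a term selection step would prune the redundant columns of `Phi`.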