
    On the Computation of the Kullback-Leibler Measure for Spectral Distances

    Efficient algorithms for the exact and approximate computation of the symmetrical Kullback-Leibler (1998) measure for spectral distances are presented for linear predictive coding (LPC) spectra. An interpretation of this measure is given in terms of the poles of the spectra. The performance of the algorithms in terms of accuracy and computational complexity is assessed for the application of computing concatenation costs in unit-selection-based speech synthesis. With the same complexity and storage requirements, the exact method is superior in terms of accuracy.
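
To make the measure in the abstract above concrete, here is a minimal brute-force sketch, not the paper's efficient algorithms. It assumes one common form of the symmetrical Kullback-Leibler measure, the sum of (P - Q) * log(P / Q) over spectra normalized to unit total, and it assumes all-pole spectra 1/|A(e^{jw})|^2. The function names, the grid size, and the sign convention for the LPC coefficients are illustrative choices, not taken from the paper.

```python
import cmath
import math


def lpc_power_spectrum(a, n_points=512):
    """Sample 1/|A(e^{jw})|^2 on a midpoint grid over (0, pi).

    Assumed convention: A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...
    """
    spectrum = []
    for i in range(n_points):
        w = math.pi * (i + 0.5) / n_points
        A = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(1.0 / abs(A) ** 2)
    return spectrum


def normalize(spectrum):
    """Scale the samples so they sum to one, like a discrete density."""
    total = sum(spectrum)
    return [s / total for s in spectrum]


def symmetric_kl(p, q):
    """Symmetrical KL measure between two normalized, strictly positive spectra."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Two hypothetical first-order LPC models with poles at z = 0.9 and z = -0.5.
p = normalize(lpc_power_spectrum([-0.9]))
q = normalize(lpc_power_spectrum([0.5]))
distance = symmetric_kl(p, q)
```

Each term of the sum is non-negative because (P - Q) and log(P / Q) always share the same sign, so the measure is zero only when the two normalized spectra coincide on the grid.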

    Speaker recognition using frequency filtered spectral energies

    The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first- or second-order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discrimination between speakers. Even better speaker identification results than with the mel-cepstrum have been obtained on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, which had been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database. Peer reviewed. Postprint (published version).
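
The frequency filtering described in the abstract above can be sketched as an FIR filter run along the band index of one frame of log filter-bank energies. This is an illustrative sketch only: the default taps (1, 0, -1), which give out[k] = S[k+1] - S[k-1], and the zero padding at the band edges are assumptions, since the abstract says only that the filter is a simple first- or second-order FIR filter. The function name is hypothetical.

```python
def frequency_filter(log_energies, taps=(1.0, 0.0, -1.0)):
    """Filter one frame of log filter-bank energies along the band index.

    The taps are centred on the current band; bands outside the range
    are treated as zero (an assumption, other edge rules are possible).
    """
    n = len(log_energies)
    half = len(taps) // 2
    out = []
    for k in range(n):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = k + half - j
            if 0 <= idx < n:
                acc += h * log_energies[idx]
        out.append(acc)
    return out


# Toy frame of four log energies (made-up values, not real data).
filtered = frequency_filter([1.0, 2.0, 4.0, 7.0])
# filtered == [2.0, 3.0, 5.0, -4.0]
```

The output has one parameter per band, like the input, and each parameter is a local slope of the log spectral envelope rather than a global cosine projection as in the cepstrum.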

    Digital waveguide modeling for wind instruments: building a state-space representation based on the Webster-Lokshin model

    This paper deals with digital waveguide modeling of wind instruments. It presents the application of state-space representations to the refined acoustic model of Webster-Lokshin. This acoustic model describes the propagation of longitudinal waves in axisymmetric acoustic pipes with a varying cross-section and visco-thermal losses at the walls, without assuming planar or spherical waves. Moreover, three types of shape discontinuity can be taken into account (radius, slope and curvature). The purpose of this work is to build low-cost digital simulations in the time domain based on the Webster-Lokshin model. First, decomposing a resonator into independent elementary parts and isolating delay operators lead to a Kelly-Lochbaum network of input/output systems and delays. Second, for a systematic assembly of elements, their state-space representations are derived in discrete time. Then, standard tools of automatic control are used to reduce the complexity of digital simulations in the time domain. The method is applied to a real trombone, and simulation results are presented and compared with measurements. This method seems to be a promising approach in terms of modularity, computational complexity and accuracy for any acoustic resonator based on tubes.

    Non-Gaussian, Non-stationary and Nonlinear Signal Processing Methods - with Applications to Speech Processing and Channel Estimation


    A sample selective linear predictive analysis of speech signals

    Linear prediction analysis is one of the most popular methods of processing speech, but it has problems in estimating the vocal tract characteristics of voiced sounds uttered by females and children. This is because the conventional linear prediction method assumes that all the sample values in each analysis frame are to be approximated by a linear combination of a fixed number of previous samples, whether or not those previous samples include excitation periods. Linear prediction analysis is also easily affected by source excitation. The vocal tract characteristics of signals with a short pitch period can be estimated more accurately by Sample Selective Linear Prediction (SSLP). The first stage of an SSLP analysis is the conventional linear predictive analysis; in the second stage, only those samples which are under a specified threshold are used for further analysis. This work outlines a numerically stable algorithm for performing SSLP using the autocorrelation method. (Abstract shortened by UMI.)
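
The two-stage idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis algorithm: stage one is ordinary autocorrelation-method linear prediction via the Levinson-Durbin recursion, the threshold is assumed to apply to the magnitude of the stage-one prediction residual, and stage two is a plain least-squares fit over the selected samples (a covariance-style solve swapped in for the numerically stable autocorrelation-method procedure the thesis describes). All function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Stage 1: predictor c with x[n] ~ sum_k c[k] * x[n-1-k]."""
    a = [1.0] + [0.0] * order          # error filter A(z), a[0] = 1
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                 # reflection coefficient
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return [-v for v in a[1:]]


def select_samples(x, c, threshold):
    """Indices n whose stage-1 residual magnitude is under the threshold."""
    p = len(c)
    selected = []
    for n in range(p, len(x)):
        e = x[n] - sum(c[k] * x[n - 1 - k] for k in range(p))
        if abs(e) < threshold:
            selected.append(n)
    return selected


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def selective_lp(x, order, selected):
    """Stage 2 (assumed form): least squares over the selected samples only."""
    phi = [[sum(x[n - 1 - j] * x[n - 1 - k] for n in selected) for k in range(order)]
           for j in range(order)]
    psi = [sum(x[n] * x[n - 1 - j] for n in selected) for j in range(order)]
    return solve_linear(phi, psi)
```

On a synthetic first-order resonance excited by an impulse every few samples, the stage-one estimate is pulled away from the true coefficient by the excitation, while the stage-two fit, which skips the samples where the excitation occurs, can recover it.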

    Cepstral peak prominence: a comprehensive analysis

    An analytical study of cepstral peak prominence (CPP) is presented, intended to provide insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have a Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, whether in amplitude, frequency or noise.

    Novel Pitch Detection Algorithm With Application to Speech Coding

    This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction and ACR applied to Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
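
Of the components named in the abstract above, the autocorrelation step is the simplest to illustrate. The sketch below covers only that step, assuming the LPC residual has already been computed. It picks the lag with the largest normalized autocorrelation inside a search range. The feature extraction and wavelet refinement of MAWT are not reproduced, and the function name and parameters are hypothetical.

```python
def estimate_pitch_period(residual, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the normalized autocorrelation.

    Returns None for an all-zero frame. On ties, the smallest lag wins.
    """
    n = len(residual)
    energy = sum(v * v for v in residual)
    if energy == 0.0:
        return None
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        score = sum(residual[i] * residual[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Idealized residual: one excitation pulse every 40 samples (synthetic data).
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(400)]
period = estimate_pitch_period(pulses, 20, 200)
# period == 40
```

Working on the residual rather than the speech waveform removes most of the vocal-tract resonance structure, so the autocorrelation peaks are dominated by the excitation period. Restricting the search range is what keeps the estimator from locking onto multiples of the true period.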

    Glottal Excitation Extraction of Voiced Speech - Jointly Parametric and Nonparametric Approaches

    The goal of this dissertation is to develop methods to recover glottal flow pulses, which contain biometric information about the speaker. The excitation information estimated from an observed speech utterance is modeled as the source of an inverse problem. Windowed linear prediction analysis and inverse filtering are first used to deconvolve the speech signal to obtain a rough estimate of the glottal flow pulses. Linear prediction and its inverse filtering can largely eliminate the vocal-tract response, which is usually modeled as an infinite impulse response filter. Some remaining vocal-tract components that reside in the estimate after inverse filtering are next removed by maximum-phase and minimum-phase decomposition, which is implemented by applying the complex cepstrum to the initial estimate of the glottal pulses. The additive and residual errors from inverse filtering can be suppressed by higher-order statistics, which is the method used to calculate cepstrum representations. Some features directly provided by the glottal source's cepstrum representation, as well as fitting parameters for the estimated pulses, are used to form feature patterns that are applied to a minimum-distance classifier to realize a speaker identification system with a very limited number of subjects.