
    Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation

    A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective intelligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modifications are made by resynthesising speech that has been spectrally smoothed. Objective measures are applied to the modified speech and include measures of speech quality, signal-to-noise ratio and intelligibility, together with a newly proposed normalised frequency-weighted spectral distortion (NFD) measure. The measures are compared to subjective intelligibility scores, where it is found that several have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81
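
The comparison described above rests on correlating each objective measure with listener scores. A minimal sketch of that step follows, assuming the Pearson coefficient is the correlation in question; the function name `pearson_r` and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data (NOT from the paper): a distortion score per
# processing condition and the matching fraction of words recognised.
distortion = [1.0, 2.0, 3.0, 4.0, 5.0]
intelligibility = [0.95, 0.80, 0.70, 0.50, 0.35]

r = pearson_r(distortion, intelligibility)
print(f"r = {r:.3f}")
```

A distortion-type measure is expected to correlate negatively with intelligibility, which is why the abstract reports the threshold on |r| rather than on r.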

    Voice morphing using the generative topographic mapping

    In this paper we address the problem of Voice Morphing. We attempt to transform the spectral characteristics of a source speaker's speech signal so that the listener would believe that the speech was uttered by a target speaker. The voice morphing system transforms the spectral envelope as represented by a Linear Prediction model. The transformation is achieved by codebook mapping using the Generative Topographic Mapping, a non-linear, latent variable, parametrically constrained, Gaussian Mixture Model
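
The abstract represents the spectral envelope with a Linear Prediction model. As a rough illustration of that representation only (not of the GTM codebook mapping), the sketch below fits LP coefficients with the autocorrelation method and the Levinson-Durbin recursion, then evaluates the all-pole envelope. The function names, the model order and the synthetic test signal are assumptions made for this example.

```python
import cmath
import math


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error power
    return a, err


def lp_envelope_db(a, err, n_bins):
    """All-pole envelope err / |A(e^{jw})|^2 in dB on [0, pi]."""
    env = []
    for m in range(n_bins):
        w = math.pi * m / (n_bins - 1)
        A = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        env.append(10.0 * math.log10(err / abs(A) ** 2))
    return env


# Synthetic test signal (an assumption for the demo): impulse response of a
# two-pole resonator with pole radius 0.9 and pole angle pi/4.
true_a = [1.0, -2.0 * 0.9 * math.cos(math.pi / 4), 0.81]
signal = []
for n in range(200):
    x = 1.0 if n == 0 else 0.0
    y1 = signal[n - 1] if n >= 1 else 0.0
    y2 = signal[n - 2] if n >= 2 else 0.0
    signal.append(x - true_a[1] * y1 - true_a[2] * y2)

a, err = levinson_durbin(autocorr(signal, 2), 2)
envelope = lp_envelope_db(a, err, 257)
peak_bin = max(range(len(envelope)), key=envelope.__getitem__)
```

In a morphing system of the kind described, a mapping would operate on a parameterisation of `a` for each frame. Which parameterisation the paper uses is not stated in the abstract.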

    Estimation d'enveloppes spectrales contraintes temporellement pour la conversion de voix [Temporally constrained spectral envelope estimation for voice conversion]

    This paper presents a new approach to estimating the speech spectral envelope that is adapted for Voice Conversion (VC). In particular, we represent the spectral envelope as a sum of peaks that evolve smoothly in time, within a phoneme. We highlight important properties of our proposed spectral envelope estimation and illustrate its potential for use in a VC context. We analyse natural speech using the proposed methods and we compare results with those from a more traditional frame-by-frame cepstrum-based analysis. Subjective comparisons of synthesized speech quality, as well as implications of this work for future research, are also discussed.

    DEVELOPMENT AND EVALUATION OF ENVELOPE, SPECTRAL AND TIME ENHANCEMENT ALGORITHMS FOR AUDITORY NEUROPATHY

    Auditory neuropathy (AN) is a hearing disorder that reduces the ability to detect temporal cues in speech, thus leading to impaired speech perception. Traditional amplification and frequency shifting techniques used in modern hearing aids are not suitable to assist individuals with AN due to the unique symptoms that result from the disorder. This study proposes a method that combines speech envelope enhancement and time scaling to obtain the proven benefits of each algorithm. In addition, spectral enhancement is cascaded with envelope and time enhancement to address the poor frequency discrimination in AN. The proposed speech enhancement strategy was evaluated using an AN simulator with normal hearing listeners under varying degrees of AN severity. The results showed a significant increase in word recognition scores for time scaling and envelope enhancement over envelope enhancement alone. Furthermore, the addition of spectral enhancement resulted in a further increase in word recognition at profound AN severity.

    Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

    This paper presents a speech enhancement method based on the tracking and denoising of the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and a mitigation of the processing artefact known as ‘musical noise’ or ‘musical tones’. The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on a spectral amplitude estimation, (b) formant-tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames. The HNM parameters for the excitation signal comprise the voiced/unvoiced decision, the fundamental frequency, the harmonics’ amplitudes and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters. The proposed method is used to deconstruct noisy speech, de-noise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.
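
Stage (c) above applies Kalman filtering to formant trajectories across frames. The sketch below shows the general idea with a scalar Kalman filter under a random-walk state model. That model, the function name `kalman_filter_track`, the variances and the synthetic track are all assumptions for illustration; the abstract does not specify the paper's actual state-space formulation.

```python
def kalman_filter_track(observations, process_var, obs_var):
    """Filter a noisy per-frame track with a scalar random-walk Kalman filter.

    State model (assumed):  x[t] = x[t-1] + w,  w ~ N(0, process_var)
    Observation model:      z[t] = x[t] + v,    v ~ N(0, obs_var)
    """
    x = observations[0]      # initialise the state from the first frame
    p = obs_var              # initial uncertainty equals the observation noise
    estimates = [x]
    for z in observations[1:]:
        p = p + process_var          # predict: uncertainty grows
        k = p / (p + obs_var)        # Kalman gain
        x = x + k * (z - x)          # update with the innovation
        p = (1.0 - k) * p            # posterior uncertainty
        estimates.append(x)
    return estimates


# Hypothetical first-formant track: a true value of 500 Hz corrupted by an
# alternating +/-50 Hz error, standing in for frame-to-frame estimation noise.
noisy_f1 = [500.0 + (50.0 if i % 2 == 0 else -50.0) for i in range(40)]
filtered_f1 = kalman_filter_track(noisy_f1, process_var=1.0, obs_var=2500.0)
```

A small `process_var` relative to `obs_var` encodes the belief that formants move slowly between frames, so the filter trusts its prediction more than any single noisy frame.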