39 research outputs found

    Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement

    Get PDF
    New auditory-inspired speech processing methods are presented in this paper, combining spectral subtraction and two-dimensional non-linear filtering techniques originally conceived for image processing purposes. In particular, mathematical morphology operations, like erosion and dilation, are applied to noisy speech spectrograms using specifically designed structuring elements inspired in the masking properties of the human auditory system. This is effectively complemented with a pre-processing stage including the conventional spectral subtraction procedure and auditory filterbanks. These methods were tested in both speech enhancement and automatic speech recognition tasks. For the first, time-frequency anisotropic structuring elements over grey-scale spectrograms were found to provide a better perceptual quality than isotropic ones, revealing themselves as more appropriate—under a number of perceptual quality estimation measures and several signal-to-noise ratios on the Aurora database—for retaining the structure of speech while removing background noise. For the second, the combination of Spectral Subtraction and auditory-inspired Morphological Filtering was found to improve recognition rates in a noise-contaminated version of the Isolet database.This work has been partially supported by the Spanish Ministry of Science and Innovation CICYT Project No. TEC2008-06382/TEC.Publicad

    Phase estimation with application to speech analysis-synthesis

    No full text
    Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980.MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING.Vita.Includes bibliographical references.by Thomas F. Quatieri, Jr.Sc.D

    The design of two-dimensional digital filters by generalized McClellan transformations

    No full text
    Thesis. 1975. M.S.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.Includes bibliographical references.by Thomas Francis Quatieri, Jr.M.S

    Speaker Recognition Using G.729 Speech Codec Parameters

    No full text
    Experiments in Gaussian-mixture-model speaker recognition from mel-cepstra, derived from mel-filter bank energies (MFBs) of the G.729 codec all-pole spectral envelope, showed significant performance loss relative to the standard mel-cepstral coefficients of G.729 synthesized (coded) speech [5]. In this paper, we investigate two approaches to recover speaker recognition performance from G.729 parameters. The first is a parametric approach that makes explicit use of G.729 parameters, rather than deriving cepstra from MFBs of an all-pole spectrum. Specifically, the G.729 LSFs are converted to "direct" cepstral coefficients for which there exists a one-to-one correspondence with the LSFs. The G.729 residual is also considered# in particular, appending G.729 pitch as a single parameter to the direct cepstral coefficients gives further performance gain. The second nonparametric approach uses the original MFB paradigm, but adds harmonic striations to the G.729 all-pole spectral envelope. Although obtaining considerable performance gains with these methods, we have yet to match the performance of G.729 synthesized speech, motivating the need for representing additional fine structure of the G.729 residual
    corecore