469 research outputs found

    Subspace-based Fundamental Frequency Estimation

    Publication in the conference proceedings of EUSIPCO, Vienna, Austria, 200

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
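
The contrast between fixed-length and glottal-synchronous framing described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names are hypothetical, the GCI positions are assumed to be already available as sample indices (for example from a detector such as those named above), and the two-cycle frame convention (previous GCI to next GCI) is one common choice rather than one the abstract specifies.

```python
def fixed_frames(signal, length, hop):
    """Conventional framing: frames of a predefined length at a fixed hop."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, hop)]


def glottal_synchronous_frames(signal, gcis):
    """Framing driven by glottal closure instants (GCIs).

    Assumption: `gcis` is a sorted list of sample indices. Each frame runs
    from the GCI before an interior GCI to the GCI after it, so it covers
    two glottal cycles and its length follows the local pitch period.
    """
    frames = []
    for previous_gci, next_gci in zip(gcis[:-2], gcis[2:]):
        frames.append(signal[previous_gci:next_gci])
    return frames


# Toy usage with made-up GCI positions (not measured data).
signal = list(range(100))
gcis = [10, 30, 55, 75]
frames = glottal_synchronous_frames(signal, gcis)
```

Unlike `fixed_frames`, the frame boundaries here move with the GCIs, so frame length varies with the pitch period instead of being set in advance.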

    Model-Based Speech Enhancement

    A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate accurately the acoustic features (voicing, fundamental frequency, spectral envelope and phase) from noisy speech. This is achieved using maximum a-posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features that are adapted to speaker-dependent models using MAP adaptation and for noise using the Unscented Transform. Objective results are presented to optimise the proposed system and a set of subjective tests compare the approach with traditional enhancement methods. Three-way listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional methods of enhancement in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter.
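
As a rough illustration of reconstructing speech from acoustic features, the sketch below synthesises one frame from a generic harmonic plus noise model: a sum of sinusoids at integer multiples of the fundamental frequency plus Gaussian noise. It is a textbook-style sketch under stated assumptions, not the method of the work above. The function name and parameters are hypothetical, and the harmonic amplitudes and phases are assumed to have been obtained already (for instance by sampling an estimated spectral envelope).

```python
import math
import random


def synthesise_hnm_frame(f0, amplitudes, phases, noise_std,
                         n_samples, fs, seed=0):
    """Synthesise one frame from a generic harmonic plus noise model.

    Assumptions: `f0` is the fundamental frequency in Hz, `amplitudes[k]`
    and `phases[k]` describe harmonic k + 1, `noise_std` scales white
    Gaussian noise, and `fs` is the sampling rate in Hz.
    """
    rng = random.Random(seed)
    frame = []
    for n in range(n_samples):
        t = n / fs
        # Harmonic part: sinusoids at integer multiples of f0.
        harmonic = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        # Noise part: zero-mean Gaussian samples.
        noise = rng.gauss(0.0, noise_std)
        frame.append(harmonic + noise)
    return frame


# Toy usage: a single 100 Hz harmonic, no noise, 8 kHz sampling.
frame = synthesise_hnm_frame(100.0, [1.0], [0.0], 0.0, 81, 8000)
```

In a voiced frame the harmonic part dominates; setting the amplitudes to zero and raising `noise_std` would model an unvoiced frame under the same assumptions.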

    Cognitive Information Processing

    Contains reports on six research projects. National Institutes of Health (Grant 5 PO1 GM14940-04); National Institutes of Health (Grant 5 PO1 GM15006-03); Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DA 28-043-AMC-02536(E

    Detailed versus gross spectro-temporal cues for the perception of stop consonants

    x + 182 pp.; 24 cm

    An investigation into glottal waveform based speech coding

    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.

    Pitch and spectral analysis of speech based on an auditory synchrony model

    Also issued as Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Includes bibliographical references (p. 228-235). Supported in part by the National Institutes of Health. 5 T32 NS07040. Stephanie Seneff

    Defining Fundamental Frequency for Almost Harmonic Signals

    In this work, we consider the modeling of signals that are almost, but not quite, harmonic, i.e., composed of sinusoids whose frequencies are close to being integer multiples of a common frequency. Typically, in applications, such signals are treated as perfectly harmonic, allowing for the estimation of their fundamental frequency, despite the signals not actually being periodic. Herein, we provide three different definitions of a concept of fundamental frequency for such inharmonic signals and study the implications of the different choices for modeling and estimation. We show that one of the definitions corresponds to a misspecified modeling scenario, and provides a theoretical benchmark for analyzing the behavior of estimators derived under a perfectly harmonic assumption. The second definition stems from optimal mass transport theory and yields a robust and easily interpretable concept of fundamental frequency based on the signals' spectral properties. The third definition interprets the inharmonic signal as an observation of a randomly perturbed harmonic signal. This allows for computing a hybrid information theoretical bound on estimation performance, as well as for finding an estimator attaining the bound. The theoretical findings are illustrated using numerical examples. Comment: Accepted for publication in IEEE Transactions on Signal Processing
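
To make "almost harmonic" concrete, the sketch below builds partial frequencies that drift away from exact integer multiples of a common frequency and then fits a single fundamental to them by least squares. Both choices are illustrative assumptions and neither is taken from the paper: the stiff-string formula is one well-known source of inharmonicity, and the least-squares fit is a simple stand-in that is not one of the three definitions proposed in the work. Function names are hypothetical.

```python
import math


def stiff_string_partials(f0, inharmonicity, n_partials):
    """Partial frequencies f_k = k * f0 * sqrt(1 + B * k^2).

    Assumption: the stiff-string model with inharmonicity coefficient B.
    With B = 0 the partials are exactly harmonic.
    """
    return [k * f0 * math.sqrt(1.0 + inharmonicity * k * k)
            for k in range(1, n_partials + 1)]


def least_squares_fundamental(partials):
    """Fit f minimising sum_k (f_k - k * f)^2, i.e. treat the signal
    as if it were perfectly harmonic.

    Closed form: f = sum_k(k * f_k) / sum_k(k^2).
    """
    numerator = sum(k * f for k, f in enumerate(partials, start=1))
    denominator = sum(k * k for k in range(1, len(partials) + 1))
    return numerator / denominator


harmonic_fit = least_squares_fundamental(
    stiff_string_partials(100.0, 0.0, 5))
inharmonic_fit = least_squares_fundamental(
    stiff_string_partials(100.0, 1e-3, 5))
```

With `inharmonicity` set to zero the fit recovers the nominal 100 Hz, while any positive value pulls the fitted fundamental above it, which is the kind of ambiguity that motivates asking what "fundamental frequency" should mean for such signals.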