52 research outputs found

    Speech enhancement using voice source models

    Analysis of nonmodal glottal event patterns with application to automatic speaker recognition

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographical references (p. 211-215). Regions of phonation exhibiting nonmodal characteristics are likely to contain information about speaker identity, language, dialect, and vocal-fold health. As a basis for testing such dependencies, we develop a representation of patterns in the relative timing and height of nonmodal glottal pulses. To extract the timing and height of candidate pulses, we investigate a variety of inverse-filtering schemes, including maximum-entropy deconvolution, which minimizes the predictability of a signal, and minimum-entropy deconvolution, which maximizes pulse-likeness. Hybrid formulations of these methods are also considered. We then derive a theoretical framework for understanding frequency- and time-domain properties of a pulse sequence, a process that sheds light on the transformation of nonmodal pulse trains into useful parameters. In the frequency domain, we introduce the first comprehensive mathematical derivation of the effect of deterministic and stochastic source perturbation on the short-time spectrum. We also propose a pitch representation of nonmodality that provides an alternative viewpoint on the frequency content and does not rely on Fourier bases. In developing time-domain properties, we use projected low-dimensional histograms of feature vectors derived from pulse timing and height parameters. For these features, we have found clusters of distinct pulse patterns, reflecting a wide variety of glottal-pulse phenomena including near-modal phonation, shimmer and jitter, diplophonia and triplophonia, and aperiodicity. Using temporal relationships between successive feature vectors, we have also developed an algorithm that separates these different classes of glottal-pulse characteristics.
We have used our glottal-pulse-pattern representation to automatically test for one signal dependency: speaker dependence of glottal-pulse sequences. This choice is motivated by differences observed between talkers in our separated feature space. Using an automatic speaker verification experiment, we investigate tradeoffs in speaker dependency for short-time pulse patterns, reflecting local irregularity, as well as long-time patterns related to higher-level cyclic variations. Results, using speakers with a broad array of modal and nonmodal behaviors, indicate high accuracy in speaker recognition, complementary to the use of conventional mel-cepstral features. These results suggest that there is rich structure in the source excitation that provides information about a particular speaker's identity. By Nicolas Malyska. Ph.D.

    Registers in Singing. Empirical and Systematic Studies in the Theory of the Singing Voice

    Vocal qualities in female singing.

    Development of acoustic analysis techniques for use in diagnosis of vocal pathology

Acoustic analysis, as used in the vocal pathology literature, has come to mean any spectrum or waveform measurement taken from the digitised speech signal. The purpose of the work set out in the present thesis is to investigate the currently available acoustic measures, to test their validity and to introduce new measures. More specifically, pitch extraction techniques and perturbation measures have been tested, several harmonic-to-noise ratio techniques have been implemented and thoroughly investigated (three of which are new), and cepstral and other spectral measures have been examined. Also, ratios relevant to voice source characteristics and perceptual correlation have been considered in addition to the traditional harmonic-to-noise ratios. A study of these approaches has revealed that many measurement problems arise and that the separation of the indices into independent measures is not a simple issue. The most commonly used acoustic measures for diagnosis of vocal pathology are jitter, shimmer and the harmonic-to-noise ratio. However, several researchers have shown that these measures are not independent and therefore may give ambiguous information. For example, the addition of random noise causes increased jitter measurements, and the introduction of jitter causes a reduced harmonic-to-noise ratio. Recent studies have shown that the glottal waveform, and hence the vibratory pattern of the vocal folds, may be estimated in terms of spectral measurements. However, in order to provide spectral characterisation of the vibratory pattern in pathological voice types, the effects of jitter and shimmer on the speech spectrum must first be removed. These issues are thoroughly addressed in this thesis. The foundation has been laid for future studies that will investigate the vibratory pattern of the vocal folds based on spectral evaluation of tape-recorded data.
All analysis techniques are initially tested on specially designed synthesis data files and on a group of 13 patients with varying pathologies and a group of 12 normals. Finally, the possibility of using digital spectrograms for speaker identification purposes has been addressed.
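As a rough illustration of the perturbation measures named in this abstract, the sketch below computes "local" jitter and shimmer from per-cycle values. The abstract does not state which definitions the thesis uses, so the formula here (mean absolute difference between consecutive cycles, divided by the mean) is an assumption, and the function name `local_perturbation` and the sample values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycle values,
    divided by the mean value (an assumed, commonly used definition)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements, not data from the thesis.
periods_ms = [10.0, 10.2, 9.8, 10.0]   # glottal cycle lengths
amplitudes = [1.0, 0.9, 1.1, 1.0]      # per-cycle peak amplitudes

jitter = local_perturbation(periods_ms)   # cycle-to-cycle period variation
shimmer = local_perturbation(amplitudes)  # cycle-to-cycle amplitude variation
print(f"jitter = {jitter:.4f}, shimmer = {shimmer:.4f}")
```

Both measures depend on the same cycle boundaries, which is one way to see why, as the abstract notes, they are hard to treat as independent indices.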

    Broadcast speech and the effect of voice quality on the listener: a study of the various components which categorise listener perception by vocal characteristics

Voice quality is crucial to the art of the broadcast speaker. Acceptable voice quality is a necessity for an acceptable microphone voice and is therefore essential for employment as a broadcaster. This thesis investigates the characteristics of the voice which provide that acceptability, and categorises the features which lead listeners to make judgements about their vocal likes and dislikes. These subjective judgements are explored by investigating the psychological, medical, and innate features contributing to the vocal perceptions of the listener. Voice quality is related to the efficiency of the larynx and its importance to voice production, and to the various vocal disorders which can affect the broadcaster. It becomes evident throughout the thesis that each listener receives a clear impression of the personality of the speaker through the features present in the voice. Many of these impressions, however, are based on stereotypes. The thesis relates these stereotypical judgements to accents, investigating their relationship to the 'BBC' voice, the 'World Service' voice, the 'ILR' voice and the 'reporter's voice'. It is shown that the listener's subjective impression of the voice and the broadcaster's personality is formed by the presentational and physical aspects of voice quality. Listener perceptions of voice acceptability are tested and discussed. The data is analysed to provide a set of dominant characteristics from which voice histograms and frequency polygons are drawn. The result is a set of preferred voice characteristics which apply specifically to the broadcast speaker and which can be sought during the selection process.

    Text-Independent Voice Conversion

This thesis deals with text-independent solutions for voice conversion. It first introduces the use of vocal tract length normalization (VTLN) for voice conversion. The presented variants of VTLN make it easy to change speaker characteristics by means of a few trainable parameters. Furthermore, it is shown how VTLN can be expressed in the time domain, strongly reducing the computational cost while keeping a high speech quality. The second text-independent voice conversion paradigm is residual prediction. In particular, two proposed techniques, residual smoothing and the application of unit selection, result in essential improvements in both speech quality and voice similarity. In order to apply the well-studied linear transformation paradigm to text-independent voice conversion, two text-independent speech alignment techniques are introduced. One is based on automatic segmentation and mapping of artificial phonetic classes, and the other is a completely data-driven approach with unit selection. The latter achieves a performance very similar to the conventional text-dependent approach in terms of speech quality and similarity. It is also successfully applied to cross-language voice conversion. The investigations of this thesis are based on several corpora in three different languages: English, Spanish, and German. Results are also presented from the multilingual voice conversion evaluation in the framework of the international speech-to-speech translation project TC-Star.
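To make the idea of changing speaker characteristics "by means of a few trainable parameters" concrete, here is a minimal sketch of one widely used VTLN warping function, a piecewise-linear frequency warp with a single factor `alpha`. The abstract does not say which warping functions the thesis uses, so this choice, the function name `warp_frequency`, and the default knee at 0.8π are all assumptions for illustration only.

```python
import math


def warp_frequency(omega, alpha, knee=0.8 * math.pi):
    """Piecewise-linear VTLN warp of a normalized frequency in [0, pi].

    Below the knee the frequency is scaled by alpha; above it, a second
    linear segment is used so that pi still maps to pi.
    (Assumed illustrative form, not taken from the thesis.)
    """
    if not 0.0 <= omega <= math.pi:
        raise ValueError("omega must lie in [0, pi]")
    if alpha <= 0 or alpha * knee >= math.pi:
        raise ValueError("alpha must satisfy 0 < alpha * knee < pi")
    if omega <= knee:
        return alpha * omega
    # Slope of the upper segment joining (knee, alpha*knee) to (pi, pi).
    slope = (math.pi - alpha * knee) / (math.pi - knee)
    return alpha * knee + slope * (omega - knee)


# alpha > 1 stretches the lower part of the spectrum; alpha < 1 compresses it.
for w in (0.0, 0.25 * math.pi, 0.5 * math.pi, 0.9 * math.pi, math.pi):
    print(f"{w:.3f} -> {warp_frequency(w, alpha=1.1):.3f}")
```

With this form a single scalar per speaker pair controls the whole spectral mapping, which is what makes such a transformation cheap to train compared with a full linear transformation of spectral features.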

    Prediction of room acoustical parameters (A)
