52 research outputs found

Speech enhancement using voice source models

Author: Yasmin Anisa
Publication venue: University of Waterloo
Publication date: 01/01/1999

University of Waterloo's Institutional Repository

Analysis of nonmodal glottal event patterns with application to automatic speaker recognition

Author: Malyska Nicolas, 1977-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographical references (p. 211-215). Regions of phonation exhibiting nonmodal characteristics are likely to contain information about speaker identity, language, dialect, and vocal-fold health. As a basis for testing such dependencies, we develop a representation of patterns in the relative timing and height of nonmodal glottal pulses. To extract the timing and height of candidate pulses, we investigate a variety of inverse-filtering schemes including maximum-entropy deconvolution that minimizes predictability of a signal and minimum-entropy deconvolution that maximizes pulse-likeness. Hybrid formulations of these methods are also considered. We then derive a theoretical framework for understanding frequency- and time-domain properties of a pulse sequence, a process that sheds light on the transformation of nonmodal pulse trains into useful parameters. In the frequency domain, we introduce the first comprehensive mathematical derivation of the effect of deterministic and stochastic source perturbation on the short-time spectrum. We also propose a pitch representation of nonmodality that provides an alternative viewpoint on the frequency content that does not rely on Fourier bases. In developing time-domain properties, we use projected low-dimensional histograms of feature vectors derived from pulse timing and height parameters. For these features, we have found clusters of distinct pulse patterns, reflecting a wide variety of glottal-pulse phenomena including near-modal phonation, shimmer and jitter, diplophonia and triplophonia, and aperiodicity. Using temporal relationships between successive feature vectors, an algorithm by which to separate these different classes of glottal-pulse characteristics has also been developed.
We have used our glottal-pulse-pattern representation to automatically test for one signal dependency: speaker dependence of glottal-pulse sequences. This choice is motivated by differences observed between talkers in our separated feature space. Using an automatic speaker verification experiment, we investigate tradeoffs in speaker dependency for short-time pulse patterns, reflecting local irregularity, as well as long-time patterns related to higher-level cyclic variations. Results, using speakers with a broad array of modal and nonmodal behaviors, indicate a high accuracy in speaker recognition performance, complementary to the use of conventional mel-cepstral features. These results suggest that there is rich structure to the source excitation that provides information about a particular speaker's identity. By Nicolas Malyska. Ph.D.

DSpace@MIT

Registers in Singing. Empirical and Systematic Studies in the Theory of the Singing Voice

Author: Miller Donald Gray
Publication venue: Ponsen & Looijen BV, Wageningen
Publication date: 01/01/2000

University of Groningen

Vocal qualities in female singing.

Author: Evans Michelle
Publication venue: University of York
Publication date: 01/01/1995

White Rose E-theses Online

Development of acoustic analysis techniques for use in diagnosis of vocal pathology

Author: Murphy Peter
Publication venue: Dublin City University. School of Physical Sciences
Publication date: 01/01/1997

Acoustic analysis as used in the vocal pathology literature has come to mean any spectrum or waveform measurement taken from the digitised speech signal. The purpose of the work as set out in the present thesis is to investigate the currently available acoustic measures, to test their validity and to introduce new measures. More specifically, pitch extraction techniques and perturbation measures have been tested, several harmonic-to-noise ratio techniques have been implemented and thoroughly investigated (three of which are new) and cepstral and other spectral measures have been examined. Also, ratios relevant to voice source characteristics and perceptual correlation have been considered in addition to the traditional harmonic-to-noise ratios. A study of these approaches has revealed that many measurement problems arise and that the separation of the indices into independent measures is not a simple issue. The most commonly used acoustic measures for diagnosis of vocal pathology are jitter, shimmer and the harmonic-to-noise ratio. However, several researchers have shown that these measures are not independent and therefore may give ambiguous information. For example, the addition of random noise causes increased jitter measurements and the introduction of jitter causes a reduced harmonic-to-noise ratio. Recent studies have shown that the glottal waveform and hence vibratory pattern of the vocal folds may be estimated in terms of spectral measurements. However, in order to provide spectral characterisation of the vibratory pattern in pathological voice types, the effects of jitter and shimmer on the speech spectrum must first be removed. These issues are thoroughly addressed in this thesis. The foundation has been laid for future studies that will investigate the vibratory pattern of the vocal folds based on spectral evaluation of tape-recorded data.
All analysis techniques are tested by initially running them on specially designed synthesis data files and on a group of 13 patients with varying pathologies and a group of 12 normals. Finally, the possibility of using digital spectrograms for speaker identification purposes has been addressed.
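
For readers unfamiliar with the perturbation measures named in this abstract, the following is a minimal sketch of one common textbook definition of local jitter and local shimmer. It is not taken from the thesis, which may define these measures differently; the function names `local_jitter` and `local_shimmer` and the sample values are hypothetical, and the pitch periods and peak amplitudes are assumed to have already been extracted from the signal.

```python
import numpy as np


def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period (assumed definition, dimensionless)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive peak amplitudes,
    divided by the mean amplitude (assumed definition, dimensionless)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


# Hypothetical cycle-by-cycle measurements (periods in ms, amplitudes unitless).
steady_periods = [10.0, 10.0, 10.0, 10.0]
alternating_periods = [9.0, 11.0, 9.0, 11.0]
alternating_amplitudes = [1.0, 3.0]

print(local_jitter(steady_periods))
print(local_jitter(alternating_periods))
print(local_shimmer(alternating_amplitudes))
```

Both measures are ratios, so they are independent of the units chosen for period and amplitude. A perfectly periodic voice gives zero for each.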

DCU Online Research Access Service

On the relations among temporal integration for loudness, loudness discrimination, and the form of the loudness function. (A)

Author: Buus Søren; Florentine M; Poulsen Torben
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/1996

Crossref

Online Research Database In Technology

Broadcast speech and the effect of voice quality on the listener : a study of the various components which categorise listener perception by vocal characteristics.

Author: Herbert John Charles
Publication venue: University of Sheffield Conference Proceedings
Publication date: 01/01/1989

Voice quality is crucial to the art of the broadcast speaker. Acceptable voice quality is a necessity for an acceptable microphone voice and essential therefore for employment as a broadcaster. This thesis investigates the characteristics of the voice which provide that acceptability, and categorises the features which lead the listener to make judgements about their vocal likes and dislikes. These subjective judgements are explored by investigating the psychological, medical, and innate features contributing to the vocal perceptions of the listener. Voice quality is related to the efficiency of the larynx and its importance to voice production, and to the various vocal disorders which can affect the broadcaster. It becomes evident throughout the thesis that each listener receives a clear impression of the personality of the speaker through the features present in the voice. Many of these impressions, however, are based on stereotypes. The thesis relates these stereotypical judgements to accents, investigating their relationship to the 'BBC' voice, the 'World Service' voice, the 'ILR' voice and the 'reporter's voice'. It is shown that the listener's subjective impression of the voice and the broadcaster personality is formed by the presentational and physical aspects of voice quality. Listener perceptions of voice acceptability are tested and discussed. The data is analysed to provide a set of dominant characteristics from which are drawn voice histograms and frequency polygons. The result is a set of preferred voice characteristics which apply specifically to the broadcast speaker and which can be sought during the selection process.

White Rose E-theses Online

Text-Independent Voice Conversion

Author: Sündermann David
Publication venue: Universität der Bundeswehr München, Fakultät für Elektrotechnik und Informationstechnik
Publication date: 01/01/2008

This thesis deals with text-independent solutions for voice conversion. It first introduces the use of vocal tract length normalization (VTLN) for voice conversion. The presented variants of VTLN allow for easily changing speaker characteristics by means of a few trainable parameters. Furthermore, it is shown how VTLN can be expressed in the time domain, strongly reducing the computational costs while keeping a high speech quality. The second text-independent voice conversion paradigm is residual prediction. In particular, two proposed techniques, residual smoothing and the application of unit selection, result in essential improvements in both speech quality and voice similarity. In order to apply the well-studied linear transformation paradigm to text-independent voice conversion, two text-independent speech alignment techniques are introduced. One is based on automatic segmentation and mapping of artificial phonetic classes and the other is a completely data-driven approach with unit selection. The latter achieves a performance very similar to the conventional text-dependent approach in terms of speech quality and similarity. It is also successfully applied to cross-language voice conversion. The investigations of this thesis are based on several corpora of three different languages, i.e., English, Spanish, and German. Results are also presented from the multilingual voice conversion evaluation in the framework of the international speech-to-speech translation project TC-Star.

Universität der Bundeswehr München: AtheneForschung