
    Acoustics of the Vowel - Preliminaries

    It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants. The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and that consequently no satisfying understanding exists of vowels as an achievement and particular formal accomplishment of the voice. Thus, the question of the acoustics of the vowel, and with it the question of the acoustics of the voice itself, proves to be an unresolved fundamental problem.
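
    The source-filter picture described here (a phonated source shaped by vocal-tract resonances into formant patterns) can be illustrated with a short sketch. The example below is illustrative only and not part of the treatise: it passes an impulse train through a cascade of second-order resonators, with formant frequencies and bandwidths assumed as rough textbook values for an /a/-like vowel.

```python
# Minimal source-filter sketch, assuming rough textbook formant values for an
# /a/-like vowel: a periodic impulse train (the "phonation") is passed through a
# cascade of second-order resonators (the "articulation", i.e. vocal-tract formants).
import numpy as np
from scipy.signal import lfilter

fs = 16000                     # sampling rate (Hz)
f0 = 120                       # fundamental frequency of the source (Hz)
n = int(0.5 * fs)              # half a second of signal

# Source: impulse train approximating periodic glottal excitation.
source = np.zeros(n)
source[::fs // f0] = 1.0

# Filter: one resonator per formant; (centre frequency, bandwidth) in Hz.
formants = [(700, 130), (1220, 70), (2600, 160)]
signal = source
for freq, bw in formants:
    r = np.exp(-np.pi * bw / fs)
    theta = 2.0 * np.pi * freq / fs
    a = [1.0, -2.0 * r * np.cos(theta), r * r]   # resonator poles
    signal = lfilter([1.0 - r], a, signal)       # crude gain normalisation

signal /= np.max(np.abs(signal))                 # normalise amplitude
```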

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2003), held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies.

    Pitch representations in the auditory nerve : two concurrent complex tones

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 39-43). Pitch differences between concurrent sounds are important cues used in auditory scene analysis and also play a major role in music perception. To investigate the neural codes underlying these perceptual abilities, we recorded from single fibers in the cat auditory nerve in response to two concurrent harmonic complex tones with missing fundamentals and equal-amplitude harmonics. We investigated the efficacy of rate-place and interspike-interval codes to represent both pitches of the two tones, which had fundamental frequency (F0) ratios of 15/14 or 11/9. We relied on the principle of scaling invariance in cochlear mechanics to infer the spatiotemporal response patterns to a given stimulus from a series of measurements made in a single fiber as a function of F0. Templates created by a peripheral auditory model were used to estimate the F0s of double complex tones from the inferred distribution of firing rate along the tonotopic axis. This rate-place representation was accurate for F0s above about 900 Hz. Surprisingly, rate-based F0 estimates were accurate even when the two-tone mixture contained no resolved harmonics, so long as some harmonics were resolved prior to mixing. We also extended methods used previously for single complex tones to estimate the F0s of concurrent complex tones from interspike-interval distributions pooled over the tonotopic axis. The interval-based representation was accurate for F0s below about 900 Hz, where the two-tone mixture contained no resolved harmonics. Together, the rate-place and interval-based representations allow accurate pitch perception for concurrent sounds over the entire range of human voice and cat vocalizations. By Erik Larsen. S.M.
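
    A toy sketch of the interval-based representation described above: pool all-order interspike intervals across fibres and score candidate F0s by the interval mass falling near integer multiples of each candidate period, weighted towards shorter lags. The spike trains below are synthetic stand-ins for auditory-nerve data, a single F0 is used rather than two concurrent tones, and the period-sieve scoring is a simplification, not the estimator used in the thesis.

```python
# Toy interval-based F0 estimate: pool all-order interspike intervals over
# "fibres", then score each candidate F0 by the interval mass near integer
# multiples of its period (a simplified period sieve, weighted towards short
# lags). Spike trains are synthetic stand-ins, not recorded auditory-nerve data.
import numpy as np

def pooled_interval_histogram(spike_trains, max_interval=0.02, bin_width=5e-5):
    """All-order interspike-interval histogram pooled over fibres (times in s)."""
    edges = np.arange(0.0, max_interval + bin_width, bin_width)
    hist = np.zeros(len(edges) - 1)
    for spikes in spike_trains:
        diffs = spikes[None, :] - spikes[:, None]          # all pairwise intervals
        diffs = diffs[(diffs > 0) & (diffs < max_interval)]
        hist += np.histogram(diffs, bins=edges)[0]
    return edges[:-1] + bin_width / 2.0, hist

def estimate_f0(centres, hist, f0_grid, tol=2.5e-4):
    """Score candidates by interval mass near multiples of the period (1/k weights)."""
    scores = []
    for f0 in f0_grid:
        period = 1.0 / f0
        multiples = np.arange(period, centres[-1], period)
        scores.append(sum(hist[np.abs(centres - m) < tol].sum() / (k + 1.0)
                          for k, m in enumerate(multiples)))
    return f0_grid[int(np.argmax(scores))]

# Synthetic fibres phase-locked to a 200 Hz fundamental (one F0 for brevity;
# the thesis deals with two concurrent complex tones).
rng = np.random.default_rng(0)
trains = []
for _ in range(30):
    cycles = np.flatnonzero(rng.random(100) < 0.7)         # fire on ~70% of cycles
    trains.append(cycles / 200.0 + rng.normal(0.0, 1e-4, size=cycles.size))

centres, hist = pooled_interval_histogram(trains)
print(estimate_f0(centres, hist, np.linspace(80.0, 400.0, 321)))   # ~200 Hz
```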

    Pitch and spectral analysis of speech based on an auditory synchrony model

    Also issued as Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Includes bibliographical references (p. 228-235). Supported in part by the National Institutes of Health (grant 5 T32 NS07040). By Stephanie Seneff.

    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll-quality speech has created a renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment. [Continues.]
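
    As a rough illustration of the two-stage idea described above (a peripheral model producing a "running" auditory spectrum, followed by a quality judgement), the sketch below computes a frame-by-frame mel-warped spectrum for a reference and a coded signal and reports their mean spectral difference. The mel filterbank and the distance measure are simplifying assumptions, not the models developed in this work.

```python
# A crude stand-in for the two-stage approach: a mel-warped spectrogram plays
# the role of the "running" auditory spectrum, and a mean spectral difference
# plays the role of the quality judgement.
import numpy as np
from scipy.signal import stft

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced on the mel scale."""
    def mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def inv_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fb[i - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fb

def running_auditory_spectrum(x, fs, n_fft=512, n_filters=24):
    """Frame-by-frame auditory (mel) spectrum in dB."""
    _, _, spec = stft(x, fs=fs, nperseg=n_fft)
    fb = mel_filterbank(n_filters, n_fft, fs)
    return 10.0 * np.log10(fb @ np.abs(spec) ** 2 + 1e-12)

def crude_quality_distance(reference, coded, fs):
    """Mean absolute difference between the running auditory spectra of two signals."""
    ref = running_auditory_spectrum(reference, fs)
    deg = running_auditory_spectrum(coded, fs)
    frames = min(ref.shape[1], deg.shape[1])
    return float(np.mean(np.abs(ref[:, :frames] - deg[:, :frames])))

# Example: additive noise (standing in for coding distortion) raises the distance.
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2.0 * np.pi * 440.0 * t)
noisy = clean + 0.05 * np.random.default_rng(3).normal(size=clean.size)
print(crude_quality_distance(clean, clean, fs), crude_quality_distance(clean, noisy, fs))
```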

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided of the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high-quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
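
    The idea of encoding a spectral envelope with a compact set of spline parameters can be sketched as follows. This uses scipy's smoothing splines on a synthetic envelope and is only a sketch in the spirit of the PSOS description above, not the thesis's implementation.

```python
# Sketch: describe a spectral envelope with a compact set of spline parameters.
# The "measured" envelope is synthetic (spectral tilt plus two broad peaks).
import numpy as np
from scipy.interpolate import splev, splrep

fs, n_fft = 16000, 1024
freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)

rng = np.random.default_rng(1)
true_env = (-freqs / 800.0
            + 20.0 * np.exp(-((freqs - 800.0) / 400.0) ** 2)
            + 12.0 * np.exp(-((freqs - 2600.0) / 600.0) ** 2))
measured = true_env + rng.normal(0.0, 1.0, size=freqs.size)   # noisy log-magnitude (dB)

# Fit a smoothing B-spline; its knots and coefficients are the compact
# parameter set that would be stored for the sound object.
tck = splrep(freqs, measured, s=4.0 * freqs.size)
knots, coeffs, degree = tck
envelope = splev(freqs, tck)

rms = np.sqrt(np.mean((envelope - measured) ** 2))
print(f"{len(knots)} knots summarise {freqs.size} spectral bins (RMS error {rms:.2f} dB)")
```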

    Optimizing acoustic and perceptual assessment of voice quality in children with vocal nodules

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 105-109). Few empirically derived guidelines exist for optimizing the assessment of vocal function in children with voice disorders. The goal of this investigation was to identify a minimal set of speech tasks and associated acoustic analysis methods that are most salient in characterizing the impact of vocal nodules on vocal function in children. Hence, a pediatric assessment protocol was developed based on the standardized Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) used to evaluate adult voices. Adult and pediatric versions of the CAPE-V protocols were used to gather recordings of vowels and sentences from adult females and children (4-6 and 8-10 year olds) with normal voices and with vocal nodules, and these recordings were subjected to perceptual and acoustic analyses. Results showed that perceptual ratings for breathiness best characterized the presence of nodules in children's voices, and ratings for the production of sentences best differentiated normal voices and voices with nodules for both children and adults. Selected voice quality-related acoustic algorithms, designed to quantitatively evaluate acoustic measures of vowels and sentences, were modified to be pitch-independent for use in analyzing children's voices. Synthesized vowels for children and adults were used to validate the modified algorithms by systematically assessing the effects of manipulating the periodicity and spectral characteristics of the synthesizer's voicing source. In applying the validated algorithms to the recordings of subjects with normal voices and vocal nodules, the acoustic measures tended to differentiate normal voices and voices with nodules in children and adults, and some displayed significant correlations with the perceptual attributes of overall severity of dysphonia, roughness, and/or breathiness. None of the acoustic measures correlated significantly with the perceptual attribute of strain. Limitations in the strength of the correlations between acoustic measures and perceptual attributes were attributed to factors that can be addressed in future investigations, which can now utilize the algorithms that were developed in this investigation for children's voices. Preliminary recommendations are made for the clinical assessment of pediatric voice disorders. By Asako Masaki. Ph.D.
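
    The abstract does not name the specific voice quality-related algorithms, so the sketch below illustrates one commonly used, largely pitch-independent acoustic correlate of breathiness and dysphonia, cepstral peak prominence (CPP). It is an assumption-laden stand-in for the kind of acoustic measure discussed above, not the measures validated in this thesis.

```python
# Illustration only: compute cepstral peak prominence (CPP) on a sustained
# vowel segment. The regression baseline is fitted over the searched quefrency
# range, a simplification of the usual CPP definition.
import numpy as np

def cepstral_peak_prominence(x, fs, f0_range=(70.0, 600.0)):
    """CPP in dB: height of the cepstral peak above a regression baseline."""
    x = x * np.hamming(len(x))
    log_mag = 20.0 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)                 # real cepstrum
    q_lo, q_hi = int(fs / f0_range[1]), int(fs / f0_range[0])
    quefrency = np.arange(len(cepstrum)) / fs
    slope, intercept = np.polyfit(quefrency[q_lo:q_hi], cepstrum[q_lo:q_hi], 1)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return float(cepstrum[peak] - (slope * quefrency[peak] + intercept))

# Example: a noisier ("breathier") synthetic vowel should score a lower CPP.
fs = 16000
t = np.arange(int(0.5 * fs))
periodic = np.where(t % (fs // 150) == 0, 1.0, 0.0)   # ~150 Hz pulse train
breathy = 0.7 * periodic + np.random.default_rng(2).normal(0.0, 0.1, size=t.size)
print(cepstral_peak_prominence(periodic, fs), cepstral_peak_prominence(breathy, fs))
```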