
    LaDIVA: A neurocomputational model providing laryngeal motor control for speech acquisition and production

    Many voice disorders are the result of intricate neural and/or biomechanical impairments that are poorly understood. The limited knowledge of their etiological and pathophysiological mechanisms hampers effective clinical management. Behavioral studies have been used concurrently with computational models to better understand typical and pathological laryngeal motor control. Thus far, however, a unified computational framework that quantitatively integrates physiologically relevant models of phonation with the neural control of speech has not been developed. Here, we introduce LaDIVA, a novel neurocomputational model with physiologically based laryngeal motor control. We combined the DIVA model (an established neural network model of speech motor control) with the extended body-cover model (a physics-based vocal fold model). The resulting integrated model, LaDIVA, was validated by comparing its simulations with behavioral responses to perturbations of auditory vocal fundamental frequency (fo) feedback in adults with typical speech. LaDIVA demonstrated the capability to simulate different modes of laryngeal motor control, ranging from short-term (i.e., reflexive) and long-term (i.e., adaptive) auditory feedback paradigms to the generation of prosodic contours in speech. Simulations showed that LaDIVA’s laryngeal motor control displays properties of motor equivalence, i.e., LaDIVA could robustly generate compensatory responses to reflexive vocal fo perturbations with varying initial laryngeal muscle activation levels leading to the same output. The model can also generate prosodic contours for studying laryngeal motor control in running speech. LaDIVA can expand the understanding of the physiology of human phonation and enable, for the first time, the investigation of causal effects of neural motor control on the fine structure of the vocal signal.
    Authors: Hasini R. Weerathunge (Boston University, United States); Gabriel Alejandro Alzamendi (Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, Universidad Nacional de Entre Ríos - CONICET, Argentina); Gabriel J. Cler (University of Washington, United States); Frank H. Guenther (Boston University, United States); Cara E. Stepp (Boston University, United States); Matías Zañartu (Universidad Técnica Federico Santa María, Chile)
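
    As a rough illustration of the reflexive pitch-shift paradigm mentioned above, the sketch below implements a toy negative-feedback pitch loop, not LaDIVA itself: a perturbation is added to the heard fo and the produced fo partially compensates. The loop structure, gains, and time base are all illustrative assumptions.

    ```python
    # Toy sketch of a reflexive pitch-shift experiment (NOT the LaDIVA model):
    # the produced fo is pulled toward the feedforward target while also
    # opposing the auditory error the speaker hears. All gains are assumptions.
    import numpy as np

    fs_ctrl = 100                          # control-loop rate (Hz), assumed
    t = np.arange(0, 3, 1 / fs_ctrl)       # 3 s sustained vowel
    target_fo = 120.0                      # intended fo (Hz), assumed

    # +100-cent shift applied to auditory feedback from t = 1 s onward
    perturb = np.where(t >= 1.0, 100.0, 0.0)

    ff_gain = 0.10   # pull toward the feedforward target, assumed
    fb_gain = 0.05   # auditory feedback correction gain, assumed

    cents = np.zeros_like(t)               # produced fo in cents re: target
    for i in range(1, len(t)):
        heard = cents[i - 1] + perturb[i]  # fo the speaker hears, in cents
        cents[i] = cents[i - 1] - ff_gain * cents[i - 1] - fb_gain * heard

    produced_hz = target_fo * 2.0 ** (cents / 1200.0)
    print(f"steady-state response: {cents[-1]:.1f} cents (partial compensation)")
    ```

    The partial (rather than complete) compensation falls out of the competing feedforward and feedback terms, qualitatively matching the compensatory responses described in the abstract.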

    Modeling and frequency tracking of marine mammal whistle calls

    Submitted in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, February 2009.
    Marine mammal whistle calls present an attractive medium for covert underwater communications. High-quality models of the whistle calls are needed in order to synthesize natural-sounding whistles with embedded information. Since the whistle calls are composed of frequency-modulated harmonic tones, they are best modeled as a weighted superposition of harmonically related sinusoids. Previous research with bottlenose dolphin whistle calls has produced synthetic whistles that sound too “clean” for use in a covert communications system. Due to the sensitivity of the human auditory system, watermarking schemes that slightly modify the fundamental frequency contour have good potential for producing natural-sounding whistles embedded with retrievable watermarks. Structured total least squares is used with linear prediction analysis to track the time-varying fundamental frequency and harmonic amplitude contours throughout a whistle call. Simulation and experimental results demonstrate the capability to accurately model bottlenose dolphin whistle calls and retrieve embedded information from watermarked synthetic whistle calls. Different fundamental frequency watermarking schemes are proposed based on their ability to produce natural-sounding synthetic whistles and to yield suitable watermark detection and retrieval.
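
    The harmonic model described above lends itself to a compact synthesis sketch. The following is an illustrative reconstruction, not the thesis implementation: the contour shape, harmonic weights, and the watermark offset are all assumed for demonstration.

    ```python
    # Sketch of whistle synthesis as a weighted superposition of harmonically
    # related sinusoids with a time-varying fundamental (values are assumed).
    import numpy as np

    fs = 96_000                                 # sample rate (Hz)
    dur = 0.8                                   # whistle duration (s)
    t = np.arange(int(fs * dur)) / fs

    f0 = 8_000.0 + 6_000.0 * (t / dur)          # assumed upsweep: 8 -> 14 kHz

    # Hypothetical watermark: a +20-cent fo offset on the middle third of the call
    mark = np.zeros_like(t)
    mark[len(t) // 3 : 2 * len(t) // 3] = 20.0
    f0_marked = f0 * 2.0 ** (mark / 1200.0)

    phase = 2 * np.pi * np.cumsum(f0_marked) / fs   # integrate fo to get phase
    weights = [1.0, 0.4, 0.15]                      # assumed harmonic amplitudes
    whistle = sum(a * np.sin(k * phase) for k, a in enumerate(weights, start=1))
    whistle /= np.max(np.abs(whistle))              # normalise for playback
    ```

    Because the watermark is a small multiplicative shift of the fundamental contour, the harmonic structure of the call is preserved, which is why such schemes can remain natural-sounding.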

    Sound and noise

    Sound and noise problems in the space environment and human tolerance criteria at varying frequencies and intensities.

    Engineering data compendium. Human perception and performance. User's guide

    The concept underlying the Engineering Data Compendium was the product of a research and development program (the Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design of military crew systems. The principal objective was to develop a workable strategy for: (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would make it accessible, interpretable, and applicable for system designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This is the first volume, the User's Guide, containing a description of the program and instructions for its use.

    Towards an integrated formal model of fundamental frequency in overall downtrends

    Although there are major differences among the various conceptual models of Fo scaling, we suggest that the corresponding mathematical formulations may be compatible, and that the theoretical differences need not hinder the empirical aspects and practical uses of the theories, as demonstrated in speech synthesis. The method follows standard practice in mathematical logic: combining and “rounding off” the formalisms of the different models, then allowing for a consistent interpretation of the new unified theory. The approach is applied to two current models of decay in intonation curves. The two models are described, followed by the conflicts between them; these conflicts were then used to construct the integrated model. Our short-term objective is to validate the approach by testing the integrated model against empirical instrumental data obtained independently.
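
    As a concrete point of reference, one common way to formalise an overall downtrend is an exponential decay of fo toward a speaker baseline, with accent targets scaled relative to the declining line. The sketch below illustrates that general idea only; it is not either of the specific models unified in the paper, and all parameter values are assumptions.

    ```python
    # Illustrative downtrend: fo decays exponentially from an initial value
    # toward a baseline asymptote over the utterance (parameters assumed).
    import numpy as np

    fo_start = 220.0    # initial topline fo (Hz), assumed
    fo_base = 150.0     # speaker baseline / asymptote (Hz), assumed
    tau = 1.5           # decay time constant (s), assumed

    t = np.linspace(0.0, 3.0, 301)                       # 3 s utterance
    topline = fo_base + (fo_start - fo_base) * np.exp(-t / tau)

    # Accent peaks at assumed times, scaled to the declining topline
    peak_times = [0.4, 1.4, 2.4]
    peaks = np.interp(peak_times, t, topline)
    print(np.round(peaks, 1))   # successive peaks fall: the downtrend
    ```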

    Infant prosodic expressions in mother-infant communication

    Prosody, generally defined as any perceivable modulation of duration, pitch, or loudness in the voice that conveys meaning, has been identified as part of the linguistic system, or compared with the sound system of Western classical music. This thesis proposes a different conception, namely that prosody is a phenomenon of human expression that precedes, and to a certain extent determines, the form and function of utterances in any particular language or music system. Findings from studies of phylogenesis and ontogenesis are presented in favour of this definition. Consequently, the prosody of infant vocal expressions, which are made by individuals who have not yet developed either language or musical skills, is investigated as a phenomenon in itself, with its own rules. Recognising theoretical and methodological deficiencies in the linguistic and the Piagetian approaches to the development of infant prosodic expressions, this thesis supports the view that the origins of language are to be sought in the expressive dialogues between the mother and her prelinguistic child that are generated by intuitive motives for communication. Furthermore, infant vocalisations are considered part of a system of communication constituted by all expressive modalities. Thus, the aim is to investigate the role of infant prosodic expressions in conveying emotions and communicative functions in relation to the accompanying non-vocal behaviours.
    A cross-sectional Pilot Study involving 16 infants aged 26 to 56 weeks and their mothers was undertaken to help in the design of the Main Study. The Main Study became a case description of two first-born infants and their mothers: a boy (Robin) and a girl (Julie), both aged 30 weeks at the beginning of the study. The infants were filmed in their home every fortnight for five months in a structured naturalistic setting which included the following conditions: mother-infant free play with their own toys, mother-infant play without using objects, the infant playing alone, mother-infant play with objects provided by the researcher, a 'car task' for eliciting cooperative play, and the mother staying unresponsive. Each filming session lasted approximately thirty minutes. In order to gain insight into the infants' 'meaning potential' expressed in their vocalisations, the mothers were asked to visit the department in the interval between two filming sessions and, while watching the most recent video, to report what they felt their infant was conveying, if anything, in each vocalisation.
    Three types of analysis were carried out: (a) an analysis of prosody, in which an attempt was made to obtain an objective, not linguistically based account of infant prosodic features: measurements were first obtained of the duration and the fundamental frequency curve of each vocalisation by means of a computer programme for sound analysis, and the fundamental frequency values were then logarithmically transformed onto a semitone scale in order to obtain measurements more sensitive to the mother's perception; (b) a functional micro-analysis of non-vocal behaviours from the videos, in which the non-vocal behaviours of mother and infant related to each vocalisation were coded without sound, to examine to what extent the mothers relied for their interpretations on the non-vocal behaviours accompanying vocalisations; and (c) an analysis of the mothers' interpretations, in which the infants' messages were defined as perceived by their mothers.
    The corpus comprised 713 vocalisations (322 for the boy and 391 for the girl) selected from a corpus of 864, and 143 minutes of video recording (64 for the boy and 79 for the girl). Correlations between the above three assessments were specified through statistical analysis. The findings from both infants indicate that between seven and eleven months prosodic patterns are not related one-to-one with particular messages. Rather, prosody distinguishes between groups of messages conveying features of psychological motivation, such as 'emotional', 'interpersonal', 'referential', 'assertive' or 'receptive'. Individual messages belonging to the same message group according to the analysis of prosody are distinguished on the basis of the accompanying non-vocal behaviours. Before nine months, 'interpersonal' vocalisations display more 'alerting' prosodic patterns than 'referential' vocalisations. After nine months, prosodic patterns in Robin's vocalisations differentiate between 'assertive' and 'receptive' messages, the former being expressed by more 'alerting' prosodic patterns than the latter. This distinction reflects a better Self-Other awareness. On the other hand, Julie's vocalisations occurring in situations of 'Joint Interest' display different prosodic patterns from her vocalisations uttered in situations of 'Converging Interest'. These changes in the role of infant prosody reflect developments in the infants' motivational organisation which will lead to more efficient control of intersubjective orientation and shared attention to the environment. Moreover, it was demonstrated that new forms of prosodic expression occur in psychologically mature situations, while psychologically novel situations are expressed by mature prosodic forms. The above results suggest that at the threshold to language, prosody does not primarily serve identifiable linguistic functions. Rather, in spite of individual differences in the form of their vocalisations, both infants use prosody in combination with other modalities as part of an expressive system that conveys information about their motives. In this way prosody facilitates intersubjective and later cooperative communication, on which language development is built. To what extent such prelinguistic prosodic patterns are similar in form to those of the target language is a crucial issue for further investigation.
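
    The semitone transform used in the analysis of prosody is straightforward to make explicit. A minimal sketch follows; the reference frequency is an assumption, since the abstract does not specify one.

    ```python
    # Hz-to-semitone conversion as used in prosodic analysis: a log transform
    # with 12 semitones per octave. f_ref is an assumed reference frequency.
    import numpy as np

    def hz_to_semitones(f_hz, f_ref=100.0):
        """Semitones of f_hz relative to f_ref (12 per doubling of frequency)."""
        return 12.0 * np.log2(np.asarray(f_hz, dtype=float) / f_ref)

    contour_hz = [300, 320, 350, 330, 280]      # hypothetical infant fo values
    print(hz_to_semitones(contour_hz).round(2))
    ```

    The log scale compresses equal frequency ratios into equal intervals, which is closer to how a listener, here the mother, perceives pitch distances.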

    Auditory communication in domestic dogs: vocal signalling in the extended social environment of a companion animal

    Domestic dogs produce a range of vocalisations, including barks, growls, and whimpers, which are shared with other canid species. The source–filter model of vocal production can be used as a theoretical and applied framework to explain how and why the acoustic properties of some vocalisations are constrained by physical characteristics of the caller, whereas others are more dynamic, influenced by transient states such as arousal or motivation. This chapter thus reviews how and why particular call types are produced to transmit specific types of information, and how such information may be perceived by receivers. As domestication is thought to have caused a divergence in the vocal behaviour of dogs as compared to the ancestral wolf, evidence of both dog–human and human–dog communication is considered. Overall, it is clear that domestic dogs have the potential to acoustically broadcast a range of information, which is available to conspecific and human receivers. Moreover, dogs are highly attentive to human speech and are able to extract speaker identity, emotional state, and even some types of semantic information.
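
    For readers unfamiliar with the source–filter framework invoked above, the toy sketch below (not from the chapter) separates a periodic source from two fixed resonances standing in for caller-dependent vocal-tract filtering. All frequencies and bandwidths are assumptions, and it requires NumPy and SciPy.

    ```python
    # Toy source-filter synthesis: a pulse-train source (rate set by the
    # caller's fo) filtered by two fixed resonances (vocal-tract stand-ins).
    import numpy as np
    from scipy.signal import lfilter

    fs = 16_000
    n = fs // 2                                  # 0.5 s of signal
    f0 = 90.0                                    # assumed source rate (Hz)
    source = np.zeros(n)
    source[:: int(fs / f0)] = 1.0                # glottal-like impulse train

    def resonator(x, freq, bw):
        """Second-order all-pole resonance at `freq` (Hz) with bandwidth `bw`."""
        r = np.exp(-np.pi * bw / fs)
        theta = 2.0 * np.pi * freq / fs
        return lfilter([1.0], [1.0, -2.0 * r * np.cos(theta), r * r], x)

    out = resonator(resonator(source, 500.0, 80.0), 1500.0, 120.0)
    out /= np.max(np.abs(out))
    ```

    The split mirrors the chapter's point: the source rate tracks transient state, while the resonances are constrained by the caller's anatomy.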

    Audio Processing and Loudness Estimation Algorithms with iOS Simulations

    The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome their limitations. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have been implemented. A software implementation of loudness estimation on iOS devices is also reported. Beyond the loudness estimation algorithms and software, this thesis project also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS app 'iJDSP'. These functions are described in detail in this thesis. Several enhancements to the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing, and facilities for easy sound recording, processing, and audio rendering have been added to provide students, practitioners, and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and in the training of practitioners and students.
    Dissertation/Thesis. M.S. Electrical Engineering 201
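
    As a much-simplified stand-in for the auditory-model-based estimators discussed above, and not the thesis's algorithms, the sketch below computes a per-frame loudness proxy via Stevens' power law (loudness roughly proportional to intensity^0.3). The frame length and reference level are assumptions.

    ```python
    # Frame-based loudness proxy (NOT the thesis's auditory-model estimator):
    # per-frame intensity mapped to sone-like units via Stevens' power law.
    import numpy as np

    def frame_loudness(x, fs, frame_ms=25.0, ref_rms=1.0):
        """Approximate loudness per frame, in arbitrary sone-like units."""
        n = int(fs * frame_ms / 1000.0)
        frames = x[: len(x) // n * n].reshape(-1, n)   # non-overlapping frames
        intensity = np.mean(frames ** 2, axis=1) / ref_rms ** 2
        return intensity ** 0.3                        # Stevens' exponent

    fs = 16_000
    t = np.arange(fs) / fs
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)         # 1 s test tone
    print(frame_loudness(tone, fs)[:4].round(3))
    ```

    Full auditory models add excitation-pattern computation across critical bands, which is precisely the computationally intensive step the thesis targets for efficient mobile implementation.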

    Ultrasound cleaning of microfilters


    The interaction between articulation and tones in Cantonese

    "A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2009."Thesis (B.Sc)--University of Hong Kong, 2009.Includes bibliographical references (p. 27-30).published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science
