
    On the Assessment of Stability and Patterning of Speech Movements

    Speech requires the control of complex movements of orofacial structures to produce dynamic variations in the vocal tract transfer function. The nature of the underlying motor control processes has traditionally been investigated by employing measures of articulatory movements, including movement amplitude, velocity, and duration, at selected points in time. An alternative approach, first used in the study of limb motion, is to examine the entire movement trajectory over time. A new approach to speech movement trajectory analysis was introduced in earlier work from this laboratory. In this method, trajectories from multiple movement sequences are time- and amplitude-normalized, and the spatiotemporal index (STI) is computed to capture the degree of convergence of a set of trajectories onto a single, underlying movement template. This research note describes the rationale for this analysis and provides a detailed description of the signal processing involved. Alternative interpolation procedures for time-normalization of kinematic data are also considered.
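    To make the procedure concrete, here is a minimal sketch of an STI computation, assuming z-score amplitude normalization, linear interpolation onto 1,000 time points, and standard deviations evaluated at 50 evenly spaced points; these parameter choices and the function name `sti` are illustrative rather than the laboratory's exact implementation (the note itself weighs alternative interpolation procedures).

```python
import numpy as np

def sti(trajectories, n_time=1000, n_eval=50):
    """Spatiotemporal index for a set of 1-D movement records.

    trajectories: list of 1-D arrays (one per repetition of the same
    utterance), possibly of different lengths and amplitudes.
    """
    t_new = np.linspace(0.0, 1.0, n_time)
    normalized = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        # Amplitude normalization: z-score each record.
        traj = (traj - traj.mean()) / traj.std()
        # Time normalization: linear interpolation onto a common time axis.
        t_old = np.linspace(0.0, 1.0, traj.size)
        normalized.append(np.interp(t_new, t_old, traj))
    stack = np.vstack(normalized)              # shape: (n_records, n_time)
    # Standard deviation across records at evenly spaced time points; the
    # STI is the sum of these SDs (lower values indicate that the
    # trajectories converge on a single underlying template).
    idx = np.linspace(0, n_time - 1, n_eval).astype(int)
    return float(stack[:, idx].std(axis=0).sum())
```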

    I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database, featuring 1.6k utterances from 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender-balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and both read and spontaneous speech; the database is made publicly available for research purposes. We start by demonstrating that, for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification using both brute-forced low-level acoustic features and higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier in a leave-one-speaker-out evaluation framework. Results show that binary prediction of the eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. Early fusion of the intelligibility-related features with the brute-forced acoustic feature set improves performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a coefficient of determination of up to 56.2%.
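    As an illustration of the evaluation setup described above, the sketch below runs an SVM in a leave-one-speaker-out protocol with scikit-learn; the placeholder features, the label layout, and the use of balanced accuracy as a stand-in for unweighted average recall are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for acoustic features extracted from
# iHEARu-EAT: X = feature matrix, y = 7 classes (six foods + "not eating"),
# speakers = subject IDs used as the grouping variable for LOSO folds.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = rng.integers(0, 7, size=300)
speakers = rng.integers(0, 30, size=300)

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
# "balanced_accuracy" equals unweighted average recall (UAR), the usual
# reporting metric in computational paralinguistics.
scores = cross_val_score(clf, X, y, groups=speakers,
                         cv=LeaveOneGroupOut(), scoring="balanced_accuracy")
print(f"mean UAR across held-out speakers: {scores.mean():.3f}")
```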

    Paralinguistic vocal control of interactive media: how untapped elements of voice might enhance the role of non-speech voice input in the user's experience of multimedia

    Much interactive media development, especially commercial development, implies the dominance of the visual modality, with sound as a limited supporting channel. The development of multimedia technologies such as augmented reality and virtual reality has further revealed a distinct partiality to visual media. Sound, however, and particularly voice, have many aspects which have yet to be adequately investigated. Exploration of these aspects may show that sound can, in some respects, be superior to graphics in creating immersive and expressive interactive experiences. With this in mind, this thesis investigates the use of non-speech voice characteristics as a complementary input mechanism for controlling multimedia applications. It presents a number of projects that employ the paralinguistic elements of voice as input to interactive media, including both screen-based and physical systems. These projects are used as a means of exploring the factors likely to affect users' preferences and interaction patterns during non-speech voice control. This exploration forms the basis for an examination of potential roles for paralinguistic voice input. The research includes the conceptual and practical development of the projects and a set of evaluative studies. The work submitted for the Ph.D. comprises practical projects (50 percent) and a written dissertation (50 percent). The thesis aims to advance understanding of how voice can be used, both on its own and in combination with other input mechanisms, in controlling multimedia applications. It offers a step forward in attempts to integrate the paralinguistic components of voice as a complementary input mode for speech input applications, creating a synergistic combination that might let the strengths of each mode overcome the weaknesses of the other.
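    One plausible realisation of this idea, not the thesis's actual implementation: the sketch below estimates pitch by autocorrelation and loudness by RMS energy from a single audio frame and maps them to continuous control values. The frame length, pitch range, and voicing threshold are illustrative assumptions.

```python
import numpy as np

def frame_features(frame, sr=16000, fmin=75.0, fmax=500.0):
    """Return (rms_energy, pitch_hz or None) for one audio frame."""
    rms = float(np.sqrt(np.mean(frame ** 2)))
    frame = frame - frame.mean()
    # Autocorrelation at non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac) or ac[0] <= 0:
        return rms, None
    lag = lo + int(np.argmax(ac[lo:hi]))
    # Treat weak periodicity as unvoiced: no pitch-based control signal.
    if ac[lag] / ac[0] < 0.3:
        return rms, None
    return rms, sr / lag

# Example: map pitch onto a 0..1 slider position, loudness onto a level.
sr = 16000
t = np.arange(sr // 10) / sr
frame = 0.5 * np.sin(2 * np.pi * 220 * t)      # a steady 220 Hz hum
rms, pitch = frame_features(frame, sr)
if pitch is not None:
    slider = np.clip((pitch - 75) / (500 - 75), 0.0, 1.0)
    print(f"pitch {pitch:.1f} Hz -> slider {slider:.2f}, level {rms:.2f}")
```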

    Speaking Rate Effects on Normal Aspects of Articulation: Outcomes and Issues

    The articulatory effects of speaking rate have been a focus of a substantial literature in speech science. The normal aspects of speaking rate variation have influenced theories and models of speech production and perception in the literature pertaining to both normal and disordered speech. While the body of literature on the articulatory effects of speaking rate change is reasonably large, few speaker-general outcomes have emerged. The purpose of this paper is to review outcomes of the existing literature and to address problems related to the study of speaking rate that may be germane to the recurring theme that speaking rate effects are largely idiosyncratic.

    Speech intelligibility and prosody production in children with cochlear implants

    Objectives: The purpose of the current study was to examine the relation between speech intelligibility and prosody production in children who use cochlear implants. Methods: The Beginner's Intelligibility Test (BIT) and Prosodic Utterance Production (PUP) task were administered to 15 children who use cochlear implants and 10 children with normal hearing. Adult listeners with normal hearing judged the intelligibility of the words in the BIT sentences, identified each PUP sentence as one of four grammatical or emotional moods (i.e., declarative, interrogative, happy, or sad), and rated the PUP sentences according to how well they thought the child conveyed the designated mood. Results: Percent-correct scores were higher for intelligibility than for prosody and higher for children with normal hearing than for children with cochlear implants. Declarative sentences were most readily identified and received the highest ratings from adult listeners; interrogative sentences were least readily identified and received the lowest ratings. Correlations between intelligibility and all mood identification and rating scores except declarative were not significant. Discussion: The findings suggest that the development of speech intelligibility progresses ahead of prosody in both children with cochlear implants and children with normal hearing; however, children with normal hearing still perform better than children with cochlear implants on measures of intelligibility and prosody even after accounting for hearing age. Problems with interrogative intonation may be related to more general restrictions on rising intonation, and th…

    Infant prosodic expressions in mother-infant communication

    Prosody, generally defined as any perceivable modulation of duration, pitch or loudness in the voice that conveys meaning, has been identified as part of the linguistic system, or compared with the sound system of Western classical music. This thesis proposes a different conception, namely that prosody is a phenomenon of human expression that precedes, and to a certain extent determines, the form and function of utterances in any particular language or music system. Findings from studies of phylogenesis and ontogenesis are presented in favour of this definition. Consequently, the prosody of infant vocal expressions, which are made by individuals who have not yet developed either language or musical skills, is investigated as a phenomenon in itself, with its own rules. Recognising theoretical and methodological deficiencies in the linguistic and the Piagetian approaches to the development of infant prosodic expressions, this thesis supports the view that the origins of language are to be sought in the expressive dialogues between the mother and her prelinguistic child that are generated by intuitive motives for communication. Furthermore, infant vocalisations are considered as part of a system of communication constituted by all expressive modalities. Thus, the aim is to investigate the role of infant prosodic expressions in conveying emotions and communicative functions in relation to the accompanying non-vocal behaviours. A cross-sectional Pilot Study involving 16 infants aged 26 to 56 weeks and their mothers was undertaken to help in the design of the Main Study. The Main Study became a case description of two first-born infants and their mothers: a boy (Robin) and a girl (Julie), both aged 30 weeks at the beginning of the study. The infants were filmed in their home every fortnight for five months in a structured naturalistic setting which included the following conditions: mother-infant free play with their own toys, mother-infant play without using objects, the infant playing alone, mother-infant play with objects provided by the researcher, a 'car task' for eliciting cooperative play, and the mother staying unresponsive. Each filming session lasted approximately thirty minutes. In order to gain insight into the infants' 'meaning potential' expressed in their vocalisations, the mothers were asked to visit the department in the interval between two filming sessions and, while watching the most recent video, to report what they felt their infant was conveying, if anything, in each vocalisation. Three types of analysis were carried out: a) an analysis of prosody, in which an attempt was made to obtain an objective, not linguistically based account of infant prosodic features: measurements were first obtained of the duration and the fundamental frequency curve of each vocalisation by means of a computer programme for sound analysis, and the values of fundamental frequency were then logarithmically transformed into a semitone scale in order to obtain measurements more sensitive to the mother's perception; b) a functional micro-analysis of non-vocal behaviours from the videos, in which the non-vocal behaviours of mother and infant associated with each vocalisation were coded without sound, to examine the extent to which the mothers relied on the non-vocal behaviours accompanying vocalisations for their interpretations; and c) an analysis of the mothers' interpretations, in which the infants' messages were defined as perceived by their mother.
    The corpus comprised 713 vocalisations (322 for the boy and 391 for the girl), selected from a corpus of 864, and 143 minutes of video recording (64 for the boy and 79 for the girl). Correlations between the above three assessments were specified through statistical analysis. The findings from both infants indicate that between seven and eleven months prosodic patterns are not related one-to-one with particular messages. Rather, prosody distinguishes between groups of messages conveying features of psychological motivation, such as 'emotional', 'interpersonal', 'referential', 'assertive' or 'receptive'. Individual messages belonging to the same message group according to the analysis of prosody are distinguished on the basis of the accompanying non-vocal behaviours. Before nine months, 'interpersonal' vocalisations display more 'alerting' prosodic patterns than 'referential' vocalisations. After nine months, prosodic patterns in Robin's vocalisations differentiate between 'assertive' and 'receptive' messages, the former being expressed by more 'alerting' prosodic patterns than the latter. This distinction reflects a better Self-Other awareness. On the other hand, Julie's vocalisations occurring in situations of 'Joint Interest' display different prosodic patterns from her vocalisations uttered in situations of 'Converging Interest'. These changes in the role of infant prosody reflect developments in the infants' motivational organisation which will lead to a more efficient control of intersubjective orientation and shared attention to the environment. Moreover, it was demonstrated that new forms of prosodic expression occur in psychologically mature situations, while psychologically novel situations are expressed by mature prosodic forms. The above results suggest that at the threshold to language, prosody does not primarily serve identifiable linguistic functions. Rather, in spite of individual differences in the form of their vocalisations, both infants use prosody in combination with other modalities as part of an expressive system that conveys information about their motives. In this way prosody facilitates intersubjective and, later, cooperative communication, on which language development is built. To what extent such prelinguistic prosodic patterns are similar in form to those of the target language is a crucial issue for further investigation.
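    For reference, the logarithmic transformation of fundamental frequency into semitones mentioned in the analysis of prosody takes the form sketched below; the 100 Hz reference frequency is an assumption, as the reference actually used in the thesis is not stated here.

```python
import numpy as np

def hz_to_semitones(f0_hz, f_ref=100.0):
    """Convert fundamental-frequency values (Hz) to semitones re f_ref."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    # One octave (a doubling of frequency) spans 12 semitones.
    return 12.0 * np.log2(f0_hz / f_ref)

print(hz_to_semitones([100.0, 200.0, 400.0]))  # -> [ 0. 12. 24.]
```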

    On the design of visual feedback for the rehabilitation of hearing-impaired speech
