
    Continuous Interaction with a Virtual Human

    Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking – and modify its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.

    Evaluating automatic speaker recognition systems: an overview of the NIST speaker recognition evaluations (1996-2014)

    © 2014 CSIC. Manuscripts published in this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction. Automatic Speaker Recognition systems show interesting properties in contrast to speaker recognition by humans, such as speed of processing and repeatability of results, but they are usable only if they are reliable. Testability, the ability to extensively evaluate the goodness of a speaker detector's decisions, then becomes critical. Over the last 20 years, the US National Institute of Standards and Technology (NIST) has organized a series of text-independent Speaker Recognition Evaluations (SRE), providing the speech data and evaluation protocols. These evaluations have become not just a periodic benchmark but also a meeting point for a collaborative community of scientists who have been deeply involved in the evaluation cycle, enabling tremendous progress in an especially complex task in which speaker information is spread across different levels (acoustic, prosodic, linguistic…) and is strongly affected by speaker-intrinsic and -extrinsic variability factors. In this paper, we outline how the evaluations progressively challenged the technology with new speaking conditions and sources of variability, and how the scientific community answered those demands. Finally, we show that the NIST SREs are not free of shortcomings, and we discuss future challenges for speaker recognition assessment.
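    The trade-off these evaluations measure can be illustrated with a toy equal error rate (EER) computation. This is a minimal sketch with invented trial scores, not NIST's actual scoring tooling (which uses detection cost functions over large trial lists):

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Miss rate and false-alarm rate at a given decision threshold."""
    miss = sum(s < threshold for s in target_scores) / len(target_scores)
    fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return miss, fa

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep the observed scores as thresholds and return the operating
    point where miss and false-alarm rates are closest (a simple EER
    approximation)."""
    def gap(t):
        miss, fa = error_rates(target_scores, nontarget_scores, t)
        return abs(miss - fa)
    best = min(sorted(set(target_scores) | set(nontarget_scores)), key=gap)
    miss, fa = error_rates(target_scores, nontarget_scores, best)
    return (miss + fa) / 2

targets = [2.1, 1.7, 0.9, 2.5, 1.2]        # same-speaker trial scores (invented)
nontargets = [-1.0, 0.3, -0.5, 1.0, -2.0]  # different-speaker trial scores
print(equal_error_rate(targets, nontargets))  # → 0.2
```

    Lower EER means a more reliable detector; production evaluations interpolate between thresholds rather than picking the closest observed score.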

    The analysis of breathing and rhythm in speech

    Speech rhythm can be described as the temporal patterning by which speech events, such as vocalic onsets, occur. Despite efforts to quantify and model speech rhythm across languages, it remains a scientifically enigmatic aspect of prosody. For instance, one challenge lies in determining how best to quantify and analyse speech rhythm. Techniques range from manual phonetic annotation to the automatic extraction of acoustic features, and it is currently unclear how closely these differing approaches correspond to one another. Moreover, speech rhythm research has relied primarily on analysis of the acoustic signal alone. Investigations of speech rhythm may instead benefit from a range of complementary measures, including physiological recordings such as respiratory effort. This thesis therefore combines acoustic recording with inductive plethysmography (breath belts) to capture temporal characteristics of speech and speech breathing rhythms. The first part examines the performance of existing phonetic and algorithmic techniques for acoustic prosodic analysis in a new corpus of rhythmically diverse English and Mandarin speech. The second part addresses the need for an automatic speech breathing annotation technique by developing a novel function that is robust to the noisy plethysmography typical of spontaneous, naturalistic speech production. These methods are then applied to the analysis of English speech and speech breathing in a second, larger corpus. Finally, behavioural experiments were conducted to investigate listeners' perception of speech breathing using a novel gap detection task. The thesis establishes the feasibility, as well as the limits, of automatic methods in comparison to manual annotation. The speech breathing corpus analysis helps show that speakers maintain a normative, yet contextually adaptive, breathing style during speech. The perception experiments in turn demonstrate that listeners are sensitive to violations of these speech breathing norms, even if unconsciously so. The thesis concludes by underscoring breathing as a necessary, yet often overlooked, component of speech rhythm planning and production.
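    The kind of automatic breath annotation described above can be sketched, under strong simplifying assumptions, as smoothing followed by local-maximum picking on the belt trace. The thesis's actual noise-robust function is not reproduced here; this only illustrates the general approach on a synthetic signal:

```python
import math

def moving_average(signal, window):
    """Centred moving average to suppress sensor noise before peak picking."""
    half = window // 2
    return [sum(signal[max(0, i - half):i + half + 1])
            / len(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]

def inhalation_peaks(signal, window=5, min_gap=10):
    """Indices of local maxima in the smoothed trace, kept at least
    min_gap samples apart (a crude refractory period)."""
    smooth = moving_average(signal, window)
    peaks = []
    for i in range(1, len(smooth) - 1):
        if smooth[i - 1] < smooth[i] >= smooth[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks

# Synthetic belt trace: two clean breath cycles of 40 samples each.
trace = [math.sin(2 * math.pi * i / 40) for i in range(80)]
print(inhalation_peaks(trace))  # → [10, 50]
```

    Real plethysmography from spontaneous speech is far noisier than this sine wave, which is precisely why a more robust function was needed.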

    Is it necessary to assess fluent symptoms, duration of dysfluent events and physical concomitants when identifying children who have speech difficulties?

    Riley’s (1994) Stuttering Severity Instrument version three (SSI-3) has three components: a symptom frequency measure (%SS), the average duration of the three longest stutters, and a physical concomitant (PC) score. It was assessed whether all of these components are necessary when using the SSI-3 to identify children at risk of speech difficulty. Participants were 879 reception-class children aged 4-6 years from UK schools. The distributions of the separate components of the SSI-3 were examined, and departures from normality were noted for each component. The features seen in the distributions of the individual components were also apparent in the distribution of overall scores, which was not normal and had multiple modes. These findings undermine the usefulness of the overall measure for identifying children at risk of speech difficulty. Prior work used a fixed SSI-3 threshold to identify at-risk children. Classification of children as fluent or at risk based on this threshold was compared with classifications based on thresholds applied to the individual components. Classifications were comparable for %SS, but less satisfactory for duration and PC. These findings suggest that %SS performs similarly to overall SSI-3 scores when used to identify at-risk children. Riley (1994) conducted correlation analyses to justify including all components in the SSI-3, correlating part (individual component) scores with whole (overall SSI-3) scores. These results were replicated. However, correlations are spuriously inflated when this procedure is employed: additional analyses showed that part-‘whole’ correlations were low when the component used as the part was excluded from the ‘whole’. Thus, Riley’s justification for using all components is questionable. Physical concomitants measured on five-point scales (as Riley specified) were no more sensitive than when the scale was collapsed to three or to two points. Since judgments were unaffected when the scale was collapsed, judges did not appear able to use the original scale. Procedures for identifying at-risk children in schools need to be short and easy to administer. Since there is no justification for including all components of the SSI-3, and duration and physical concomitants are not sensitive measures of fluency, a procedure based on the frequency measure alone is appropriate for use in schools.
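    The part-whole inflation discussed in this abstract is easy to demonstrate with synthetic data: a component correlates substantially with a total that contains it even when it is independent of every other component. The data below are invented, not from the SSI-3 study:

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(0)
n = 1000
# Three mutually independent "components" (synthetic stand-ins for
# %SS, duration, and physical-concomitant scores).
freq = [random.gauss(0, 1) for _ in range(n)]
dur = [random.gauss(0, 1) for _ in range(n)]
pc = [random.gauss(0, 1) for _ in range(n)]

whole = [f + d + p for f, d, p in zip(freq, dur, pc)]  # overall score
whole_minus_pc = [f + d for f, d in zip(freq, dur)]    # part excluded

print(round(pearson(pc, whole), 2))           # substantial (~1/sqrt(3)) by construction
print(round(pearson(pc, whole_minus_pc), 2))  # near zero: pc is unrelated to the rest
```

    The first correlation is large purely because `pc` is a summand of `whole`; only the part-excluded correlation reflects a genuine relationship.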

    Tracking Reading: Dual Task Costs of Oral Reading for Young Versus Older Adults

    A digital pursuit rotor was used to monitor oral reading costs by time-locking tracking performance to the auditory waveform produced as young and older adults read short paragraphs aloud. Multilevel modeling was used to determine how paragraph-level predictors such as length, grammatical complexity, and readability, and person-level predictors such as speaker age or working memory capacity, predicted reading and tracking performance. In addition, sentence-by-sentence variation in tracking performance was examined during the production of individual sentences and during the pauses before upcoming sentences. The results suggest that dual tasking has a greater impact on older adults’ reading comprehension and tracking performance. At the level of individual sentences, young and older adults adopt different strategies to deal with grammatically complex and propositionally dense sentences. This research was supported in part by grants from the NIH to the University of Kansas through the Mental Retardation and Developmental Disabilities Research Center, grant number P30 HD-002528, and the Center for Biobehavioral Neurosciences in Communication Disorders (BNCD), grant number P30 DC-005803, as well as by grant R01 AG-025906 from the National Institute on Aging to Susan Kemper. Its contents are solely the responsibility of the author and do not necessarily represent the official views of the NIH. We thank Ruth Herman for her assistance with data collection and analysis. A suite of digital pursuit rotor applications is available upon request.

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but debate is ongoing over the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. 
    Further analysis was carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as many other factors can influence the values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
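    The interval metrics named in this abstract can be computed directly from vocalic and consonantal interval durations. A minimal sketch of %V and ∆C (Ramus et al., 1999) and the nPVI (Grabe & Low, 2002), with illustrative durations rather than Malay corpus data:

```python
import statistics

def percent_v(vocalic, consonantal):
    """%V: percentage of total utterance duration that is vocalic."""
    return 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))

def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations."""
    return statistics.pstdev(consonantal)

def npvi(intervals):
    """Normalised Pairwise Variability Index over successive intervals."""
    return 100 * statistics.mean(
        abs(a - b) / ((a + b) / 2) for a, b in zip(intervals, intervals[1:]))

voc = [0.10, 0.12, 0.09, 0.11]   # vocalic interval durations (s), illustrative
cons = [0.08, 0.07, 0.09, 0.08]  # consonantal interval durations (s)
print(round(percent_v(voc, cons), 1))  # → 56.8
print(round(delta_c(cons), 4))         # → 0.0071
print(round(npvi(voc), 1))             # → 22.3
```

    High %V with low ∆C and low nPVI is the profile typically reported for syllable-timed languages such as French and Spanish, which is the cluster this study places Malay in.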