    Comparing Different Methods for Disfluency Structure Detection

    This paper presents a number of experiments assessing the performance of different machine learning methods in identifying disfluencies and their distinct structural regions in speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods in identifying the distinct structural regions of disfluencies. The reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese containing about 32 h of speech and about 7.7% disfluencies. The set of features automatically extracted from the force-aligned corpus proved to be discriminative of the regions involved in the production of a disfluency. This work shows that, using fully automatic prosodic features, disfluency structural regions can be reliably identified with CARTs, with the best results corresponding to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.
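    The reported scores are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. The Python sketch below assumes beta = 1, since the abstract does not state the weighting, and reproduces the 41.2% figure. It shows how the low recall pulls F1 far below the 81.5% precision.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Assumption: the paper reports balanced F1 (beta = 1).
f1 = f_measure(0.815, 0.276)
print(f"{f1:.3f}")  # 0.412
```

    Because the harmonic mean is dominated by the smaller of the two values, raising recall would improve F1 far more than further gains in precision.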

    Robust Speech Recognition for Adverse Environments

    Helping, I Mean Assessing Psychiatric Communication: An Application of Incremental Self-Repair Detection

    18th SemDial Workshop on the Semantics and Pragmatics of Dialogue (DialWatt), 1-3 September 2014, Edinburgh, Scotland
    Self-repair is pervasive in dialogue, and models thereof have long been a focus of research, particularly for disfluency detection in speech recognition and spoken dialogue systems. However, the generality of such models across domains has received little attention. In this paper we investigate the application of an automatic incremental self-repair detection system, STIR, developed on the Switchboard corpus of telephone speech, to a new domain – psychiatric consultations. We find that word-level accuracy is reduced markedly by the differences in annotation schemes and transcription conventions between corpora, which has implications for the generalisability of all repair detection systems. However, overall rates of repair are detected accurately, promising a useful resource for clinical dialogue studies.
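    To make "incremental" concrete: an incremental detector labels each word as it arrives and may revise earlier labels once a repair becomes evident. The Python sketch below is a toy illustration only, not STIR. It assumes a hypothetical hand-written edit-term lexicon (EDIT_TERMS) and a simple rough-copy heuristic, and uses illustrative tags: f = fluent, e = edit term (interregnum), rm = reparandum, rpS = repair onset.

```python
from typing import List

# Hypothetical toy lexicon; real detectors learn edit terms from data.
EDIT_TERMS = {"uh", "um", "er"}


def tag_incrementally(words: List[str]) -> List[str]:
    """Tag words one at a time, revising earlier tags when a repair appears."""
    tags: List[str] = []
    for i, word in enumerate(words):
        w = word.lower()
        if w in EDIT_TERMS:
            tags.append("e")  # interregnum / edit term
            continue
        tags.append("f")  # provisional: fluent
        # Find where a run of edit terms directly before this word begins.
        j = i
        while j > 0 and tags[j - 1] == "e":
            j -= 1
        if j < i:  # this word follows an interregnum
            # Rough-copy heuristic: look back for the same word before the interregnum.
            for k in range(j - 1, -1, -1):
                if words[k].lower() == w:
                    for m in range(k, j):
                        tags[m] = "rm"  # retroactively mark the reparandum
                    tags[i] = "rpS"  # repair onset
                    break
    return tags
```

    For example, tag_incrementally("a flight to Boston uh to Denver".split()) returns ['f', 'f', 'rm', 'rm', 'e', 'rpS', 'f']. The retroactive relabelling of "to Boston" is the kind of output revision an incremental system must support. A heuristic this simple would also fail in the way the abstract describes, since it depends on one transcription convention for edit terms.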

    Hesitations in Spoken Dialogue Systems

    Betz S. Hesitations in Spoken Dialogue Systems. Bielefeld: Universität Bielefeld; 2020

    The Effect of Attention to Self-Regulation of Speech Sound Productions on Speech Fluency in Oral Reading

    Purpose: This study ultimately sought to test whether a condition of heightened attention to speech sound production during connected speech serves to trigger increased disfluencies. Disfluencies, or disruptions in the flow of speech, are highly variable in form and location, both within and across individuals and situations. Research to identify conditions that can predictably trigger disfluencies has the potential to provide insight into their elusive nature. A review of related literature covered the cognitive-linguistic theories related to speech fluency and stuttering. This review also provided the rationale for the hypothesis that disfluencies would be triggered by heightened self-monitoring attention to how speech sounds are made during connected speech. Methods: Participants included 10 male and 10 female normally fluent adult college students. Their tasks included a baseline oral reading of a 330-word passage, learning of two new speech sounds, and then an experimental reading of the same passage. During the experimental reading, target sounds, indicated by highlighted locations within the passage, had to be replaced with the newly learned speech sounds. Participants reported giving much greater attention to how speech sounds were produced during the experimental oral reading than during the baseline oral reading, which supports the validity of the task. Results: Disfluencies and oral reading rates were examined using descriptive statistics and analyzed with a negative binomial distribution model. Secondary analyses of oral reading rates were conducted with the Wilcoxon signed-rank test. The results revealed that the experimental reading task was associated with a significant increase in Stuttering-Like Disfluency (SLD) and Other Disfluency (OD), and a significant decrease in oral reading rate. Furthermore, SLDs increased significantly more than ODs from the first to the second reading. Discussion: Results supported the hypothesis that disfluency, especially SLD, can be triggered by a condition of increased attention to self-monitoring how speech sounds are produced during connected speech. These findings support theories explaining disfluencies as a symptom of a speaker's cognitive-linguistic speech planning processes being over-burdened. Implications are raised for specific populations that may be at risk for more disfluencies: young children learning language, second-language learners, and children in speech therapy. Future research directions are recommended to better understand how to prevent disfluencies in at-risk populations and to clarify the enigmatic relationship among attentional processes, phonological production planning, and stuttering.

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art in key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

    Extending automatic transcripts in a unified data representation towards a prosodic-based metadata annotation and evaluation

    This paper describes a framework that extends automatic speech transcripts in order to accommodate relevant information coming from manual transcripts, the speech signal itself, and other resources, such as lexica. The proposed framework automatically collects, relates, computes, and stores all relevant information together in a self-contained data source, making it possible to easily provide a wide range of interconnected information suitable for speech analysis and for training and evaluating a number of automatic speech processing tasks. The main goal of this framework is to integrate different linguistic and paralinguistic layers of knowledge for a more complete view of their representation and interactions in several domains and languages. The processing chain is composed of two main stages: the first integrates the relevant manual annotations into the speech recognition data, and the second further enriches that output with prosodic information. The described framework has been used for the identification and analysis of structural metadata in automatic speech transcripts. Initially used for the automatic detection of punctuation marks and for capitalization recovery from speech data, it has also recently been used to study the characterization of disfluencies in speech. It has already been applied to several domains of Portuguese corpora, and also to English and Spanish Broadcast News corpora.

    Computational Models of Miscommunication Phenomena

    Miscommunication phenomena such as repair in dialogue are important indicators of the quality of communication. Automatic detection is therefore a key step toward tools that can characterize communication quality and thus help in applications from call center management to mental health monitoring. However, most existing computational linguistic approaches to these phenomena are unsuitable for general use in this way, and particularly for analyzing human–human dialogue: Although models of other-repair are common in human-computer dialogue systems, they tend to focus on specific phenomena (e.g., repair initiation by systems), missing the range of repair and repair initiation forms used by humans; and while self-repair models for speech recognition and understanding are advanced, they tend to focus on removal of “disfluent” material important for full understanding of the discourse contribution, and/or rely on domain-specific knowledge. We explain the requirements for more satisfactory models, including incrementality of processing and robustness to sparsity. We then describe models for self- and other-repair detection that meet these requirements (for the former, an adaptation of an existing repair model; for the latter, an adaptation of standard techniques) and investigate how they perform on datasets from a range of dialogue genres and domains, with promising results.
    EPSRC. Grant Number: EP/10383/1; Future and Emerging Technologies (FET). Grant Number: 611733; German Research Foundation (DFG). Grant Number: SCHL 845/5-1; Swedish Research Council (VR). Grant Numbers: 2016-0116, 2014-3

    Incremental Disfluency Detection for Spoken Learner English

    Dialogue-based computer-assisted language learning (CALL) concerns the application and analysis of automated systems that engage with a language learner through dialogue. Rooted in an interactionist perspective of second language acquisition, dialogue-based CALL systems assume the role of a speaking partner, providing learners with the opportunity for spontaneous production of their second language. One area of interest for such systems is the implementation of corrective feedback. However, the feedback strategies employed by such systems remain fairly limited. In particular, there are currently no provisions for learners to initiate the correction of their own errors, despite this being the most frequently occurring and most preferred type of error correction in learner speech. To address this gap, this thesis proposes a framework for implementing such functionality, identifying incremental self-initiated self-repair (i.e. disfluency) detection as a key area for research. Taking an interdisciplinary approach to the exploration of this topic, this thesis outlines the steps taken to optimise an incremental disfluency detection model for use with spoken learner English. To begin, a linguistic comparative analysis of native and learner disfluency corpora explored the differences between the disfluency behaviour of native and learner speech, highlighting key features of learner speech not previously explored in disfluency detection model analysis. Next, in order to identify a suitable baseline model for further experimentation, two state-of-the-art incremental self-repair detection models were trained and tested with a learner speech corpus. An error analysis of the models' outputs found an LSTM model using word embeddings and part-of-speech tags to be the most suitable for learner speech, thanks to its lower number of false positives triggered by learner errors in the corpus. Several adaptations to the model were then tested to improve performance. Together, the inclusion of character embeddings, silence and laughter features, the separation of edit term detection from disfluency detection, lemmatization, and the inclusion of learners' prior proficiency scores yielded an improvement of over eight percent over the baseline. Findings from this thesis illustrate how the analysis of language characteristics specific to learner speech can positively inform model adaptation and provide a starting point for further investigation into the implementation of effective corrective feedback strategies in dialogue-based CALL systems.