
    Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering

    A systematic review of the literature on statistical and machine learning schemes for identifying symptoms of developmental stuttering from audio recordings is reported. Twenty-seven papers met the quality standards that were set. Comparison of results across studies was not possible because training and testing data, model architectures and feature inputs varied between them. The limitations identified included: no indication of an intended application for the work; data selected for training and testing models in ways that could introduce bias; use of different datasets and different target symptom types across studies; inconsistent reporting of feature inputs; and no standard way of reporting performance statistics. Recommendations were made about how these problems can be addressed in future work on this topic.

    Automatic Framework to Aid Therapists to Diagnose Children who Stutter


    StutterNet: Stuttering Detection Using Time Delay Neural Network

    This paper introduces StutterNet, a novel deep learning based stuttering detection system capable of detecting and identifying various types of disfluencies. Most of the existing work in this domain uses automatic speech recognition (ASR) combined with language models for stuttering detection. Compared to the existing work, which depends on an ASR module, our method relies solely on the acoustic signal. We use a time-delay neural network (TDNN) suitable for capturing contextual aspects of disfluent utterances. We evaluate our system on the UCLASS stuttering dataset, which consists of more than 100 speakers. Our method achieves promising results and outperforms the state-of-the-art residual neural network based method. The number of trainable parameters of the proposed method is also substantially lower due to the parameter sharing scheme of the TDNN.
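
    The abstract ships no code, but the core TDNN idea, convolutional weights shared across time steps and applied over acoustic frames with growing temporal context, can be sketched compactly. Below is a minimal, hypothetical PyTorch illustration of a TDNN-style utterance classifier over MFCC frames; the layer widths, dilations, and five-way disfluency output are assumptions for illustration, not the StutterNet configuration.

    import torch
    import torch.nn as nn

    class TDNNSketch(nn.Module):
        """Hypothetical TDNN-style utterance classifier (not StutterNet itself)."""
        def __init__(self, n_mfcc=20, n_classes=5):
            super().__init__()
            # Dilated Conv1d layers realize the TDNN context windows; kernel
            # weights are shared across all time steps, which keeps the
            # trainable parameter count small.
            self.frame_layers = nn.Sequential(
                nn.Conv1d(n_mfcc, 64, kernel_size=5, dilation=1), nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=3, dilation=2), nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=3, dilation=3), nn.ReLU(),
            )
            # Mean + std pooling over time yields a fixed-size utterance
            # embedding regardless of recording length.
            self.classifier = nn.Linear(64 * 2, n_classes)

        def forward(self, x):            # x: (batch, n_mfcc, n_frames)
            h = self.frame_layers(x)     # (batch, 64, reduced n_frames)
            stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)
            return self.classifier(stats)

    logits = TDNNSketch()(torch.randn(8, 20, 300))   # 8 utterances, 300 frames
    print(logits.shape)                              # torch.Size([8, 5])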

    Computer-based stuttered speech detection system using Hidden Markov Model

    Stuttering has attracted extensive research interest over the past decades. Most available stuttering diagnosis and assessment techniques rely on human perceptual judgment of overt stuttered speech characteristics. Conventionally, stuttering severity is diagnosed by manually counting the occurrences of disfluencies in a pre-recorded therapist-patient conversation, a task that is time-consuming, subjective, inconsistent and prone to error across clinics. Therefore, this thesis proposes a computerized system that deploys an HMM-based speech recognition technique to detect stuttered speech disfluencies. Continuous Malay digit strings were used as the training and testing sets for fluency detection. The Hidden Markov Model (HMM) is a robust and powerful statistical acoustic modeling technique. With efficient training algorithms (the forward-backward and Baum-Welch algorithms) and recognition algorithms, as well as flexibility in model topology and the incorporation of other knowledge sources, HMMs have been successfully applied to a variety of tasks. In this thesis, a database of normal-voice digit strings was used to train the HMMs, and pseudo-stuttered speech was collected as the testing set for the proposed system. The experimental results were compared against judgments made by a Speech Language Pathologist (SLP) from the Clinic of Audiology and Speech Sciences of Universiti Kebangsaan Malaysia (UKM). The proposed system achieved 100% average syllable repetition detection accuracy and 86.605% average sound prolongation detection accuracy, and the SLP agreed with the results generated by the software. The system, developed with Microsoft Visual C++ 6.0 and Goldwave and executable in a Windows environment, can be further enhanced to detect stuttering in everyday speech.
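
    As a rough illustration of the recognition recipe described above, the sketch below trains one Gaussian-emission HMM per digit with hmmlearn (Baum-Welch runs inside fit()) and recognizes a test utterance by its forward-algorithm log-likelihood. The five-state topology and MFCC-style feature arrays are assumptions for illustration, not the thesis's exact setup.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def train_digit_models(features_by_digit, n_states=5):
        """features_by_digit: {digit: list of (n_frames, n_mfcc) arrays}."""
        models = {}
        for digit, utterances in features_by_digit.items():
            X = np.vstack(utterances)                # stack all frames
            lengths = [len(u) for u in utterances]   # per-utterance frame counts
            # fit() runs Baum-Welch (EM) internally to estimate transition
            # and Gaussian emission parameters for this digit's model.
            model = GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
            model.fit(X, lengths)
            models[digit] = model
        return models

    def recognize(models, utterance):
        # score() computes the forward-algorithm log-likelihood of the
        # utterance under each digit model; the best-scoring model wins.
        return max(models, key=lambda d: models[d].score(utterance))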

    AI and Non AI Assessments for Dementia

    Current progress in the artificial intelligence domain has led to the development of various types of AI-powered dementia assessments, which can be employed to identify patients at an early stage of dementia and could revolutionize dementia care settings. It is essential that the medical community be aware of the various AI assessments and choose among them considering their validity, efficiency, practicality, reliability, and accuracy for the early identification of patients with dementia (PwD). On the other hand, AI developers should be informed about various non-AI assessments as well as recently developed AI assessments. Thus, this paper, written to be readable by both clinicians and AI engineers, fills a gap in the literature by explaining the existing solutions for the recognition of dementia to clinicians, and the techniques used and the most widespread dementia datasets to AI engineers. It reviews papers on AI and non-AI assessments for dementia to provide valuable information about various dementia assessments for both the AI and medical communities. The discussion and conclusion highlight the most prominent research directions and the maturity of existing solutions.

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Towards Improving The Evaluation Of Speech Production Deficits In Chronic Stroke

    One of the most devastating consequences of stroke is aphasia, a disorder that impairs communication across the domains of expressive and receptive language. In addition to language difficulties, stroke survivors may struggle with disruptions in speech motor planning and/or execution processes (i.e., a motor speech disorder, MSD). The clinical management of MSDs has been challenged by debates regarding their theoretical nature and clinical manifestations. This is especially true for differentiating speech production errors that can be attributed to aphasia (i.e., phonemic paraphasias) from lower-level motor planning/programming impairments (i.e., articulation errors that occur in apraxia of speech; AOS). Therefore, the purposes of this study were 1) to identify objective measures that have the greatest discriminative weight in the diagnostic classification of AOS, and 2) using neuroimaging, to localize patterns of brain damage predictive of these behaviors.

    Method: Stroke survivors (N=58; 21 female; mean age=61.03±10.01; months post-onset=66.07±52.93) were recruited as part of a larger study. Participants completed a thorough battery of speech and language testing and underwent a series of magnetic resonance imaging (MRI) sequences. Objective acoustic measures were obtained from three connected speech samples. These variables quantified inter-articulatory planning, speech rhythm and prosody, and speech fluency. The number of phonemic and distortion errors per sample was also quantified. All measures were analyzed for group differences, and variables were subjected to a linear discriminant analysis (LDA) to determine which served as the best predictor of AOS. MRI data were analyzed with voxel-based lesion-symptom mapping and connectome-symptom mapping to relate patterns of cortical necrosis and white matter compromise to different aspects of disordered speech.

    Results: Participants with both AOS and aphasia generally demonstrated significantly poorer performance across all production measures when compared to those with aphasia as their only impairment, and compared to those with no detectable speech or language impairment. The LDA model with the greatest classification accuracy correctly predicted 90.7% of cases. Neuroimaging analysis indicated that damage to mostly unique regions of the pre- and post-central gyri, the supramarginal gyrus, and white matter connections between these regions and subcortical structures was related to impaired speech production.

    Conclusions: Results support and build upon recent studies that have sought to improve the assessment of post-stroke speech production. Findings are discussed with regard to contemporary models of speech production, guided by the overarching goal of refining the clinical evaluation and theoretical explanations of AOS.
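
    As a hedged illustration of the LDA step described in the Method, the sketch below fits scikit-learn's LinearDiscriminantAnalysis to a toy matrix of acoustic measures and reads off cross-validated accuracy and per-feature discriminative weights. The feature set and all values are invented for illustration; they are not the study's data or protocol.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    # Rows: participants; columns: hypothetical acoustic measures, e.g.
    # speech rate (syll/s), a rhythm/prosody index, distortion errors/sample.
    X = np.array([[3.1, 42.0, 1.2],
                  [3.0, 44.5, 0.9],
                  [2.8, 47.1, 1.5],
                  [2.9, 45.3, 0.8],
                  [1.8, 61.5, 4.7],
                  [1.6, 58.9, 5.1],
                  [1.9, 63.2, 4.2],
                  [1.7, 60.4, 4.9]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = aphasia only, 1 = AOS + aphasia

    lda = LinearDiscriminantAnalysis()
    # Cross-validated classification accuracy of the discriminant model.
    print(cross_val_score(lda, X, y, cv=4).mean())
    lda.fit(X, y)
    print(lda.coef_)   # discriminative weight carried by each measure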

    Evaluating pause particles and their functions in natural and synthesized speech in laboratory and lecture settings

    Pause-internal phonetic particles (PINTs) comprise a variety of phenomena, including phonetic-acoustic silence, inhalation and exhalation breath noises, the filler particles “uh” and “um” in English, tongue clicks, and many others. Although these particles are omnipresent in spontaneous speech, they are under-researched in both natural and synthetic speech. The present work explores the influence of PINTs in small-context recall experiments, develops a bespoke speech synthesis system that incorporates the PINTs pattern of a single speaker, and evaluates the influence of PINTs on recall for larger material lengths, namely university lectures.

    The benefit of PINTs on recall has been documented for natural speech in small-context laboratory settings, but this area of research has been under-explored for synthetic speech. We devised two experiments to evaluate whether PINTs have the same recall benefit for synthetic material that is found with natural material. In the first experiment, we evaluated the recollection of consecutive missing digits in a randomized 7-digit number. Results indicated that an inserted silence improved recall accuracy for the digits immediately following. In the second experiment, we evaluated sentence recollection. Results indicated that sentences preceded by an inhalation breath noise were better recalled than those with no inhalation. Together, these results reveal that in single-sentence laboratory settings PINTs can improve recall for synthesized speech.

    The speech synthesis systems used in the small-context recall experiments did not provide much freedom in terms of controlling PINT type or location. We therefore developed bespoke speech synthesis systems. Two neural text-to-speech (TTS) systems were created: one that used PINTs annotation labels in the training data, and another that did not include any PINTs labeling in the training material. The first system allowed fine-tuned control over inserting PINTs material into the rendered speech. The second system produced PINTs probabilistically. To the best of our knowledge, these are the first TTS systems to render tongue clicks.

    Equipped with greater control of synthesized PINTs, we returned to evaluating the recall benefit of PINTs, this time examining the influence of PINTs on the recollection of key information in lectures, an ecologically valid task focused on larger material lengths. Results indicated that key information that followed PINTs material was less likely to be recalled. We were unable to replicate the benefits of PINTs found in the small-context laboratory settings.

    This body of work shows that PINTs improve recall for TTS in small-context environments, just as previous work had indicated for natural speech. Additionally, we have provided a technological contribution via a neural TTS system that exerts finer control over PINT type and placement. Lastly, we have shown the importance of using material rendered by speech synthesis systems in perceptual studies.

    This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) within the project “Pause-internal phonetic particles in speech communication” (project number: 418659027; project IDs: MO 597/10-1 and TR 468/3-1). Associate member of SFB1102 “Information Density and Linguistic Encoding” (project number: 232722074).
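
    As a toy illustration of what probabilistic PINTs insertion could look like at the text level, the sketch below inserts labeled particles into a word sequence before synthesis. The token inventory, probabilities, and <pint:...> label format are invented for illustration; the thesis's systems learn PINTs behavior from annotated training data rather than applying a rule like this.

    import random

    # Hypothetical particle inventory with insertion weights.
    PINTS = [("<pint:sil>", 0.4), ("<pint:breath>", 0.3),
             ("<pint:uh>", 0.2), ("<pint:click>", 0.1)]

    def insert_pints(words, rate=0.15, seed=0):
        """Insert a PINTs label before each word with probability `rate`."""
        rng = random.Random(seed)
        out = []
        for word in words:
            if rng.random() < rate:
                # Weighted draw from the particle inventory.
                out.append(rng.choices([p for p, _ in PINTS],
                                       weights=[w for _, w in PINTS])[0])
            out.append(word)
        return " ".join(out)

    print(insert_pints("the lecture begins with a short overview".split()))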

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The MAVEBA Workshop, held every two years, collects in its proceedings the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies. The workshop has the sponsorship of: Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published collecting selected papers from the conference.