
    Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors

    Speech disorders in children can affect their fluency and intelligibility. Delay in their diagnosis and treatment increases the risk of social impairment and learning disabilities. With the significant shortage of Speech and Language Pathologists (SLPs), there is increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability. However, the scarcity and unreliable annotation of disordered child speech corpora, along with the high acoustic variability of child speech, have impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. This thesis therefore investigates two types of detection systems that can be built with minimal dependency on annotated mispronounced speech data. First, a novel approach was proposed that adopts paralinguistic features representing the prosodic, spectral, and voice-quality characteristics of speech to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech using a binary Support Vector Machine (SVM) classifier. As paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal. Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors caused by SSDs and to provide low-level diagnostic information that can be used to construct formative feedback and a detailed diagnostic report. Unlike existing MDD methods, where detection and diagnosis are performed at the phoneme level, the proposed method performs MDD at the speech attribute level, namely the manners and places of articulation. Speech attribute features describe the articulators involved in producing a speech sound and their interactions, allowing a low-level description of the pronunciation error to be provided. Two novel methods for modelling speech attributes are further proposed in this thesis: a frame-based (phoneme-alignment) method that leverages the Multi-Task Learning (MTL) criterion and trains a separate model for each attribute, and an alignment-free, jointly learnt method based on the Connectionist Temporal Classification (CTC) sequence-to-sequence criterion. The proposed techniques have been evaluated using standard, publicly accessible adult and child speech corpora, while the MDD method has been validated using L2 speech corpora.
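    As a rough illustration of the first, screening stage, the sketch below trains a binary TD/SSD SVM on precomputed paralinguistic functionals and aggregates segment decisions to the subject level by a simple majority vote. The synthetic data, feature dimensions, hyperparameters, and the aggregation rule are all assumptions, not the thesis's actual configuration.

```python
# Illustrative sketch only: segment-level TD vs. SSD classification with an SVM on
# paralinguistic functionals, followed by subject-level majority voting.
# All data, feature dimensions, and hyperparameters below are invented placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 88))        # one row of functionals per speech segment
y_train = rng.integers(0, 2, size=400)      # 0 = TD, 1 = SSD (placeholder labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, class_weight="balanced"))
clf.fit(X_train, y_train)

# Subject-level decision: majority vote over that subject's segment predictions.
subject_segments = rng.normal(size=(12, 88))
segment_preds = clf.predict(subject_segments)
subject_label = int(segment_preds.mean() >= 0.5)
print("segment predictions:", segment_preds, "-> subject label:", subject_label)
```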

    The Production of Emotional Prosody in Varying Severities of Apraxia of Speech

    One speaker with mild apraxia of speech (AOS), one with moderate AOS, and one control speaker were asked to produce utterances with different emotional intent. In Experiment 1, the three subjects were asked to produce sentences with happy, sad, or neutral intent through a repetition task. In Experiment 2, the three subjects were asked to produce sentences with either happy or sad intent through a picture elicitation task. Paired t-tests comparing data from the acoustic analyses of each subject's utterances revealed significant differences in F0, duration, and intensity between the happy and sad sentences of the control speaker. There were no significant differences in the acoustic characteristics of the productions of the AOS speakers, suggesting that the AOS subjects were unable to volitionally produce the acoustic parameters that help convey emotion. Two more experiments were designed to determine whether naïve listeners could perceive the acoustic cues that signal emotion in all three speakers. In Experiment 3, naïve listeners were asked to identify the sentences produced in Experiment 1 as happy, sad, or neutral. In Experiment 4, naïve listeners were asked to identify the sentences produced in Experiment 2 as either happy or sad. Chi-square findings revealed that the naïve listeners were able to identify the emotional differences of the control speaker and that the correct identification was not by chance. The naïve listeners could not distinguish between the emotional utterances of the mild or moderate AOS speakers. Higher percentages of correct identification for certain sentences over others were artifacts attributed either to chance (the naïve listeners were guessing) or to a response strategy (when in doubt, the naïve listeners chose neutral or sad). The findings from Experiments 3 and 4 corroborate the acoustic findings from Experiments 1 and 2. In addition to the four structured experiments, spontaneous samples of happy, sad, and neutral utterances were collected and compared to the sentences produced in Experiments 1 and 2. Comparisons between the elicited and spontaneous sentences indicated that the moderate AOS subject was able to produce variations in F0 and duration similar to those that would be produced by normal speakers conveying emotion (Banse & Scherer, 1996; Lieberman & Michaels, 1962; Scherer, 1988). The mild AOS subject was unable to produce prosodic differences between happy and sad emotion. This study found that although these AOS subjects were unable to produce acoustic parameters that signal emotion during elicited speech, they were able to produce somewhat more variation in F0 and duration in spontaneous speech, especially the moderate AOS speaker. However, no meaningful variation pattern that would convey emotion (such as that seen in the control subject) was found. These findings suggest that the AOS subjects probably convey emotion non-verbally (e.g., facial expression, muscle tension, body language).
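    A comparison of this kind can be sketched with standard tools; the snippet below runs paired t-tests on invented per-sentence F0, duration, and intensity values for matched happy and sad productions. The numbers are placeholders and do not reproduce the study's data.

```python
# Illustrative sketch: paired t-tests comparing mean F0, duration, and intensity
# between happy and sad productions of matched sentences (values are invented).
import numpy as np
from scipy.stats import ttest_rel

# One value per sentence; happy[i] and sad[i] correspond to the same sentence content.
measures = {
    "mean_f0_hz":   (np.array([212, 225, 218, 230, 208]), np.array([180, 176, 185, 190, 172])),
    "duration_s":   (np.array([1.8, 2.1, 1.9, 2.0, 1.7]), np.array([2.4, 2.6, 2.3, 2.5, 2.2])),
    "intensity_db": (np.array([68, 70, 69, 71, 67]),      np.array([62, 61, 63, 60, 64])),
}

for name, (happy, sad) in measures.items():
    t, p = ttest_rel(happy, sad)
    print(f"{name}: t = {t:.2f}, p = {p:.4f}")
```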

    Specific Language Impairments and Possibilities of Classification and Detection from Children's Speech

    Many young children have speech disorders. My research focused on one such disorder, known as specific language impairment or developmental dysphasia. A major problem in treating this disorder is the fact that specific language impairment is detected in children at a relatively late age. For successful speech therapy, early diagnosis is critical. I present two different approaches to this issue using a very simple test that I have devised for diagnosing this disorder. In this thesis, I describe a new method for detecting specific language impairment based on the number of pronunciation errors in utterances. An advantage of this method is its simplicity; anyone can use it, including parents. The second method is based on the acoustic features of the speech signal. An advantage of this method is that it could be used to develop an automatic detection system.
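    The first, error-count method lends itself to a very simple decision rule. The sketch below flags a child for referral when the number of pronunciation errors on a fixed test exceeds an age-specific threshold; the cut-offs and counting procedure are purely hypothetical and are not the test proposed in the thesis.

```python
# Minimal sketch of a count-based screening rule: flag a child when the number of
# pronunciation errors in a fixed word-repetition test exceeds an age-specific
# threshold. All thresholds below are hypothetical placeholders.
ERROR_THRESHOLD_BY_AGE = {4: 12, 5: 9, 6: 6}   # hypothetical cut-offs per age in years

def screen_for_sli(error_count: int, age_years: int) -> bool:
    """Return True if the child should be referred for a full SLI assessment."""
    threshold = ERROR_THRESHOLD_BY_AGE.get(age_years)
    if threshold is None:
        raise ValueError(f"no threshold defined for age {age_years}")
    return error_count > threshold

print(screen_for_sli(error_count=11, age_years=5))  # True -> refer for assessment
```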

    Extraction and Classification of Acoustic Features from Italian Speaking Children with Autism Spectrum Disorders

    Autism Spectrum Disorders (ASD) are a group of complex developmental conditions whose effects and severity show high intraindividual variability. However, one of the main symptoms shared across the spectrum is impaired social interaction, which can be explored through acoustic analysis of speech production. In this paper, we compare 14 Italian-speaking children with ASD and 14 typically developing peers. We extracted and selected acoustic features related to prosody, voice quality, loudness, and spectral distribution using the eGeMAPS parameter set provided by the openSMILE feature extraction toolkit. We implemented four supervised machine learning methods to evaluate classification performance on the extracted features. Our findings show that Decision Trees (DTs) and Support Vector Machines (SVMs) are the best-performing methods. The DT models reach 100% recall on all trials, meaning they correctly recognise the children with autistic features. However, half of the DT models overfit, whereas the SVMs are more consistent. A further outcome of this work is a speech-processing pipeline for extracting Italian speech biomarkers typical of ASD, whose results we compare with studies based on other languages. A better understanding of this topic can support clinicians in diagnosing the disorder.
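    A minimal sketch of this kind of pipeline, using the openSMILE Python wrapper for eGeMAPS functionals and scikit-learn for the classifiers, is shown below. The file names and labels are placeholders, and the paper's exact configuration (including whether it used this wrapper or the standalone toolkit) may differ.

```python
# Rough sketch: eGeMAPS functionals via the openSMILE Python wrapper, then Decision
# Tree and SVM classifiers. File paths, labels, and hyperparameters are placeholders.
import opensmile
import pandas as pd
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # 88 eGeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Placeholder file lists and labels: 1 = ASD, 0 = typically developing.
wav_files = ["asd_child_01.wav", "td_child_01.wav"]
labels = [1, 0]
X = pd.concat([smile.process_file(f) for f in wav_files])

# With the full dataset one would cross-validate; here we only fit to show the API.
for clf in (DecisionTreeClassifier(max_depth=4), SVC(kernel="rbf")):
    clf.fit(X, labels)
    print(type(clf).__name__, "training accuracy:", clf.score(X, labels))
```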

    Automated Classification of Phonetic Segments in Child Speech Using Raw Ultrasound Imaging

    Speech sound disorder (SSD) is defined as a persistent impairment in speech sound production leading to reduced speech intelligibility and hindered verbal communication. Early recognition and intervention for children with SSD, and timely referral to speech and language therapists (SLTs) for treatment, are crucial. Automated detection of speech impairment is regarded as an efficient method for examining and screening large populations. This study focuses on advancing the automatic diagnosis of SSD in early childhood by proposing a technical solution that integrates ultrasound tongue imaging (UTI) with deep-learning models. The introduced FusionNet model combines raw UTI data with extracted texture features to classify UTI images. The overarching aim is to raise the accuracy and efficiency of UTI analysis, particularly for classifying speech sounds associated with SSD. The study compares the FusionNet approach with standard deep-learning methodologies, highlighting the marked improvements the FusionNet model brings to UTI classification and the potential of multi-learning approaches to improve UTI classification in speech therapy clinics.
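    In the spirit of the fusion idea described above, the sketch below wires a small convolutional branch over raw ultrasound frames to a dense branch over precomputed texture features and concatenates the two before a phonetic-class output. Layer sizes, the texture descriptor, and the number of classes are assumptions; the actual FusionNet architecture is not reproduced here.

```python
# Hedged two-branch fusion sketch: CNN over raw UTI frames + MLP over texture
# features, concatenated before the classifier head. All sizes are placeholders.
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, n_texture_feats: int = 64, n_classes: int = 4):
        super().__init__()
        self.image_branch = nn.Sequential(              # raw UTI frame -> embedding
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.texture_branch = nn.Sequential(            # texture features -> embedding
            nn.Linear(n_texture_feats, 32), nn.ReLU(),
        )
        self.head = nn.Linear(32 + 32, n_classes)       # fused embedding -> phone class

    def forward(self, image, texture):
        fused = torch.cat([self.image_branch(image), self.texture_branch(texture)], dim=1)
        return self.head(fused)

model = FusionSketch()
logits = model(torch.randn(8, 1, 64, 128), torch.randn(8, 64))
print(logits.shape)  # torch.Size([8, 4])
```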

    Paralinguistic cues in the speech of withdrawn children

    Thirty Prince George school children from grades two to seven, withdrawn (n = 15) and non-withdrawn (n = 15), performed a public speech before their peers and an experimenter. The groups of children were compared in terms of paralinguistic vocal characteristics (mean speech-production-to-total-episode ratio, mean pause duration, filled pausing rate, and mean variation of vocal pitch) and self-report measures of social support and trait anxiety. Withdrawn children exhibited less speech and longer mean pause duration within the episode and reported lower levels of social support and higher levels of trait anxiety than did non-withdrawn children. Paralinguistically, the relatively excessive silence and longer mean pause duration exhibited by the withdrawn children constitute a qualitative rather than merely a quantitative behavioural difference from the non-withdrawn children. As excessive pausing is viewed as "unattractive" by listeners (Siegman, 1987), the implications of passive and nonfluent vocal styles for peer acceptance are discussed. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b119885
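    Two of these measures, the speech-to-episode ratio and the mean pause duration, reduce to simple arithmetic once the speech segments within an episode have been timed, as the toy calculation below shows. The segment times are invented and the original study's measurement procedure may have differed.

```python
# Illustrative calculation of the speech-to-episode ratio and mean pause duration
# from a hand-made list of speech segments (start, end) in seconds.
speech_segments = [(0.0, 2.4), (4.1, 6.0), (8.5, 10.2)]   # placeholder segment times
episode_duration = 12.0                                    # total episode length, seconds

speech_time = sum(end - start for start, end in speech_segments)
pauses = [speech_segments[i + 1][0] - speech_segments[i][1]
          for i in range(len(speech_segments) - 1)]

speech_ratio = speech_time / episode_duration
mean_pause = sum(pauses) / len(pauses)
print(f"speech/episode ratio: {speech_ratio:.2f}, mean pause: {mean_pause:.2f} s")
```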

    Voice and speech perception in autism : a systematic review

    Autism spectrum disorders (ASD) are characterized by persistent impairments in social communication and interaction and by restricted and repetitive behavior. In the original description of autism by Kanner (1943), the presence of emotional impairments was already emphasized (self-absorbed, emotionally cold, distanced, and retracted). However, little research has focused on the auditory perception of vocal emotional cues; audio-visual comprehension has been explored far more commonly. Like faces, voices play an important role in the social interaction contexts in which individuals with ASD show impairments. The aim of the current systematic review was to integrate evidence from behavioral and neurobiological studies for a more comprehensive understanding of voice-processing abnormalities in ASD. Among the different types of information that the human voice may provide, we hypothesize particular deficits in the processing of vocal affect information by individuals with ASD. The relationship between impaired processing of vocal stimuli and disrupted Theory of Mind in autism is discussed. Moreover, because ASD are characterized by deficits in social reciprocity, the abnormal oxytocin system in individuals with ASD is further discussed as a possible biological marker of abnormal vocal affect information processing and social interaction skills in the ASD population.

    An Online Attachment Style Recognition System Based on Voice and Machine Learning

    Attachment styles are known to have significant associations with mental and physical health. Specifically, insecure attachment puts individuals at higher risk of suffering from mental disorders and chronic diseases. The aim of this study is to develop an attachment recognition model that can distinguish between secure and insecure attachment styles from voice recordings, exploring the importance of acoustic features while also evaluating gender differences. A total of 199 participants recorded their responses to four open questions intended to trigger their attachment system using a web-based interrogation system. The recordings were processed to obtain the standard acoustic feature set eGeMAPS, and recursive feature elimination was applied to select the relevant features. Different supervised machine learning models were trained to recognize attachment styles using both gender-dependent and gender-independent approaches. The gender-independent model achieved a test accuracy of 58.88%, whereas the gender-dependent models obtained 63.88% and 83.63% test accuracy for women and men, respectively, indicating a strong influence of gender on attachment style recognition and the need to model the genders separately in further studies. These results also demonstrate the potential of acoustic properties for remote assessment of attachment style, enabling fast and objective identification of this health risk factor and thus supporting the implementation of large-scale mobile screening systems.
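    A rough sketch of such a pipeline, with recursive feature elimination over eGeMAPS-style functionals and a separate classifier per gender, is given below. The synthetic features and labels, the linear-SVM estimator, and the number of retained features are placeholders; the study's actual models are not specified here.

```python
# Hedged sketch: recursive feature elimination over acoustic functionals followed by
# a classifier, trained separately per gender. All data and settings are placeholders.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(199, 88))                 # eGeMAPS-style functionals per participant
y = rng.integers(0, 2, size=199)               # 0 = secure, 1 = insecure (placeholders)
gender = rng.integers(0, 2, size=199)          # 0 = women, 1 = men (placeholders)

for g, name in [(0, "women"), (1, "men")]:     # gender-dependent models
    pipe = make_pipeline(
        StandardScaler(),
        RFE(SVC(kernel="linear"), n_features_to_select=20),
        SVC(kernel="linear"),
    )
    acc = cross_val_score(pipe, X[gender == g], y[gender == g], cv=5, scoring="accuracy")
    print(f"{name}: mean CV accuracy = {acc.mean():.2f}")
```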