4 research outputs found

    Reducción de ruido en la detección automática de hipernasalidad en niños (Noise reduction in automatic hypernasality detection in children)

    ABSTRACT: In this paper, a methodology to reduce the background noise in a hypernasality detection system using the spectral subtraction method is presented. Some classical measures of quality and intelligibility are used to evaluate the speech enhancement algorithms used in the system. A linear classifier is used for hypernasality detection, and the results obtained with different spectral subtraction algorithms are compared. The results show that spectral subtraction techniques can be used to improve the performance of the classifier in the detection of hypernasality when the signals are contaminated with additive noise.
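    As a rough illustration of the spectral subtraction idea this abstract refers to, the sketch below estimates a noise magnitude spectrum from the first few frames (assumed noise-only) and subtracts a scaled version from every frame. It is a minimal NumPy version with hypothetical parameters (non-overlapping frames, no smoothing or overlap-add), not the paper's actual algorithm:

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=5, alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction (illustrative sketch).

    The noise spectrum is estimated from the first `noise_frames` frames,
    assumed to contain background noise only. `alpha` is the
    over-subtraction factor, `beta` the spectral floor.
    """
    # Zero-pad so the signal splits into whole, non-overlapping frames.
    n_frames = int(np.ceil(len(signal) / frame_len))
    padded = np.zeros(n_frames * frame_len)
    padded[:len(signal)] = signal
    frames = padded.reshape(n_frames, frame_len)

    spectra = np.fft.rfft(frames, axis=1)
    mags, phases = np.abs(spectra), np.angle(spectra)

    # Average magnitude over the leading noise-only frames.
    noise_mag = mags[:noise_frames].mean(axis=0)

    # Subtract the scaled noise estimate; floor to avoid negative magnitudes.
    clean_mag = np.maximum(mags - alpha * noise_mag, beta * mags)

    # Resynthesize with the noisy phase (standard in spectral subtraction).
    clean = np.fft.irfft(clean_mag * np.exp(1j * phases), n=frame_len, axis=1)
    return clean.reshape(-1)[:len(signal)]
```

    The noisy phase is reused because spectral subtraction only modifies magnitudes; musical-noise suppression and frame overlap are omitted for brevity.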

    Acoustic analysis of the unvoiced stop consonants for detecting hypernasal speech

    Speakers having evidence of a defective velopharyngeal mechanism produce speech with inappropriate nasal resonance (hypernasal speech). Voice analysis methods for the detection of hypernasality commonly use vowels and nasalized vowels. However, to obtain a more general assessment of this abnormality, it is necessary to analyze stops and fricatives as well. This study describes a method for hypernasality detection that also analyzes the unvoiced Spanish stop consonants /k/ and /p/. The importance of phoneme-by-phoneme analysis is shown, in contrast with whole-word parametrization, which may include segments that are irrelevant from the classification point of view. Parameters that correlate with the imprints of Velopharyngeal Incompetence (VPI) on voiceless stop consonants were used in the feature estimation stage. Classification was carried out using a Support Vector Machine (SVM), obtaining a performance of 74% under a repeated cross-validation evaluation strategy.
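    The repeated cross-validation strategy used for the evaluation can be sketched as a plain index splitter: each repeat reshuffles the samples into k folds, so every sample is tested exactly once per repeat. The fold count, repeat count, and seed below are illustrative choices, not those of the study:

```python
import random

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV.

    Each repeat reshuffles the indices, so the k folds differ across
    repeats; within one repeat every sample appears in exactly one
    test fold.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test
```

    Averaging a classifier's accuracy over all k × repeats test folds gives a performance estimate that is less sensitive to one particular fold assignment than a single k-fold run.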

    Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors

    Speech disorders in children can affect their fluency and intelligibility. Delay in their diagnosis and treatment increases the risk of social impairment and learning disabilities. With the significant shortage of Speech and Language Pathologists (SLPs), there is increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability. However, the scarcity and unreliable annotation of disordered child speech corpora, along with the high acoustic variation in child speech data, have impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. Therefore, this thesis investigates two types of detection systems that can be achieved with minimum dependency on annotated mispronounced speech data. First, a novel approach that adopts paralinguistic features, which represent the prosodic, spectral, and voice quality characteristics of the speech, was proposed to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech using a binary Support Vector Machine (SVM) classifier. As paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal. Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors made due to SSDs and provide low-level diagnostic information that can be used in constructing formative feedback and a detailed diagnostic report. Unlike existing MDD methods, where detection and diagnosis are performed at the phoneme level, the proposed method achieves MDD at the speech attribute level, namely the manners and places of articulation. The speech attribute features describe the involved articulators and their interactions when making a speech sound, allowing a low-level description of the pronunciation error to be provided.
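    To make "content-independent paralinguistic features" concrete, the toy sketch below computes three frame-level measures that need no transcription (log energy, zero-crossing rate, spectral centroid) and summarizes each with its mean and standard deviation over the utterance. The frame sizes and the choice of measures are illustrative assumptions, not the thesis's actual feature set:

```python
import numpy as np

def paralinguistic_features(signal, sr=16000, frame_len=400, hop=160):
    """Toy utterance-level paralinguistic descriptors (illustrative only).

    Per frame: log energy, zero-crossing rate, spectral centroid.
    Returns the mean and standard deviation of each measure, i.e. a
    6-dimensional vector regardless of what was spoken.
    """
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    feats = []
    for i in range(n):
        frame = signal[i * hop : i * hop + frame_len]
        energy = np.log(np.sum(frame ** 2) + 1e-10)
        # Crossings per sample: sign changes counted via the first difference.
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1 / sr)
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-10)
        feats.append([energy, zcr, centroid])
    feats = np.array(feats)
    # Utterance-level summary: mean and std of each frame-level measure.
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
```

    A fixed-length vector like this can be fed directly to a binary SVM for TD-vs-SSD classification, since its dimensionality does not depend on utterance length or content.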
    Two novel methods to model speech attributes are further proposed in this thesis: a frame-based (phoneme-alignment) method leveraging the Multi-Task Learning (MTL) criterion and training a separate model for each attribute, and an alignment-free, jointly learnt method based on the Connectionist Temporal Classification (CTC) sequence-to-sequence criterion. The proposed techniques have been evaluated using standard and publicly accessible adult and child speech corpora, while the MDD method has been validated using L2 speech corpora.