
    Automatic detection of hypernasal speech in children with cleft lip and palate from Spanish vowels and words using classical measures and nonlinear analysis

    This paper presents a system for the automatic detection of hypernasal speech signals based on the combination of two different characterization approaches applied to the five Spanish vowels and two selected words. The first approach is based on classical features such as pitch period perturbations, noise measures, and Mel-Frequency Cepstral Coefficients (MFCC). The second approach is based on Non-Linear Dynamics (NLD) analysis. The most relevant features are selected and sorted using two techniques: Principal Components Analysis (PCA) and Sequential Forward Floating Selection (SFFS). The decision about whether a voice recording is hypernasal or healthy is made using a Soft Margin Support Vector Machine (SM-SVM). Experiments are performed on recordings of the five Spanish vowels and the two selected words, considering three different sets of features: (1) the classical approach, (2) the NLD analysis, and (3) the combination of the classical and NLD measures. In general, the accuracies are higher and more stable when the classical and NLD features are combined, indicating that the NLD analysis is complementary to the classical approach.
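    The pipeline described above (classical acoustic features plus nonlinear-dynamics measures, dimensionality reduction, and a soft-margin SVM) can be sketched as follows. This is a minimal illustration assuming librosa, nolds, and scikit-learn; the feature set, segment lengths, and SVM parameters are placeholders rather than the authors' configuration, and SFFS (e.g. mlxtend's SequentialFeatureSelector with floating=True) could stand in for PCA.

```python
# Minimal sketch of the hypernasality-detection pipeline described above:
# classical features (MFCC) + nonlinear-dynamics (NLD) measures -> PCA -> soft-margin SVM.
# All parameters are illustrative, not the authors' configuration.
import numpy as np
import librosa                      # classical spectral features
import nolds                        # nonlinear-dynamics measures
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def classical_features(y, sr):
    """Mean MFCCs as a stand-in for the classical characterization."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def nld_features(y):
    """A few nonlinear-dynamics measures, computed on a short segment for speed."""
    seg = y[:4000]
    return np.array([nolds.sampen(seg), nolds.dfa(seg), nolds.hurst_rs(seg)])

def extract(path):
    """One feature vector per vowel/word recording."""
    y, sr = librosa.load(path, sr=16000)
    return np.concatenate([classical_features(y, sr), nld_features(y)])

# Soft-margin SVM (the C parameter controls the margin softness).
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(C=1.0, kernel="rbf"))
# X: stacked feature vectors, y: 1 = hypernasal, 0 = healthy
# clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```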

    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system that enables personalized speech therapy for patients with communication disorders in their home environment. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated into a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements are reported on both datasets using speaker-independent ASR architectures.
    Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094
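    As a rough illustration of the bottleneck-feature idea, the sketch below defines a small PyTorch network whose narrow hidden layer provides compact frame-level features that can be reused by an ASR front-end. The layer sizes, phone inventory, and training procedure are assumptions for illustration and do not reproduce the paper's acoustic models.

```python
# Minimal sketch of a bottleneck feature extractor: the activations of a narrow
# hidden layer of a frame-level phone classifier serve as compact features.
# Dimensions are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, n_in=40, n_hidden=512, n_bottleneck=40, n_phones=42):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_bottleneck),   # the bottleneck layer
        )
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(n_bottleneck, n_phones))

    def forward(self, x):
        z = self.encoder(x)                      # bottleneck features
        return self.classifier(z), z

# frames = torch.randn(100, 40)                  # e.g. 100 frames of 40-dim filterbanks
# phone_logits, bottleneck_feats = BottleneckNet()(frames)
```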

    DIA: a tool for objective intelligibility assessment of pathological speech

    Intelligibility is generally accepted to be a very relevant measure in the assessment of pathological speech. In clinical practice, intelligibility is measured using one of the many existing perceptual tests. These tests usually have the drawback that they employ unnatural speech material (e.g. nonsense words) and that they cannot fully exclude errors due to the listener's bias. This raises the need for an objective and automated tool to measure intelligibility. Here, we present the Dutch Intelligibility Assessment (DIA), an objective tool that aids the speech therapist in evaluating the intelligibility of persons with pathological speech. This tool will soon be made publicly available.

    Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

    This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types, such as wideband and narrowband spectrograms and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the ability of the proposed model to classify different pathologies and the associated disease severity. Additionally, this paper proposes a novel fusion strategy, called multi-spectral fusion, that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models are able to classify the speech of Parkinson's disease patients with accuracies of up to 95%. The proposed models were also able to assess the dysarthria severity of Parkinson's disease patients with a Spearman correlation of up to 0.75. These results outperform those reported in the literature, where the same problem was addressed with the same corpus.
    Comment: 7 pages, 3 figures
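    A minimal sketch of the multi-spectral fusion idea follows: two encoders compress wideband and narrowband spectrogram frames, and their latent codes are concatenated before decoding. The dimensions, depth, and PyTorch framing are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch of multi-spectral fusion with autoencoders: separate encoders for
# wideband and narrowband spectrogram frames, fused latent code, shared decoder.
# The fused latent code can feed a pathology / severity classifier.
import torch
import torch.nn as nn

class MultiSpectralFusionAE(nn.Module):
    def __init__(self, n_wide=128, n_narrow=128, n_latent=64):
        super().__init__()
        self.enc_wide = nn.Sequential(nn.Linear(n_wide, n_latent), nn.ReLU())
        self.enc_narrow = nn.Sequential(nn.Linear(n_narrow, n_latent), nn.ReLU())
        self.decoder = nn.Linear(2 * n_latent, n_wide + n_narrow)

    def forward(self, wide, narrow):
        z = torch.cat([self.enc_wide(wide), self.enc_narrow(narrow)], dim=-1)
        recon = self.decoder(z)                  # reconstruction target: both spectra
        return recon, z

# wide, narrow = torch.randn(8, 128), torch.randn(8, 128)
# recon, fused_latent = MultiSpectralFusionAE()(wide, narrow)
```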

    A serious game for children with speech disorders and hearing problems

    The printed copy of this thesis is held at the İstanbul Şehir University Library. Speech impediments affecting children with hearing difficulties and speech disorders require speech therapy and much practice to overcome. Speech therapy delivered through serious games gives children with speech disorders and hearing problems an opportunity to overcome their problems. Since children are naturally inclined to play games, we aim to teach them through entertainment in the form of serious games. In this thesis, we have designed and implemented a serious game that can be used both as a therapy and as a tool to measure the performance of children with speech impediments: children learn to speak specific words that they are expected to know before the age of 7, and are then taught how to form sentences. The game consists of three steps. The first step provides information for parents or therapists to decide whether the child needs speech therapy. In the second step, the child starts to learn specific words while playing the game. The third step measures the performance of the child and evaluates how much the child has learned by the end of the game. The game features an avatar that the child controls through speech, with the objective of moving the avatar around the environment to earn coins. The avatar is controlled both by voice commands such as Jump, Ahead, Back, Left, and Right, and by the arrow keys of the keyboard. During the game the child is guided by an arrow, instead of getting help from a therapist or teacher, to reach the next goal. This allows the child to practice for longer hours than in clinical approaches under the supervision of a therapist, which are time-limited. Our preliminary performance measurements indicate an improvement of 40% for children who played our game at least 5 times over a specific period of time.
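    As a toy illustration of the voice-command control described above, the sketch below maps recognized command words to avatar movements. It is written in Python for brevity and does not reflect the thesis's actual game-engine implementation; the coordinate convention is an assumption.

```python
# Toy sketch: map recognized voice commands (Jump, Ahead, Back, Left, Right)
# to avatar position updates. Illustrative only; not the game's real implementation.
MOVES = {
    "jump":  (0, 0, 1),
    "ahead": (0, 1, 0),
    "back":  (0, -1, 0),
    "left":  (-1, 0, 0),
    "right": (1, 0, 0),
}

def apply_command(position, command):
    """Move the avatar if the recognized command is one of the supported words."""
    dx, dy, dz = MOVES.get(command.lower(), (0, 0, 0))
    x, y, z = position
    return (x + dx, y + dy, z + dz)

# apply_command((0, 0, 0), "Jump")  ->  (0, 0, 1)
```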

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    The detection of pathologies from speech features is usually defined as a binary classification task, with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large-margin classifiers, and tree boosting machines to distinguish between four different pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We demonstrate that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be effectively used to classify these types of pathological voices. We evaluate the robustness of our classifiers by adding room impulse responses to the test data and by applying them to unseen speech corpora. Our approach achieves unweighted average F1-scores between 74.1% and 96.4%, depending on the model and the noise conditions used. The systems generalize and perform well on unseen data of healthy speakers sampled from a variety of different sources.
    Comment: Submitted to ICASSP 202
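    A minimal sketch of the feature-extraction step follows: mean-pooled hidden states from one wav2vec 2.0 layer feed a downstream classifier. The checkpoint name, layer index, and choice of SVM are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch: mean-pooled wav2vec 2.0 hidden states from one transformer layer
# as fixed-size features for a downstream pathology classifier.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.svm import SVC

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def latent_features(waveform, sr=16000, layer=6):
    """Return a single pooled feature vector for one utterance (layer is illustrative)."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        out = model(inputs.input_values, output_hidden_states=True)
    # hidden_states[0] is the CNN feature encoder output; later entries are transformer layers.
    return out.hidden_states[layer].mean(dim=1).squeeze(0).numpy()

# X = np.stack([latent_features(w) for w in waveforms]); y = labels  # pathology classes
# clf = SVC(kernel="rbf").fit(X_train, y_train)                      # large-margin classifier
```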

    The latest development of the DELAD project for sharing corpora of speech disorders

    Corpora of speech of individuals with communication disorders (CSD) are invaluable resources for education and research, but they are costly and hard to build, and difficult to share for various reasons. DELAD, which means 'shared' in Swedish, is a project initiated by Professors Nicole Muller and Martin Ball in 2015 that aims to address this issue by establishing a platform for researchers to share datasets of speech disorders with interested audiences. To date, four workshops have been held, bringing together participants with expertise in clinical phonetics and linguistics, speech and language therapy, research infrastructure, and ethics and law to discuss the issues involved in setting up such an archive. Positive and steady progress has been made since 2015, including refurbishing the DELAD website (http://delad.net/) with information and application forms for researchers to join and share their datasets, and linking with the CLARIN K-Centre for Atypical Communication Expertise (https://ace.ruhosting.nl/), through which CSD can be hosted and accessed via the CLARIN B-Centres, The Language Archive (https://tla.mpi.nl/tools/tla-tools/) and TalkBank (https://talkbank.org/). The latest workshop, funded by CLARIN (Common Language Resources and Technology Infrastructure), was held as an online event in January 2021 and covered topics including Data Protection Impact Assessments, changes in academic ethics perspectives on sharing CSD, and voice conversion as a means to pseudonymise speech. This paper reports the latest progress of DELAD and discusses directions for further advancement of the initiative, with information on how researchers can contribute to the repository.
    Peer reviewed