392 research outputs found

    Silent Speech Interfaces for Speech Restoration: A Review

    This work was supported in part by the Agencia Estatal de Investigacion (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under a Juan de la Cierva-Incorporation Fellowship (IJCI-2017-32926).

    This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present the latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements, or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but who are unable to speak after the removal of the vocal folds, yet they fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings with healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.
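    The decoding step described above (biosignal in, text out) is typically framed as a sequence-to-sequence problem. Below is a minimal, hypothetical sketch of the general idea: frames of EMG features are mapped to phoneme posteriors by a recurrent network trained with a CTC loss. The channel count, phoneme inventory size, and architecture are all illustrative assumptions, not the specific systems covered in the review.

```python
# Minimal sketch of an SSI decoder: EMG feature frames -> phoneme sequence.
# Hypothetical dimensions and architecture; requires PyTorch.
import torch
import torch.nn as nn

class EMGPhonemeDecoder(nn.Module):
    def __init__(self, n_features=8, n_phonemes=40, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        # +1 output class for the CTC "blank" symbol
        self.out = nn.Linear(2 * hidden, n_phonemes + 1)

    def forward(self, x):                 # x: (batch, time, n_features)
        h, _ = self.rnn(x)
        return self.out(h).log_softmax(dim=-1)

model = EMGPhonemeDecoder()
ctc = nn.CTCLoss(blank=40)

# Dummy batch: 2 utterances of 100 EMG frames, 8 channels each
x = torch.randn(2, 100, 8)
targets = torch.randint(0, 40, (2, 12))   # phoneme label sequences
log_probs = model(x).transpose(0, 1)      # CTC expects (time, batch, classes)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 100, dtype=torch.long),
           target_lengths=torch.full((2,), 12, dtype=torch.long))
loss.backward()
print(f"CTC loss: {loss.item():.3f}")
```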

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the biennial MAVEBA Workshop collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies.

    Evaluation of STT technologies performance and database design for Spanish dysarthric speech

    Automatic Speech Recognition (ASR) systems have become everyday tools worldwide. Their use has spread in recent years, and they have been implemented in Environmental Control Systems (ECS) and Speech Generating Devices (SGD), among others. These systems can be especially beneficial for people with physical disabilities, who can control different devices with voice commands and thus reduce the physical effort required of them. However, people with functional diversity often present difficulties in speech articulation as well. One of the most common speech articulation problems is dysarthria, a disorder of the nervous system that causes weakness in the muscles used for speech. Existing commercial ASR systems are not able to correctly understand dysarthric speech, so people with this condition cannot exploit the technology. Some research tackling this issue has been conducted, but an optimal solution has not been reached yet. Moreover, nearly all existing research on the matter is in English; scarcely any previous study has approached the problem in other languages. Apart from this, ASR systems require large speech databases, of which there are currently very few; most are in English and were not designed for this purpose. Some commercial ASR systems offer a customization interface where users can train a base model with their own speech data and thus improve recognition accuracy. In this thesis, we evaluated the performance of the commercial ASR system Microsoft Azure Speech to Text. First, we reviewed the current state of the art. Then, we created a pilot database in Spanish and recorded it with 3 heterogeneous speakers with dysarthria and 1 typical speaker as a reference. Lastly, we trained the system and conducted different experiments to measure its accuracy. Results show that, overall, the customized models outperform the system's base models. However, the results were not homogeneous but varied depending on the speaker. Even though recognition accuracy improved considerably, the results were far from being as good as those obtained for typical speech.
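    Recognition accuracy in studies like this one is usually reported as word error rate (WER): the word-level edit distance between the reference and the recognized transcript, divided by the reference length. A self-contained sketch of that computation follows; the example Spanish transcripts are invented for illustration and are not taken from the thesis database.

```python
# Word error rate (WER): Levenshtein distance over words, divided by
# the number of words in the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Invented example: comparing a base model against a customized model
reference  = "enciende la luz del salon"
base_out   = "en ciento la luz de salon"    # hypothetical base-model output
custom_out = "enciende la luz de salon"     # hypothetical customized output
print(f"base WER:   {wer(reference, base_out):.2f}")    # 0.60
print(f"custom WER: {wer(reference, custom_out):.2f}")  # 0.20
```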

    SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility due to slow, uncoordinated control of the speech production muscles. Automatic Speech Recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in the prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking are proposed. For dysarthric speech synthesis, this dissertation introduces a modified neural multi-talker TTS with a dysarthria severity level coefficient and a pause insertion model, so that dysarthric speech can be synthesized at varying severity levels. In addition, we extend this work by using a label propagation technique to create more meaningful control variables, such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, allowing dysarthric speech to be generated across a broader range. To evaluate their effectiveness for synthesizing training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall, results on the TORGO database demonstrate that using synthetic dysarthric speech to increase the amount of dysarthric-patterned training data has a significant impact on dysarthric ASR systems.
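    The severity-conditioning idea can be illustrated with a short sketch: a continuous severity coefficient (or a control variable like the RLT parameter above) is projected into the model dimension and added, together with a speaker embedding, to the TTS encoder states, so the decoder can be steered across severity levels. This is a schematic reconstruction under assumed shapes and modules, not the dissertation's actual architecture.

```python
# Sketch: conditioning a multi-speaker TTS encoder on a continuous
# dysarthria severity coefficient. Shapes and modules are assumptions.
import torch
import torch.nn as nn

class SeverityConditionedEncoder(nn.Module):
    def __init__(self, n_symbols=80, d_model=256, n_speakers=10):
        super().__init__()
        self.text_emb = nn.Embedding(n_symbols, d_model)
        self.speaker_emb = nn.Embedding(n_speakers, d_model)
        # Project the scalar severity (0 = typical .. 1 = severe) to d_model
        self.severity_proj = nn.Linear(1, d_model)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, phonemes, speaker_id, severity):
        # phonemes: (batch, time); severity: (batch, 1) in [0, 1]
        h = self.text_emb(phonemes)
        cond = self.speaker_emb(speaker_id) + self.severity_proj(severity)
        h = h + cond.unsqueeze(1)        # broadcast over time steps
        out, _ = self.encoder(h)
        return out                       # fed on to the TTS decoder/vocoder

enc = SeverityConditionedEncoder()
phonemes = torch.randint(0, 80, (2, 30))
out = enc(phonemes, torch.tensor([0, 1]), torch.tensor([[0.2], [0.8]]))
print(out.shape)  # torch.Size([2, 30, 256])
```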

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from a strongly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years, the initial topics have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.

    Towards Automatic Speech-Language Assessment for Aphasia Rehabilitation

    Speech-based technology has the potential to reinforce traditional aphasia therapy through the development of automatic speech-language assessment systems. Such systems can provide clinicians with supplementary information to assist with progress monitoring and treatment planning, and can provide support for on-demand auxiliary treatment. However, current technology cannot support this type of application due to the difficulties associated with aphasic speech processing. The focus of this dissertation is on the development of computational methods that can accurately assess aphasic speech across a range of clinically-relevant dimensions. The first part of the dissertation focuses on novel techniques for assessing aphasic speech intelligibility in constrained contexts. The second part investigates acoustic modeling methods that lead to significant improvement in aphasic speech recognition and allow the system to work with unconstrained speech samples. The final part demonstrates the efficacy of speech recognition-based analysis in automatic paraphasia detection, extraction of clinically-motivated quantitative measures, and estimation of aphasia severity. The methods and results presented in this work will enable robust technologies for accurately recognizing and assessing aphasic speech, and will provide insights into the link between computational methods and clinical understanding of aphasia.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
    https://deepblue.lib.umich.edu/bitstream/2027.42/140840/1/ducle_1.pd
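    Clinically motivated quantitative measures of the kind mentioned in this dissertation can often be computed directly from time-aligned ASR output. The sketch below derives two common examples, speaking rate and type-token ratio, from a hypothetical word-level alignment; the alignment format and sample values are assumptions, and the dissertation's actual measure set is more extensive.

```python
# Sketch: simple quantitative measures from a time-aligned transcript.
# The alignment format (word, start_s, end_s) is an assumption.
def speech_measures(aligned_words):
    if not aligned_words:
        return {"words_per_min": 0.0, "type_token_ratio": 0.0}
    words = [w for w, _, _ in aligned_words]
    duration_s = aligned_words[-1][2] - aligned_words[0][1]
    return {
        "words_per_min": 60.0 * len(words) / max(duration_s, 1e-6),
        "type_token_ratio": len(set(words)) / len(words),
    }

# Hypothetical aligned output for a short picture-description sample
alignment = [("the", 0.0, 0.3), ("boy", 0.4, 0.9), ("is", 1.3, 1.5),
             ("the", 2.4, 2.7), ("cookie", 2.8, 3.5)]
print(speech_measures(alignment))
# {'words_per_min': 85.71..., 'type_token_ratio': 0.8}
```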

    Envelhecimento vocal: estudo acústico-articulatório das alterações de fala com a idade [Vocal aging: an acoustic-articulatory study of age-related speech changes]

    Background: Although the aging process causes specific alterations in the speech organs, knowledge about the effects of age on speech production is still dispersed and incomplete. Objective: To provide a broader view of age-related segmental and suprasegmental speech changes in European Portuguese (EP), considering new aspects beyond static acoustic features, such as dynamic and articulatory data. Method: Two databases with speech data from Portuguese adult native speakers, obtained through standardized recording and segmentation procedures, were devised: i) an acoustic database containing all EP oral vowels produced in similar contexts (read speech), together with a sample of semi-spontaneous speech (image description), collected from a large sample of adults between the ages of 35 and 97; ii) another with articulatory data (ultrasound (US) tongue images synchronized with speech) for all EP oral vowels produced in similar contexts (pseudowords and isolated words), collected from young ([21-35]) and older ([55-73]) adults. Results: Based on the curated databases, various aspects of aging speech were analyzed. Acoustically, aging speech is characterized by: 1) longer vowels (in both genders); 2) a tendency for F0 to decrease in women and slightly increase in men; 3) lower vowel formant frequencies in females; 4) a significant reduction of the vowel acoustic space in men; 5) vowels with a steeper F1 trajectory slope (in both genders); 6) shorter descriptions with more pause time for males; 7) faster speech and articulation rates for females; and 8) lower HNR for females in semi-spontaneous speech. In addition, the decrease in total speech duration is associated with non-severe depression symptoms and age; older adults tended to present more depressive symptoms, which could affect the amount of speech produced. Concerning the articulatory data, the tongue tends to be higher and more advanced with aging for almost all vowels, meaning that the vowel articulatory space tends to be higher, more advanced, and larger in older females. Conclusion: This study provides new information on aging speech for a language other than English. The results corroborate that speech changes with age, with different patterns between genders, and also suggest that speakers may develop specific articulatory adjustments with aging.
    Programa Doutoral em Gerontologia e Geriatri
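    Static acoustic features like those analysed in the Results (F0 and vowel formant frequencies) can be extracted with Praat's algorithms. A minimal sketch using the praat-parselmouth Python bindings follows; the file name and measurement time are placeholders, and this is not the thesis's exact measurement protocol.

```python
# Sketch: median F0 and vowel formants (F1, F2) at a chosen time point,
# using the praat-parselmouth bindings. "vowel.wav" is a placeholder.
import numpy as np
import parselmouth

snd = parselmouth.Sound("vowel.wav")

pitch = snd.to_pitch()                  # autocorrelation pitch track
f0 = pitch.selected_array['frequency']
f0 = f0[f0 > 0]                         # drop unvoiced frames
print(f"median F0: {np.median(f0):.1f} Hz")

formants = snd.to_formant_burg()        # Burg formant analysis
t = snd.duration / 2                    # e.g. the vowel midpoint
f1 = formants.get_value_at_time(1, t)
f2 = formants.get_value_at_time(2, t)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```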

    An examination of the language construct in NIMH's research domain criteria: Time for reconceptualization!

    The National Institute of Mental Health’s Research Domain Criteria (RDoC) Initiative “calls for the development of new ways of classifying psychopathology based on dimensions of observable behavior.” As a result of this ambitious initiative, language has been identified as an independent construct in the RDoC matrix. In this article, we frame language within an evolutionary and neuropsychological context and discuss some of the limitations of the current measurements of language. Findings from genomics and the neuroimaging of performance during language tasks are discussed in relation to serious mental illness and within the context of caveats regarding measuring language. Indeed, the data collection and analysis methods employed to assay language have been both aided and constrained by the available technologies, methodologies, and conceptual definitions. Consequently, different fields of language research show inconsistent definitions of language that have become increasingly broad over time. Individually, they have also shown significant improvements in conceptual resolution, as well as in experimental and analytic techniques. More recently, language research has embraced collaborations across disciplines, notably neuroscience, cognitive science, and computational linguistics, and has ultimately re-defined classical ideas of language. As we move forward, the new models of language, with their remarkably multifaceted constructs, force a re-examination of the NIMH RDoC conceptualization of language and thus the neuroscience and genetics underlying this concept.

    Augmented Reality

    Augmented Reality (AR) is a natural development from virtual reality (VR), which was developed several decades earlier, and it complements VR in many ways. Because the user can see both real and virtual objects simultaneously, AR is far more intuitive, although it is not free from human-factors and other restrictions. AR applications also demand less time and effort, since there is no need to construct the entire virtual scene and environment. In this book, several new and emerging application areas of AR are presented, divided into three sections. The first section contains applications in outdoor and mobile AR, such as construction, restoration, security, and surveillance. The second section deals with AR in medicine, biology, and the human body. The third and final section contains a number of new and useful applications in daily living and learning.