265 research outputs found

    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    Full text link
    Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system that enables personalized speech therapy for patients with communication disorders in their home environment. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements are reported on both datasets using speaker-independent ASR architectures.
    Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094
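    To make the notion of bottleneck features concrete, the sketch below shows the general idea under stated assumptions: a neural network trained on a phone-classification task with one deliberately narrow hidden layer, whose activations are reused as compact features for a downstream ASR system. The layer sizes, the 40-dimensional filterbank input, and the phone inventory are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a bottleneck feature extractor (illustrative only; the
# layer sizes and 40-dim filterbank input are assumptions, not the paper's
# actual model).
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, n_in=40, n_hidden=512, n_bottleneck=40, n_phones=42):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_bottleneck),  # narrow "bottleneck" layer
        )
        self.classifier = nn.Linear(n_bottleneck, n_phones)

    def forward(self, x):
        z = self.encoder(x)               # bottleneck activations
        return self.classifier(z), z

# After training on phone classification, the activations z are extracted
# frame by frame and fed to the downstream acoustic model as features.
net = BottleneckNet()
logits, features = net(torch.randn(8, 40))  # 8 toy frames
```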

    Combined brain language connectivity and intraoperative neurophysiologic techniques in awake craniotomy for eloquent-area brain tumor resection

    Get PDF
    Speech processing can be disturbed by primary brain tumors (PBT). Improved presurgical planning techniques decrease the neurological morbidity associated with tumor resection during awake craniotomy. The aims of this work were: 1. to perform Diffusion Kurtosis Imaging-based tractography (DKI-tract) for the detection of brain tracts involved in language; 2. to investigate which factors contribute to functional magnetic resonance imaging (fMRI) maps in predicting eloquent language regional reorganization; 3. to determine the technical aspects of accelerometric (ACC) recording of speech during surgery. DKI tracts were streamlined using a 1.5T magnetic resonance scanner. The number of tracts and fiber pathways were compared between DKI and standard Diffusion Tensor Imaging (DTI) in healthy subjects (HS) and PBT patients. fMRI data were acquired using task-specific and resting-state paradigms during language and motor tasks. After testing the influence of intraoperative fMRI on the number of direct cortical stimulation (DCS) stimuli, graph-theory measures were extracted and analyzed. For speech recording, ACC signals were recorded after evaluating neck positions and filter bandwidths. To test this method, language disturbances were recorded in patients with dysphonia and after applying DCS to the inferior frontal gyrus; for comparison, HS reaction time was recorded during speech execution. DKI-tract showed an increased number of arcuate fascicle tracts in PBT patients, and fewer spurious tracts were identified with DKI-tract. Intraoperative fMRI combined with DCS required a similar number of stimuli compared with DCS alone. Increased local centrality accompanied ipsilateral and contralateral language reorganization. ACC recordings showed minor artifact contamination when the sensor was placed at the suprasternal notch using a 20-200 Hz filter bandwidth. Patients with dysphonia showed decreased amplitude and frequency in comparison with HS. ACC detected an additional 11% of disturbances after DCS, and a shortening of latency in the presence of a loud stimulus during speech execution. This work improves current knowledge of presurgical planning techniques based on structural and functional brain connectivity imaging, and on speech recording
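    The 20-200 Hz filter bandwidth reported for the ACC recordings can be illustrated with a short band-pass sketch; the Butterworth design, filter order, and 1 kHz sampling rate below are assumptions for illustration, not the study's exact signal chain.

```python
# Hedged sketch: band-pass filtering an accelerometer (ACC) speech trace in
# the 20-200 Hz band named in the abstract. Order and sampling rate are
# illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_acc(signal, fs=1000.0, low=20.0, high=200.0, order=4):
    """Zero-phase Butterworth band-pass of a raw ACC trace."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="bandpass")
    return filtfilt(b, a, signal)  # filtfilt avoids phase distortion

# Example on synthetic data: 1 s of noise plus a 100 Hz "voicing" component.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 100 * t) + 0.5 * np.random.randn(t.size)
filtered = bandpass_acc(raw, fs=fs)
```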

    Negative vaccine voices in Swedish social media

    Get PDF
    Vaccinations are one of the most significant public health interventions, but vaccine hesitancy persists among a portion of the population in many countries, including Sweden. Since discussions of vaccine hesitancy often take place on social networking sites, data from Swedish social media are used to study and quantify the sentiment among discussants on the vaccination-or-not topic during phases of the COVID-19 pandemic. A majority of the analyzed posts showed a predominantly negative sentiment that prevailed throughout the examined period, with spikes or jumps attributable to certain vaccine-related events distinguishable in the results. Sentiment analysis can be a valuable tool for tracking public opinion regarding the use, efficacy, safety, and importance of vaccination
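    As a rough illustration of the kind of analysis described here (not the paper's actual method), the sketch below scores timestamped posts against a tiny sentiment lexicon and aggregates the scores per month to track opinion over time; the lexicon entries and data format are invented for the example.

```python
# Toy lexicon-based sentiment tracker over timestamped posts; everything
# here (lexicon, field layout) is an illustrative assumption.
from collections import defaultdict

LEXICON = {"safe": 1, "effective": 1, "protect": 1,
           "dangerous": -1, "unsafe": -1, "hoax": -1}

def score(text):
    """Sum of lexicon weights over the tokens of one post."""
    return sum(LEXICON.get(tok.strip(".,!?"), 0)
               for tok in text.lower().split())

def monthly_sentiment(posts):
    """posts: iterable of (iso_date, text); returns {YYYY-MM: mean score}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for date, text in posts:
        month = date[:7]
        totals[month] += score(text)
        counts[month] += 1
    return {m: totals[m] / counts[m] for m in totals}

posts = [("2021-01-05", "the vaccine is safe and effective"),
         ("2021-02-11", "this vaccine is dangerous, a hoax")]
print(monthly_sentiment(posts))  # {'2021-01': 2.0, '2021-02': -2.0}
```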

    Models and analysis of vocal emissions for biomedical applications

    Get PDF
    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contact between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    THE RELATIONSHIP BETWEEN ACOUSTIC FEATURES OF SECOND LANGUAGE SPEECH AND LISTENER EVALUATION OF SPEECH QUALITY

    Get PDF
    Second language (L2) speech is typically less fluent than native speech, and differs from it phonetically. While the speech of some L2 English speakers seems to be easily understood by native listeners despite the presence of a foreign accent, other L2 speech seems to be more demanding, such that listeners must expend considerable effort in order to understand it. One reason for this increased difficulty may simply be the speaker’s pronunciation accuracy or phonetic intelligibility. If a L2 speaker’s pronunciations of English sounds differ sufficiently from the sounds that native listeners expect, these differences may force native listeners to work much harder to understand the divergent speech patterns. However, L2 speakers also tend to differ from native ones in terms of fluency – the degree to which a speaker is able to produce appropriately structured phrases without unnecessary pauses, self-corrections or restarts. Previous studies have shown that measures of fluency are strongly predictive of listeners’ subjective ratings of the acceptability of L2 speech: Less fluent speech is consistently considered less acceptable (Ginther, Dimova, & Yang, 2010). However, since less fluent speakers tend also to have less accurate pronunciations, it is unclear whether or how these factors might interact to influence the amount of effort listeners exert to understand L2 speech, nor is it clear how listening effort might relate to perceived quality or acceptability of speech. In this dissertation, two experiments were designed to investigate these questions
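    One family of fluency measures alluded to here can be sketched as pause statistics derived from an energy envelope; the energy threshold, frame sizes, and 250 ms pause criterion below are illustrative assumptions rather than the dissertation's actual measures.

```python
# Hedged sketch: pause-based fluency metrics from an RMS energy envelope.
# Threshold and pause criterion are assumptions for illustration.
import numpy as np
import librosa

def pause_metrics(wav_path, pause_ms=250, rel_thresh=0.1):
    y, sr = librosa.load(wav_path, sr=16000)
    hop = int(0.010 * sr)                       # 10 ms hop
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    silent = rms < rel_thresh * rms.max()       # low-energy frames
    min_frames = int(pause_ms / 10)             # frames per pause criterion
    pauses, run = 0, 0
    for s in silent:                            # count silence runs that
        run = run + 1 if s else 0               # reach the criterion once
        if run == min_frames:
            pauses += 1
    duration = len(y) / sr
    return {"pauses_per_min": 60.0 * pauses / duration,
            "phonation_ratio": 1.0 - silent.mean()}

# e.g. pause_metrics("speaker01.wav")  # hypothetical recording
```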

    Towards Automatic Speech-Language Assessment for Aphasia Rehabilitation

    Full text link
    Speech-based technology has the potential to reinforce traditional aphasia therapy through the development of automatic speech-language assessment systems. Such systems can provide clinicians with supplementary information to assist with progress monitoring and treatment planning, and can provide support for on-demand auxiliary treatment. However, current technology cannot support this type of application due to the difficulties associated with aphasic speech processing. The focus of this dissertation is on the development of computational methods that can accurately assess aphasic speech across a range of clinically-relevant dimensions. The first part of the dissertation focuses on novel techniques for assessing aphasic speech intelligibility in constrained contexts. The second part investigates acoustic modeling methods that lead to significant improvement in aphasic speech recognition and allow the system to work with unconstrained speech samples. The final part demonstrates the efficacy of speech recognition-based analysis in automatic paraphasia detection, extraction of clinically-motivated quantitative measures, and estimation of aphasia severity. The methods and results presented in this work will enable robust technologies for accurately recognizing and assessing aphasic speech, and will provide insights into the link between computational methods and clinical understanding of aphasia.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/140840/1/ducle_1.pd
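    A minimal sketch of one recognition-based measure in this spirit (not the dissertation's actual pipeline) is word error rate between an ASR hypothesis and the target prompt, a common proxy for intelligibility in constrained, read-speech contexts:

```python
# Illustrative word error rate via Levenshtein distance over words; a
# generic proxy for intelligibility, not the dissertation's method.
def word_error_rate(ref, hyp):
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

print(word_error_rate("the cat sat", "the bat sat"))  # 0.333...
```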

    Toward an Imagined Speech-Based Brain Computer Interface Using EEG Signals

    Get PDF
    Individuals with physical disabilities face difficulties in communication. A number of neuromuscular impairments can prevent people from using available communication aids, because such aids require some degree of muscle movement. This makes brain–computer interfaces (BCIs) a potentially promising alternative communication technology for these people. Electroencephalographic (EEG) signals are commonly used in BCI systems to non-invasively capture the neural representations of intended, internal and imagined activities that are not physically or verbally evident. Examples include motor and speech imagery activities. Since 2006, researchers have become increasingly interested in classifying different types of imagined speech from EEG signals. However, the field still has a limited understanding of several issues, including experiment design, stimulus type, training, calibration and the examined features. The main aim of the research in this thesis is to advance automatic recognition of imagined speech using EEG signals by addressing a variety of issues that have not been solved in previous studies. These include (1) improving the discrimination between imagined speech and non-speech tasks, (2) examining temporal parameters to optimise the recognition of imagined words and (3) providing a new feature extraction framework for improving EEG-based imagined speech recognition by considering temporal information after reducing within-session temporal non-stationarities. For the discrimination of speech versus non-speech, EEG data were collected during the imagination of randomly presented and semantically varying words. The non-speech tasks involved attention to visual stimuli and resting. Time-domain and spatio-spectral features were examined in different time intervals. Above-chance-level classification accuracies were achieved for each word and for groups of words compared to the non-speech tasks. To classify imagined words, EEG data related to the imagination of five words were collected. In addition to word classification, the impacts of experimental parameters on classification accuracy were examined. The optimization of these parameters is important to improve the rate and speed of recognizing unspoken speech in on-line applications. These parameters included using different training sizes, classification algorithms, feature extraction in different time intervals and the use of imagination time length as a classification feature. Our extensive results showed that a Random Forest classifier with features extracted using the Discrete Wavelet Transform from a fixed 4-second EEG time frame yielded the highest average classification accuracy of 87.93% in the classification of five imagined words. To minimise within-class temporal variations, a novel feature extraction framework based on dynamic time warping (DTW) was developed. Using linear discriminant analysis as the classifier, the proposed framework yielded an average accuracy of 72.02% in the classification of imagined speech versus silence and 52.5% in the classification of five words. These results significantly outperformed a baseline configuration of state-of-the-art time-domain features
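    The reported DWT-plus-Random-Forest pipeline can be sketched as follows; the 'db4' wavelet, decomposition level, per-sub-band statistics, and toy data shapes are assumptions for illustration, not the thesis's exact configuration.

```python
# Hedged sketch of a DWT feature extractor feeding a Random Forest, in the
# spirit of the pipeline the abstract reports; all settings are assumptions.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def dwt_features(epoch, wavelet="db4", level=4):
    """epoch: (n_channels, n_samples) EEG; returns one feature vector."""
    feats = []
    for ch in epoch:
        for coeffs in pywt.wavedec(ch, wavelet, level=level):
            feats += [coeffs.mean(), coeffs.std(), np.abs(coeffs).max()]
    return np.array(feats)

# Toy data: 100 four-second epochs (14 channels at 128 Hz), 5 word labels.
rng = np.random.default_rng(0)
X = np.array([dwt_features(e) for e in rng.standard_normal((100, 14, 512))])
y = rng.integers(0, 5, size=100)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```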

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that had until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The MAVEBA Workshop proceedings, issued every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies