
    A silent speech system based on permanent magnet articulography and direct synthesis

    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
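    As an aside on the conversion step: scikit-learn ships no mixture-of-factor-analysers class, so the sketch below substitutes a full-covariance Gaussian mixture, which supports the same responsibility-weighted conditional-expectation (MMSE) mapping the paper describes, just without MFA's built-in dimensionality reduction. All dimensions and data below are placeholders, not the paper's setup.

        # Joint-density articulatory-to-acoustic conversion (hedged sketch).
        # Stand-in: full-covariance GMM instead of the paper's mixture of
        # factor analysers; the conditional-expectation mapping is the same.
        import numpy as np
        from scipy.stats import multivariate_normal
        from sklearn.mixture import GaussianMixture

        def train_joint_gmm(pma, audio, n_components=16):
            """Fit a GMM on stacked [PMA; acoustic] frame vectors."""
            joint = np.hstack([pma, audio])
            return GaussianMixture(n_components=n_components,
                                   covariance_type="full").fit(joint)

        def convert(gmm, pma_frames, d_pma):
            """MMSE conversion: E[acoustics | PMA] under the joint model."""
            K = gmm.n_components
            mu_x, mu_y = gmm.means_[:, :d_pma], gmm.means_[:, d_pma:]
            S = gmm.covariances_
            Sxx, Sxy = S[:, :d_pma, :d_pma], S[:, :d_pma, d_pma:]
            out = []
            for x in pma_frames:
                # component responsibilities from the marginal p(x | k)
                logp = np.array([multivariate_normal.logpdf(x, mu_x[k], Sxx[k])
                                 for k in range(K)]) + np.log(gmm.weights_)
                w = np.exp(logp - logp.max())
                w /= w.sum()
                # responsibility-weighted per-component conditional means
                out.append(sum(w[k] * (mu_y[k] + Sxy[k].T
                                       @ np.linalg.solve(Sxx[k], x - mu_x[k]))
                               for k in range(K)))
            return np.array(out)

        # toy usage with random placeholder frames
        rng = np.random.default_rng(0)
        pma, audio = rng.normal(size=(500, 9)), rng.normal(size=(500, 25))
        predicted = convert(train_joint_gmm(pma, audio), pma[:10], d_pma=9)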

    Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment

    Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, making use only of the acoustic or phonological properties of the utterance. In this paper, we demonstrate that these ASR-free techniques are also able to predict intelligibility in other languages. Moreover, they prove to be complementary, resulting in even better intelligibility predictions when both methods are combined.
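    The fusion idea is simple enough to sketch: predict the perceptual score from each ASR-free feature set alone and from their concatenation, then compare correlations. Everything below (feature dimensions, the SVR regressor, the synthetic data) is an illustrative assumption, not the authors' pipeline.

        # Late-fusion sketch for ASR-free intelligibility prediction.
        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.model_selection import cross_val_predict
        from sklearn.svm import SVR

        def evaluate(features, scores, name):
            """Cross-validated prediction, reported as a Pearson correlation."""
            pred = cross_val_predict(SVR(kernel="linear"), features, scores, cv=5)
            print(f"{name}: r = {pearsonr(pred, scores)[0]:.2f}")

        # placeholder data standing in for real acoustic/phonological features
        rng = np.random.default_rng(0)
        X_acoustic = rng.normal(size=(40, 10))
        X_phonological = rng.normal(size=(40, 8))
        y = rng.normal(size=40)              # perceptual intelligibility scores

        evaluate(X_acoustic, y, "acoustic only")
        evaluate(X_phonological, y, "phonological only")
        evaluate(np.hstack([X_acoustic, X_phonological]), y, "combined")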

    Computing phonological generalization over real speech exemplars

    Though it has attracted growing attention from phonologists and phoneticians, Exemplar Theory (e.g., Bybee 2001) has hitherto lacked an explicit production model that can apply to speech signals. An adequate model must be able to generalize, but this presents the problem of how to generate an output that generalizes over a collection of unique, variable-length signals. Rather than resorting to a priori phonological units such as phones, we adopt a dynamic programming approach using an optimization criterion that is sensitive to the frequency of similar subsequences within other exemplars: the Phonological Exemplar-Based Learning System (PEBLS). We show that PEBLS displays pattern-entrenchment behaviour central to Exemplar Theory's account of phonologization.
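    The frequency-sensitive dynamic programming idea can be caricatured in a few lines. This is a loose sketch under strong assumptions, not the authors' PEBLS algorithm: exemplars are toy, already time-aligned feature tracks, and a Viterbi pass simply trades off each candidate frame's recurrence support across exemplars against the smoothness of the chosen path.

        # Frequency-sensitive exemplar generalisation (illustrative sketch only).
        import numpy as np

        def support(frame, exemplars, tol=0.5):
            """How many frames across all exemplars lie within `tol` of this one."""
            return sum(float(np.linalg.norm(f - frame) < tol)
                       for ex in exemplars for f in ex)

        def generalise(exemplars, smooth=1.0):
            """Viterbi over time-aligned exemplar frames: pick one exemplar's
            frame per step, preferring well-attested, smoothly connected material."""
            T = min(len(ex) for ex in exemplars)    # crude length normalisation
            K = len(exemplars)
            cost = np.array([[-support(exemplars[k][t], exemplars)
                              for k in range(K)] for t in range(T)])
            back = np.zeros((T, K), dtype=int)
            for t in range(1, T):
                for k in range(K):
                    trans = [cost[t - 1, j] + smooth *
                             np.linalg.norm(exemplars[k][t] - exemplars[j][t - 1])
                             for j in range(K)]
                    back[t, k] = int(np.argmin(trans))
                    cost[t, k] += min(trans)
            k = int(np.argmin(cost[-1]))            # backtrace the cheapest path
            idx = [k]
            for t in range(T - 1, 0, -1):
                k = back[t, k]
                idx.append(k)
            idx.reverse()
            return np.array([exemplars[idx[t]][t] for t in range(T)])

        # toy exemplars: noisy copies of one underlying trajectory
        rng = np.random.default_rng(1)
        base = np.cumsum(rng.normal(size=(20, 3)), axis=0)
        output = generalise([base + rng.normal(scale=0.2, size=base.shape)
                             for _ in range(5)])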

    A Tool for Differential Diagnosis of Childhood Apraxia of Speech and Dysarthria in Children: A Tutorial

    Purpose: While there has been mounting research centered on the diagnosis of childhood apraxia of speech (CAS), little has focused on differentiating CAS from pediatric dysarthria. Because CAS and dysarthria share overlapping speech symptoms and some children have both motor speech disorders, differential diagnosis can be challenging. There is a need for clinical tools that facilitate assessment of both CAS and dysarthria symptoms in children. The goals of this tutorial are to (a) determine confidence levels of clinicians in differentially diagnosing dysarthria and CAS and (b) provide a systematic procedure for differentiating CAS and pediatric dysarthria in children. Method: Evidence related to differential diagnosis of CAS and dysarthria is reviewed. Next, a web-based survey of 359 pediatric speech-language pathologists is used to determine clinical confidence levels in diagnosing CAS and dysarthria. Finally, a checklist of pediatric auditory–perceptual motor speech features is presented along with a procedure to identify CAS and dysarthria in children with suspected motor speech impairments. Case studies illustrate application of this protocol, and treatment implications for complex cases are discussed. Results: The majority (60%) of clinician respondents reported low or no confidence in diagnosing dysarthria in children, and 40% reported they tend not to make this diagnosis as a result. Going forward, clinicians can use the feature checklist and protocol in this tutorial to support the differential diagnosis of CAS and dysarthria in clinical practice. Conclusions: Incorporating this diagnostic protocol into clinical practice should help increase confidence and accuracy in diagnosing motor speech disorders in children. Future research should test the sensitivity and specificity of this protocol in a large sample of children with varying speech sound disorders. Graduate programs and continuing education courses should provide opportunities to practice rating speech features for children with dysarthria and CAS.
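    A trivial sketch of how such a feature checklist might be tallied in software follows; the feature names are invented placeholders rather than the tutorial's actual checklist items, and real differential diagnosis rests on the clinician's perceptual ratings, not a count.

        # Hypothetical checklist tally for suspected motor speech impairment.
        CAS_FEATURES = {"inconsistent errors", "lengthened transitions",
                        "prosodic errors"}
        DYSARTHRIA_FEATURES = {"consistent distortions", "hypernasality",
                               "breathy voice"}

        def tally(observed):
            """Count observed features consistent with each diagnosis."""
            return {"CAS-consistent": len(observed & CAS_FEATURES),
                    "dysarthria-consistent": len(observed & DYSARTHRIA_FEATURES)}

        print(tally({"inconsistent errors", "hypernasality"}))
        # mixed profiles like this one are why the tutorial stresses that
        # children can present with both disorders at once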

    Models of verbal working memory capacity: What does it take to make them work?

    Theories of working memory (WM) capacity limits will be more useful when we know what aspects of performance are governed by the limits and what aspects are governed by other memory mechanisms. Whereas considerable progress has been made on models of WM capacity limits for visual arrays of separate objects, less progress has been made in understanding verbal materials, especially when words are mentally combined to form multiword units or chunks. Toward a more comprehensive theory of capacity limits, we examined models of forced-choice recognition of words within printed lists, using materials designed to produce multiword chunks in memory (e.g., leather brief case). Several simple models were tested against data from a variety of list lengths and potential chunk sizes, with test conditions that only imperfectly elicited the interword associations. According to the most successful model, participants retained about 3 chunks on average in a capacity-limited region of WM, with some chunks being only subsets of the presented associative information (e.g., leather brief case retained with leather as one chunk and brief case as another). We also found it necessary to add to the model an activated long-term memory component that is unlimited in capacity. A fixed-capacity limit appears critical to account for immediate verbal recognition and other forms of WM. We advance a model-based approach that allows capacity to be assessed despite other important processing contributions. Starting with a psychological-process model of WM capacity developed to understand visual arrays, we arrive at a more unified and complete model.
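    The fixed-capacity account can be sketched as a toy model: a probed chunk sits in the capacity-limited region with probability min(k/n, 1); otherwise an activated long-term memory component answers correctly with some probability, and failing that the two-alternative choice is a coin flip. The fallback structure and all numbers below are illustrative assumptions, not the authors' fitted model.

        # Toy fixed-capacity recognition model and a least-squares fit of k.
        from scipy.optimize import minimize_scalar

        def p_correct(k, n_chunks, a_ltm=0.6):
            """P(correct) = P(in WM) + P(not in WM) * (LTM hit or lucky guess)."""
            p_wm = min(k / n_chunks, 1.0)
            return p_wm + (1 - p_wm) * (a_ltm + (1 - a_ltm) * 0.5)

        def fit_capacity(lengths, accuracies, a_ltm=0.6):
            """Estimate chunk capacity k from accuracy at several list lengths."""
            sse = lambda k: sum((p_correct(k, n, a_ltm) - a) ** 2
                                for n, a in zip(lengths, accuracies))
            return minimize_scalar(sse, bounds=(0.5, 8.0), method="bounded").x

        print(fit_capacity([4, 6, 8], [0.95, 0.90, 0.875]))  # ~3.0 (toy data)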

    Direct Speech Reconstruction From Articulatory Sensor Data by Machine Learning

    This paper describes a technique that generates speech acoustics from articulator movements. Our motivation is to help people who can no longer speak following laryngectomy, a procedure that is carried out tens of thousands of times per year in the Western world. Our method for sensing articulator movement, permanent magnetic articulography, relies on small, unobtrusive magnets attached to the lips and tongue. Changes in magnetic field caused by magnet movements are sensed and form the input to a process that is trained to estimate speech acoustics. In the experiments reported here this “Direct Synthesis” technique is developed for normal speakers, with glued-on magnets, allowing us to train with parallel sensor and acoustic data. We describe three machine learning techniques for this task, based on Gaussian mixture models, deep neural networks, and recurrent neural networks (RNNs). We evaluate our techniques with objective acoustic distortion measures and subjective listening tests over spoken sentences read from novels (the CMU Arctic corpus). Our results show that the best performing technique is a bidirectional RNN (BiRNN), which employs both past and future contexts to predict the acoustics from the sensor data. BiRNNs are not suitable for synthesis in real time but fixed-lag RNNs give similar results and, because they only look a little way into the future, overcome this problem. Listening tests show that the speech produced by this method has a natural quality that preserves the identity of the speaker. Furthermore, we obtain up to 92% intelligibility on the challenging CMU Arctic material. To our knowledge, these are the best results obtained for a silent-speech system without a restricted vocabulary and with an unobtrusive device that delivers audio in close to real time. This work promises to lead to a technology that truly will give people whose larynx has been removed their voices back.
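    For a flavour of the two recurrent variants compared, here is a hedged PyTorch sketch: a bidirectional LSTM that reads the whole utterance offline, and a unidirectional fixed-lag variant whose prediction for frame t is emitted only after seeing sensor input up to t + lag. Layer sizes, feature dimensions, and the lag value are placeholders, not the paper's configuration.

        # Sensor-to-acoustics regression: BiRNN vs fixed-lag RNN (sketch).
        import torch
        import torch.nn as nn

        class SensorToSpeech(nn.Module):
            def __init__(self, d_sensor=9, d_acoustic=25, hidden=128,
                         bidirectional=True, lag=5):
                super().__init__()
                self.lag = 0 if bidirectional else lag
                self.rnn = nn.LSTM(d_sensor, hidden, num_layers=2,
                                   batch_first=True, bidirectional=bidirectional)
                self.out = nn.Linear(hidden * (2 if bidirectional else 1),
                                     d_acoustic)

            def forward(self, x):               # x: (batch, T, d_sensor)
                h, _ = self.rnn(x)
                y = self.out(h)                 # (batch, T, d_acoustic)
                if self.lag:
                    # the output at step t predicts acoustic frame t - lag, so
                    # drop the first `lag` steps; train against targets[:, :-lag]
                    y = y[:, self.lag:, :]
                return y

        birnn = SensorToSpeech(bidirectional=True)       # offline, whole utterance
        fixed_lag = SensorToSpeech(bidirectional=False)  # near-real-time variant
        frames = torch.randn(2, 100, 9)                  # toy sensor input
        print(birnn(frames).shape, fixed_lag(frames).shape)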

    Variables contributing to listener effort in speakers with Parkinson's disease

    Reduced speech intensity, or hypophonia, is a common speech deficit observed in the hypokinetic dysarthria associated with Parkinson’s disease (PD). The introduction of background noise is a particularly relevant context to study in relation to this speech symptom. Previous research has indicated that listeners have more difficulty understanding dysarthric speech and must exert more effort when listening. However, little is known about the specific features of the speech signal that contribute to perceived listener effort in the speech of individuals with PD and hypophonia. The purpose of this study is to investigate two speech features, (1) articulatory imprecision and (2) reduced loudness, that may contribute to perceived listener effort and that are commonly impaired in individuals with PD. This study also aims to determine potential relationships among ratings of listener effort and speech intelligibility in two noise conditions (no added background noise and 65 dB multi-talker background noise). Listener participants orthographically transcribed audio recordings of each speaker with PD reading three sentences from the Sentence Intelligibility Test (SIT). Intelligibility, listener effort, articulatory imprecision, and reduced loudness of these sentences were also rated in each noise condition using visual analogue scaling (VAS). Results revealed that the noise condition had a significant impact on ratings of intelligibility, listener effort, articulatory imprecision, and reduced loudness: individuals with PD and hypophonia were rated as having less intense, less precise, and less intelligible speech in background noise, and ratings of listener effort were also significantly higher in background noise.
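    The transcription-based intelligibility measure reduces to the percentage of target words recovered in the listener's orthographic transcription; a minimal sketch follows. Real SIT scoring follows its own rules, so treat this purely as an illustration.

        # Percent-words-correct from an orthographic transcription (sketch).
        def percent_words_correct(target, transcription):
            """Each target word may be matched at most once in the transcription."""
            targets = target.lower().split()
            pool = transcription.lower().split()
            hits = 0
            for word in targets:
                if word in pool:
                    pool.remove(word)
                    hits += 1
            return 100.0 * hits / len(targets)

        print(percent_words_correct("the boy ran to the store",
                                    "the boy ran to a store"))  # 83.3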