9 research outputs found

    Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants

    Mispronunciation detection tools could increase treatment access for speech sound disorders impacting, e.g., /r/. We show that age-and-sex normalized formant estimation outperforms cepstral representation for detection of fully rhotic vs. derhotic /r/ in the PERCEPT-R Corpus. Gated recurrent neural networks trained on this feature set achieve a mean test participant-specific F1-score = .81 (σ_x = .10, med = .83, n = 48), with post hoc modeling showing no significant effect of child age or sex. Comment: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 202

    Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders

    Because lab accuracy of clinical speech technology systems may be overoptimistic, clinical validation is vital to demonstrate system reproducibility - in this case, the ability of the PERCEPT-R Classifier to predict clinician judgment of American English /r/ during ChainingAI motor-based speech sound disorder intervention. All five participants experienced statistically significant improvement in untreated words following 10 sessions of combined human-ChainingAI treatment. These gains, despite a wide range of PERCEPT-human and human-human (F1-score) agreement, raise questions about how best to measure classification performance for clinical speech that may be perceptually ambiguous. Comment: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 202

    Implementation fidelity of a computer-assisted intervention for children with speech sound disorders

    © 2017 The Speech Pathology Association of Australia Limited. Published by Informa UK Limited, trading as Taylor & Francis Group. Background: Implementation fidelity refers to the degree to which an intervention or programme adheres to its original design. This paper examines implementation fidelity in the Sound Start Study, a clustered randomised controlled trial of computer-assisted support for children with speech sound disorders (SSD). Method: Sixty-three children with SSD in 19 early childhood centres received computer-assisted support (Phoneme Factory Sound Sorter [PFSS]–Australian version). Educators facilitated the delivery of PFSS targeting phonological error patterns identified by a speech-language pathologist. Implementation data were gathered via (1) the computer software, which recorded when and how much intervention was completed over 9 weeks; (2) educators’ records of practice sessions; and (3) scoring of fidelity (intervention procedure, competence and quality of delivery) from videos of intervention sessions. Result: Less than one-third of children received the prescribed number of days of intervention, while approximately one-half participated in the prescribed number of intervention plays. Computer data differed from educators’ data for total number of days and plays in which children participated; the degree of match was lower as data became more specific. Fidelity to intervention procedures, competency and quality of delivery was high. Conclusion: Implementation fidelity may impact intervention outcomes and so needs to be measured in intervention research; however, the way in which it is measured may impact on data

    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. Applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Bio Metrics Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fused modality pairs such as hand gesture and face, or lip and hand position, are the sensory modalities mainly used in developing multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems available in the literature on hearing-impaired studies, and also discusses some studies related to hearing-impaired acoustic analysis. It is observed that very few algorithms have been developed for hearing-impaired AVSR compared with normal-hearing AVSR. Thus, the study of audio-visual speech recognition systems for the hearing impaired is in high demand for people who are trying to communicate in natively spoken languages. This paper also highlights the state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems

    /r/ from Äännekoulu? : Families' experiences of an internet site designed for improving articulation errors at home

    Objectives: About 18 percent of Finnish children have some kind of articulation problem when they start school. A shortage of speech therapists and their uneven distribution have reduced the possibility of receiving speech therapy for mild problems; resources are instead allocated to clients with severe speech and language disorders, and the mildest disorders may go entirely untreated. In some municipalities, parents and special education teachers are already responsible for the remediation of articulation problems. The aim of this study was to determine whether the Äännekoulu website (www.aannekoulu.fi) can help rehabilitate articulation problems and whether the site meets the needs of parents dealing with their children's articulation problems. With Äännekoulu, families can practise the child's missing sound at home using instructions and exercises designed by a speech therapist. Methods: The study used a qualitatively oriented mixed-methods design that also included some quantitative data. Twenty-one children aged 5 to 7 with an /r/ articulation problem participated. The children were seen three times: at an initial baseline, at a pre-test, and at a post-test. Spontaneous development of /r/ was controlled for with a one-month break after the initial baseline. After the pre-test, families received two months of access to the Äännekoulu /r/ practice materials and were instructed to use the site just as regular paying customers would. The outcome of the intervention was assessed at the post-test. Families were also telephoned one and a half months after the post-test to explore possible changes in /r/ articulation. In addition, families were asked to keep a record of their home practice and, after the practice period, to answer a survey about how smoothly practice went, what helped or hindered it, and how well Äännekoulu worked. Results and conclusions: Six families dropped out of the intervention, and five children found the correct /r/ spontaneously, without practice. Twelve families took part in the practice period, and one child found the missing /r/ during it; nine children showed no change in their articulation. Three of these nine continued practising with Äännekoulu after the intervention period proper: within one month, two of them learned /r/ completely and one learned to produce a long /r/ vibration. Families generally practised little and were dissatisfied with the amount of practice they did. According to the parents, the biggest barriers to practice were illness of the child or other family members, burdensome family situations, lack of time, the child's unwillingness to complete assignments, technical difficulties, and laziness. Families considered Äännekoulu a good way to practise, and they were particularly pleased with the enjoyable games and with being able to practise whenever and wherever, since the exercises were available online. However, families unknowingly progressed through the practice in an inappropriate way and inadvertently reinforced their children's incorrect articulation. Parents thought the visual look of Äännekoulu should be revised, and some also wanted more diverse games. The most important target for development, however, is the instructions for the assignments: although families did not find the individual task instructions unclear, the site's progression model remained unfamiliar to parents. Because only one family practised the tasks in the intended way, it is clear that the instructions were insufficient for the families' needs. On the basis of this study, a speech therapist's guidance plays a significant role in rehabilitating articulation problems. Future research should explore what kind of home instructions best help families carry out their role, what the speech therapist's role will be, especially in telepractice and other technological services, and whether Äännekoulu can fill the gap in the field of articulation remediation

    Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment

    Parents fulfill a pivotal role in early childhood development of social and communication skills. In children with autism, the development of these skills can be delayed. Applied behavioral analysis (ABA) techniques have been created to aid in skill acquisition. Among these, pivotal response treatment (PRT) has been empirically shown to foster improvements. Research into PRT implementation has also shown that parents can be trained to be effective interventionists for their children. The current difficulty in PRT training is how to disseminate training to parents who need it, and how to support and motivate practitioners after training. Evaluation of the parents’ fidelity to implementation is often undertaken using video probes that depict the dyadic interaction occurring between the parent and the child during PRT sessions. These videos are time-consuming for clinicians to process, and often result in only minimal feedback for the parents. Current trends in technology could be utilized to alleviate the manual cost of extracting data from the videos, affording greater opportunities for providing clinician-created feedback as well as automated assessments. The naturalistic context of the video probes along with the dependence on ubiquitous recording devices creates a difficult scenario for classification tasks. The domain of the PRT video probes can be expected to have high levels of both aleatory and epistemic uncertainty. Addressing these challenges requires examination of the multimodal data along with implementation and evaluation of classification algorithms. This is explored through the use of a new dataset of PRT videos. The relationship between the parent and the clinician is important. The clinician can provide support and help build self-efficacy in addition to providing knowledge and modeling of treatment procedures. 
Facilitating this relationship along with automated feedback not only provides the opportunity to present expert feedback to the parent, but also allows the clinician to aid in personalizing the classification models. By utilizing a human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the classification models by providing additional labeled samples. This will allow the system to improve classification and provide a person-centered approach to extracting multimodal data from PRT video probes. Dissertation/Thesis; Doctoral Dissertation, Computer Science 201

    Exploring the use of Technology for Assessment and Intensive Treatment of Childhood Apraxia of Speech

    Given the rapid advances in technology over the past decade, this thesis examines the potential for automatic speech recognition (ASR) technology to expedite the process of objective analysis of speech, particularly for lexical stress patterns in childhood apraxia of speech. This dissertation also investigates the potential for mobile technology to bridge the gap between current service delivery models in Australia and best practice treatment intensity for CAS. To address these two broad aims, this thesis describes three main projects. The first is a systematic literature review summarising the development, implementation and accuracy of automatic speech analysis tools when applied to evaluation and modification of children’s speech production skills. Guided by the results of the systematic review, the second project presents data on the accuracy and clinical utility of a custom-designed lexical stress classification tool, designed as part of a multi-component speech analysis system for a mobile therapy application, Tabby Talks, for use with children with CAS. The third project is a randomised control trial exploring the effect of different types of feedback on response to intervention for children with CAS. The intervention was designed to specifically explore the feasibility and effectiveness of using an app equipped with ASR technology to provide feedback on speech production accuracy during home practice sessions, simulating the common service delivery model in Australia. The thesis concludes with a discussion of future directions for technology-based speech assessment and intensive speech production practice, guidelines for future development of therapy tools that include more game-based practice activities and the contexts in which children can be transferred from predominantly clinician-delivered augmented feedback to ASR-delivered right/wrong feedback and continue to make optimal gains in acquisition and retention of speech production targets

    Diphthong Synthesis using the Three-Dimensional Dynamic Digital Waveguide Mesh

    The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore, those who require a computer to speak - due to surgery or a degenerative disease - are limited to unnatural-sounding voices that lack expressive control and may not match the user's gender, age or accent. It is evident that natural, personalised and controllable synthetic speech systems are required. A three-dimensional digital waveguide model of the vocal tract, based on magnetic resonance imaging data, is proposed here in order to address these issues. The model uses a heterogeneous digital waveguide mesh method to represent the vocal tract airway and surrounding tissues, facilitating dynamic movement and hence speech output. The accuracy of the method is validated by comparison with audio recordings of natural speech, and perceptual tests are performed which confirm that the proposed model sounds significantly more natural than simpler digital waveguide mesh vocal tract models. Control of such a model is also considered, and a proof-of-concept study is presented using a deep neural network to control the parameters of a two-dimensional vocal tract model, resulting in intelligible speech output and paving the way for extension of the control system to the proposed three-dimensional vocal tract model. Future improvements to the system are also discussed in detail. This project considers both the naturalness and control issues associated with synthetic speech and therefore represents a significant step towards improved synthetic speech for use across society

    Systematic review of virtual speech therapists for speech disorders

    In this paper, a systematic review of relevant published studies on computer-based speech therapy systems or virtual speech therapists (VSTs) for people with speech disorders is presented. We structured this work based on the PRISMA framework. The advancements in speech technology and the increased number of successful real-world projects in this area point to a thriving market for VSTs in the near future; however, there is no standard roadmap to pinpoint how these systems should be designed, implemented, customized, and evaluated with respect to the various speech disorders. The focus of this systematic review is on articulation and phonological impairments. This systematic review addresses three research questions: what types of articulation and phonological disorders do VSTs address, how effective are virtual speech therapists, and what technological elements have been utilized in VST projects. The reviewed papers were sourced from comprehensive digital libraries, and were published in English between 2004 and 2014. All the selected studies involve computer-based intervention in the form of a VST regarding articulation or phonological impairments, followed by qualitative and/or quantitative assessments. In generating this review, we encountered several challenges. Studies were heterogeneous in terms of disorders, type and frequency of therapy, sample size, level of functionality, etc. Thus, overall conclusions were difficult to draw. Commonly, publications with rigorous study designs did not describe the technical elements used in their VST, and publications that did describe technical elements had poor study designs. 
Despite this heterogeneity, the selected studies reported the effectiveness of computers as a more engaging type of intervention with more tools to enrich the intervention programs, particularly when it comes to children; however, it was emphasized that virtual therapists should not drive the intervention but must be used as a medium to deliver the intervention planned by speech-language pathologists. Based on the reviewed papers, VSTs are significantly effective in training people with a variety of speech disorders; however, it cannot be claimed that a consensus exists on the superiority of VSTs over speech-language pathologists regarding rehabilitation outcomes. Our review shows that hearing impairment was the most frequently addressed disorder in the reviewed studies. Automatic speech recognition, speech corpora, and speech synthesizers were the most popular technologies used in the VSTs