94 research outputs found

    The BioVisualSpeech corpus of words with sibilants for speech therapy games development

    Abstract: In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including correct speech productions, in order to characterize the target population, the corpora should also include samples from people with speech sound disorders. In addition, the annotation of the data should include information on the correctness of the speech productions. Following these criteria, we collected a corpus that can be used to develop computer tools for speech and language therapy of Portuguese children with sigmatism. The proposed corpus contains European Portuguese children’s word productions in which the words have sibilant consonants. The corpus has productions from 356 children from 5 to 9 years of age. Some important characteristics of this corpus, that are relevant to speech and language therapy and computer science research, are that (1) the corpus includes data from children with speech sound disorders; and (2) the productions were annotated according to the criteria of speech and language pathologists, and have information about the speech production errors. These are relevant features for the development and assessment of speech processing tools for speech therapy of Portuguese children. In addition, as an illustration of how to use the corpus, we present three speech therapy games that use a convolutional neural network sibilants classifier trained with data from this corpus and a word recognition module trained on additional children data and calibrated and evaluated with the collected corpus.

    Sibilant consonants classification with deep neural networks

    Abstract. Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children's voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice the production of these sounds more often, which may lead to faster improvements of their speech. Recently, the use of deep neural networks has given considerable improvements in classification for a variety of use cases, from image classification to speech and language processing. Here we propose to use deep convolutional neural networks to classify sibilant phonemes of European Portuguese in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features.

    The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language

    Digital and mobile devices enable easy access to applications for the learning of foreign languages. However, experimental studies on the effectiveness of these applications are scarce. Moreover, it is not understood whether the effects of speech and language training generalize to features that are not trained. To this end, we conducted a four-week intervention that focused on articulatory training and learning of English words in 6-7-year-old Finnish-speaking children who used a digital language-learning game app, Pop2talk. An essential part of the app is automatic speech recognition that enables assessing children's utterances and giving instant feedback to the players. The generalization of the effects of such training in English was explored by using discrimination tasks before and after training (or the same period of time in a control group). The stimuli of the discrimination tasks represented phonetic contrasts from two non-trained languages, including Russian sibilant consonants and Mandarin tones. We found some improvement with the Russian sibilant contrast in the gamers but it was not statistically significant. No improvement was observed for the tone contrast for the gaming group. A control group with no training showed no improvement in either contrast. The pattern of results suggests that the game may have improved the perception of non-trained speech sounds in some but not all individuals, yet the effects of motivation and attention span on their performance could not be excluded with the current methods. Children's perceptual skills were linked to their word learning in the control group but not in the gaming group where recurrent exposure enabled learning also for children with poorer perceptual skills. Together, the results demonstrate beneficial effects of learning via a digital application, yet raise a need for further research of individual differences in learning.

    BioVisualSpeech: Deployment of an Interactive Platform for Speech Therapy Sessions With Children

    Sigmatism is a speech sound disorder (SSD) that prevents people from correctly pronouncing sibilant consonant sounds ([Z], [z], [S] and [s]). If left untreated, it can negatively impact children’s ability to communicate and socialize. Parents are advised to seek speech therapy for their kids whenever they are not reaching the milestones that are expected of their age, and while the exercises employed in speech therapy sessions are vital for the treatment of these disorders, they can also become repetitive. BioVisualSpeech is a research project that explores ways to provide biofeedback in speech therapy sessions through the usage of serious games. An example of this is the BioVisualSpeech Therapy Support Platform, an interactive tool that contains many types of games in one place, and which children can play in therapy sessions and at home by using the computer’s microphone to capture their voices. However, because the platform was developed in an academic context, it was important for us to adapt this system to the context of real life in collaboration with speech-language pathologists (SLPs). To achieve this, we set the goal of deploying the platform to SLPs’ computers. For that we first reengineered the system to turn it into an in-session focused application, instead of a system where children can practice with SLPs and at home. In addition, we also integrated Windows Speech Recognition into the platform, made the system easier to install and capable of collecting data from players, such as voice productions that could be used in the future to train better classification models, and other objective parameters concerning game performance. Our deployment with SLPs was accompanied by the questionnaires, documentation and data collection protocol needed to proceed with: firstly, the further validation of the platform along with two of its games and, secondly, the design of a user study focused on gathering voice productions from children.
In the end, not only did we get promising results regarding the validation of the platform, but SLPs also got the opportunity to own a system that can continue to be used, and distributed by future researchers, even after the termination of this project.

    Rafigh: A Living Media System for Motivating Target Application Use for Children

    Digital living media systems combine living media such as plants, animals and fungi with computational components. In this dissertation, I respond to the question of how digital living media systems can better motivate children to use target applications (i.e., learning and/or therapeutic applications). To address this question, I employed a participatory design approach where I incorporated input from children, parents, speech language pathologists and teachers into the design of a new system. Rafigh is a digital embedded system that uses the growth of a living mushroom colony to provide positive reinforcements to children when they conduct target activities. The growth of the mushrooms is affected by the amount of water administered to them, which in turn corresponds to the time children spend on target applications. I used an iterative design process to develop and evaluate three Rafigh prototypes. The evaluations showed that the system must be robust, customizable, and should include compelling engagement mechanisms to keep the children interested. I evaluated Rafigh using two case studies conducted in participants' homes. In each case study, two siblings and their parent interacted with Rafigh over two weeks and the parents identified a series of target applications that Rafigh should motivate the children to use. The study showed that Rafigh motivated the children to spend significantly more time on target applications during the intervention phase and that it successfully engaged one out of two child participants in each case study who showed signs of responsibility, empathy and curiosity towards the living media. The study showed that the majority of participants described the relationship between using target applications and mushroom growth correctly. Further, Rafigh encouraged more communication and collaboration between the participants.
Rafigh's slow responsivity did not impact the engagement of one out of two child participants in each case study and might even have contributed to their investment in the project. Finally, Rafigh's presence as an ambient physical object allowed users to interact with it freely and as part of their home environment.

    The role of correct pronunciation and intonation in teaching Italian as a foreign language through blended learning: a guide to the core sounds of the Italian language for English native speakers

    Although many scholars have emphasised the value of pronunciation and intonation training as fundamental in FL (foreign language) teaching, it seems that the practise of these skills is still neglected by practitioners. Segmentals and suprasegmentals are often absent in Italian FL courses based on the claim that the phonology of Italian is rather easy, and students are expected to pick it up along the way. Proceeding from the recognition that a difference exists between the theory and the practice of integrating segmentals and suprasegmentals training in FL courses, this qualitative study investigates learners’ views about pronunciation and intonation’s role in learning foreign languages, in particular Italian FL, and the use of new specific materials and technological tools deployed for the teaching of these phonological skills online. Findings confirm the potential of materials devised specifically for pronunciation and intonation acquisition, and the benefits of using specific online voice recording tools to promote the development of phonological skills and boost students’ phonological and cultural awareness; however, they reveal that such potential often remains unrealised in the FL classroom. The role of teachers in terms of beliefs about, knowledge of and approach to teaching pronunciation and intonation in FL courses emerges as crucial. Findings also highlight the need for a deeper understanding of how pronunciation and intonation training can positively affect the students’ learning outcomes and how these skills should be systematically and appropriately addressed in the FL class.

    Paralinguistic vocal control of interactive media: how untapped elements of voice might enhance the role of non-speech voice input in the user's experience of multimedia.

    Much interactive media development, especially commercial development, implies the dominance of the visual modality, with sound as a limited supporting channel. The development of multimedia technologies such as augmented reality and virtual reality has further revealed a distinct partiality to visual media. Sound, however, and particularly voice, have many aspects which have yet to be adequately investigated. Exploration of these aspects may show that sound can, in some respects, be superior to graphics in creating immersive and expressive interactive experiences. With this in mind, this thesis investigates the use of non-speech voice characteristics as a complementary input mechanism in controlling multimedia applications. It presents a number of projects that employ the paralinguistic elements of voice as input to interactive media including both screen-based and physical systems. These projects are used as a means of exploring the factors that seem likely to affect users’ preferences and interaction patterns during non-speech voice control. This exploration forms the basis for an examination of potential roles for paralinguistic voice input. The research includes the conceptual and practical development of the projects and a set of evaluative studies. The work submitted for Ph.D. comprises practical projects (50 percent) and a written dissertation (50 percent). The thesis aims to advance understanding of how voice can be used both on its own and in combination with other input mechanisms in controlling multimedia applications. It offers a step forward in the attempts to integrate the paralinguistic components of voice as a complementary input mode to speech input applications in order to create a synergistic combination that might let the strengths of each mode overcome the weaknesses of the other

    Home-Based Activities for Children with Speech Sound Disorders: Requirements for a Tangible User Interface for Internet of Things Artefacts

    This paper presents the state of the art regarding the use of tangible user interfaces for internet of artefacts (IoA) targeting health applications, with a focus on speech and language therapy and related areas, targeting home-based interventions, including data security and privacy issues. Results from a systematic literature review, focus group, and a nationwide questionnaire have been used to determine the system requirements for an artefact prototype to be developed. The aim of this study was to understand what is the usual practice of clinicians and to contribute to a better intervention or post-intervention approach for children with Speech Sound Disorders (SSD). The literature review revealed that some studies proposed technological solutions while others used a social approach and/or gamified activities. We could conclude that more research is needed and that a unified method or framework to address SSD intervention or post-intervention tools is lacking. Clinicians need more and better tools to be able to quantify and qualitatively assess the activities developed at home

    A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn

    The contributions in this Festschrift were written by Ocke’s current and former PhD-students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry to the highest levels of description - all of which have formed a part of Ocke’s career, in connection with his teaching and/or his academic productions: “Segments”, “Perception of Accent”, “Between Sounds and Graphemes”, “Prosody”, “Morphology and Syntax” and “Second Language Acquisition”. Each one of these illustrates a sound approach to language matters

    The Role of Phonics in Teaching English Pronunciation English as a Foreign Language Students

    This thesis proposes the use of a modified phonics program to teach students basic rules that will help them to translate graphemes to phonemes in both words they are familiar with and words they are not. It is a common misconception that English has a highly irregular or irrational orthography. Quite to the contrary, English, as a morphophonemic language, has a highly regular orthography governed by systematic rules and spelling patterns that correspond to phonemes in speech. We argue that a knowledge of these rules gives students the necessary tools to move from grapheme to phoneme. This also increases their confidence, develops their metacognitive awareness and produces autonomous learners whose pronunciation and communication will improve because of knowledge of how English works and relates writing to speaking.