108 research outputs found

    Playing with NeMo for building an automatic speech recogniser for Italian

    This paper presents work in progress on the creation of a Large Vocabulary Automatic Speech Recogniser for Italian using NVIDIA NeMo. Thanks to this package, we were able to build a reliable recogniser for adults' speech by fine-tuning the English model provided by NVIDIA and rescoring it with powerful neural language models, achieving very good performance. The lack of a standard, reliable and publicly available baseline for Italian motivated this work.
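    The rescoring step described above can be sketched in a few lines: an n-best list from the acoustic model is re-ranked by interpolating the acoustic score with a neural LM score, plus a length bonus to offset the LM's bias toward short outputs. The weights and the toy n-best list below are illustrative assumptions, not values from the paper.

    ```python
    def rescore(hypotheses, alpha=0.5, beta=0.2):
        """Pick the best ASR hypothesis by combining acoustic and LM scores.

        Each hypothesis is (text, acoustic_logprob, lm_logprob).
        alpha weights the language model; beta rewards word count to
        counter the LM's preference for shorter transcripts.
        """
        def combined(h):
            text, am, lm = h
            return am + alpha * lm + beta * len(text.split())
        return max(hypotheses, key=combined)

    # Toy n-best list: the LM strongly prefers the grammatical transcript.
    nbest = [
        ("il gatto dorme sul divano", -12.0, -8.0),
        ("il gato dorme sul di vano", -11.5, -15.0),
    ]
    best = rescore(nbest)
    print(best[0])  # → "il gatto dorme sul divano"
    ```

    In practice the log-probabilities would come from the NeMo acoustic model and the neural LM; the interpolation weights are typically tuned on a held-out set.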

    Automatic recognition of children’s read speech for stuttering application

    Stuttering is a common speech disfluency that may persist into adulthood if not treated in its early stages. Techniques from spoken language understanding may be applied to provide automated diagnoses of stuttering from voice recordings; however, there are several difficulties, including the lack of training data involving young children and the high dimensionality of these data. This study investigates how automatic speech recognition (ASR) could help clinicians by providing a tool that automatically recognises stuttering events and provides a useful written transcription of what was said. In addition, to enhance the performance of ASR and to alleviate the lack of stuttering data, this study examines the effect of augmenting the language model with artificially generated data. The performance of the ASR tool with and without language model augmentation is compared. Following language model augmentation, the ASR tool's performance improved recall from 38% to 62.2% and precision from 56.58% to 71%. When mis-recognised events are more coarsely classified as stuttering/non-stuttering events, the performance improves up to 73% in recall and 84% in precision. Although the obtained results are not perfect, they map to fairly robust stutter/non-stutter decision boundaries.
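    The precision and recall figures quoted above follow directly from event counts. As a minimal sketch, the counts below are hypothetical (chosen only to land near the reported post-augmentation numbers), not data from the paper:

    ```python
    def precision_recall(tp, fp, fn):
        """Precision and recall from raw detection counts.

        Precision: fraction of predicted stuttering events that were real.
        Recall: fraction of real stuttering events that were detected.
        """
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return precision, recall

    # Hypothetical counts for illustration: 62 true detections,
    # 25 false alarms, 38 missed stuttering events.
    p, r = precision_recall(tp=62, fp=25, fn=38)
    print(f"precision={p:.1%} recall={r:.1%}")  # → precision=71.3% recall=62.0%
    ```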

    A proposed framework of an interactive semi-virtual environment for enhanced education of children with autism spectrum disorders

    Education of people with special needs has recently been considered a key element in the field of medical education. Recent developments in the area of information and communication technologies may enable collaborative interactive environments which facilitate early-stage education and provide specialists with robust tools indicating the person's autism spectrum disorder level. Towards the goal of establishing an enhanced learning environment for children with autism, this paper attempts to provide a framework for a semi-controlled real-world environment used for the daily education of an autistic person according to scenarios selected by specialists. The proposed framework employs both real-world objects and virtual environments equipped with humanoids able to provide emotional feedback and to demonstrate empathy. Potential examples and usage scenarios for such environments are also described.

    Using affective avatars and rich multimedia content for education of children with autism

    Autism is a communication disorder that mandates early and continuous educational interventions on various levels, such as everyday social, communication and reasoning skills. Computer-aided education has recently been considered a promising intervention method for such cases, and different systems have therefore been proposed and developed worldwide. In more recent years, affective computing applications for the aforementioned interventions have also been proposed to shed light on this problem. In this paper, we examine the technological and educational needs of affective interventions for autistic persons. Enabling affective technologies are reviewed and a number of possible exploitation scenarios are illustrated. Emphasis is placed on covering the continuous and long-term needs of autistic persons through unobtrusive and ubiquitous technologies with the engagement of an affective speaking avatar. A personalised prototype system facilitating these scenarios is described. In addition, feedback on the system from educators of autistic persons, collected by means of an anonymous questionnaire, is provided in terms of its usefulness, efficiency and the envisaged reaction of the autistic persons. Results illustrate the clear potential of this effort in facilitating a very promising autism intervention.

    Improvement of automatic speech recognition skills of linguistics students through using Ukrainian-English and Ukrainian-German subtitles in publicistic movies

    The world's increased attention to foreign language studies facilitates the development and improvement of the systems for studying them in higher education institutions. Such a system takes into account and promptly responds to the demands of today's multicultural society. This should start with the transformation and modernization of the higher education system, including the introduction of innovative technologies in the study of English and German focused on the modern demands of the global labor market. All this determines the relevance of the research. This article aims to establish ways for students to gain automatic speech recognition skills through subtitling Ukrainian-English and Ukrainian-German publicistic movies and series. The first assessment of a new language audio and video corpus was carried out at the Admiral Makarov National University of Shipbuilding, using an automatic subtitling mechanism to improve linguistics students' recognition and understanding of oral speech. The skills and abilities that improved during the work with the educational movie corpus have been identified.

    Automatic Speech Recognition for Speech Assessment of Persian Preschool Children

    Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an off-the-shelf Automatic Speech Recognition (ASR) system is of little use, since such systems are pre-trained on voices that differ from children's voices in terms of frequency and amplitude. We constructed an ASR for our cognitive test system to solve this issue using the Wav2Vec 2.0 model with a new pre-training objective called Random Frequency Pitch (RFP). In addition, we used our new dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Our new approach reaches a Word Error Rate (WER) of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.
    Comment: 8 pages, 5 figures, 4 tables, 1 algorithm
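    The WER metric reported above is the word-level edit distance between reference and hypothesis, normalised by reference length. A minimal self-contained sketch (the example sentences are illustrative, not from the paper's Persian data):

    ```python
    def wer(reference, hypothesis):
        """Word Error Rate: word-level edit distance / reference word count."""
        ref, hyp = reference.split(), hypothesis.split()
        # Dynamic-programming edit distance over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution/match
        return d[len(ref)][len(hyp)] / len(ref)

    # One deleted word out of six reference words → WER = 1/6 ≈ 0.167.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    ```

    A WER of 6.45 in the abstract corresponds to 6.45% of reference words needing an edit.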

    A Speaker Diarization System for Studying Peer-Led Team Learning Groups

    Peer-led team learning (PLTL) is a model for teaching STEM courses where small student groups meet periodically to collaboratively discuss coursework. Automatic analysis of PLTL sessions would help education researchers to gain insight into how learning outcomes are impacted by individual participation, group behavior, team dynamics, etc. Towards this, speech and language technology can help, and speaker diarization technology will lay the foundation for analysis. In this study, a new corpus called CRSS-PLTL is established, containing speech data from 5 PLTL teams over a semester (10 sessions per team, with 5 to 8 participants in each team). In CRSS-PLTL, every participant wears a LENA device (portable audio recorder) that provides multiple audio recordings of the event. Our proposed solution is unsupervised and contains a new online speaker change detection algorithm, termed the G3 algorithm, in conjunction with Hausdorff-distance based clustering to provide improved detection accuracy. Additionally, we also exploit cross-channel information to refine our diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher-level analysis such as the number of conversational turns taken in a session, and speaking-time duration (participation) for each speaker.
    Comment: 5 pages, 2 figures, 2 tables, Proceedings of INTERSPEECH 2016, San Francisco, USA
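    The Hausdorff-distance based clustering mentioned above rests on comparing sets of per-segment features: the Hausdorff distance between two sets is the largest distance from any point in one set to its nearest neighbour in the other, so segments from the same speaker yield a small value. A minimal 1-D sketch with made-up "embeddings" (the real system operates on acoustic feature vectors):

    ```python
    def hausdorff(a, b):
        """Symmetric Hausdorff distance between two sets of 1-D feature values."""
        def directed(x, y):
            # Largest nearest-neighbour distance from set x into set y.
            return max(min(abs(p - q) for q in y) for p in x)
        return max(directed(a, b), directed(b, a))

    # Toy features for three speech segments: seg1 and seg2 come from
    # the same (hypothetical) speaker, seg3 from a different one.
    seg1, seg2, seg3 = [1.0, 1.1, 0.9], [1.05, 0.95], [4.0, 4.2]
    same = hausdorff(seg1, seg2)
    diff = hausdorff(seg1, seg3)
    print(same < diff)  # same-speaker segments are closer → True
    ```

    Clustering then merges segments whose pairwise Hausdorff distance falls below a threshold, assigning them to a single speaker.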