108 research outputs found
Playing with NeMo for building an automatic speech recogniser for Italian
This paper presents work in progress on the creation of a Large Vocabulary Automatic Speech Recogniser for Italian using NVIDIA NeMo. Thanks to this package, we were able to build a reliable recogniser for adults' speech by fine-tuning the English model provided by NVIDIA and rescoring it with powerful neural language models, obtaining very good performance. The lack of a standard, reliable and publicly available baseline for Italian motivated this work.
Automatic recognition of children’s read speech for stuttering application
Stuttering is a common speech disfluency that may persist into adulthood if not treated in its early stages. Techniques from spoken language understanding may be applied to provide automated diagnoses of stuttering from voice recordings; however, there are several difficulties, including the lack of training data involving young children and the high dimensionality of these data. This study investigates how automatic speech recognition (ASR) could help clinicians by providing a tool that automatically recognises stuttering events and provides a useful written transcription of what was said. In addition, to enhance the performance of the ASR and to alleviate the lack of stuttering data, this study examines the effect of augmenting the language model with artificially generated data. The performance of the ASR tool with and without language model augmentation is compared. Following language model augmentation, the ASR tool's recall improved from 38% to 62.2% and its precision from 56.58% to 71%. When mis-recognised events are more coarsely classified as stuttering/non-stuttering events, performance improves to 73% recall and 84% precision. Although the obtained results are not perfect, they map to fairly robust stutter/non-stutter decision boundaries.
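For readers unfamiliar with the recall and precision figures quoted above, they are computed from raw detection counts. A minimal sketch (the counts below are hypothetical, not taken from the paper):

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw detection counts.

    tp: events correctly detected, fp: false alarms, fn: missed events.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts only: 62 stuttering events correctly detected,
# 25 false alarms, 38 events missed.
p, r = precision_recall(tp=62, fp=25, fn=38)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.71 recall=0.62
```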
A proposed framework of an interactive semi-virtual environment for enhanced education of children with autism spectrum disorders
Education of people with special needs has recently been considered a key element in the field of medical education. Recent developments in the area of information and communication technologies may enable the development of collaborative interactive environments which facilitate early-stage education and provide specialists with robust tools indicating the person's autism spectrum disorder level. Towards the goal of establishing an enhanced learning environment for children with autism, this paper attempts to provide a framework for a semi-controlled real-world environment used for the daily education of an autistic person according to scenarios selected by specialists. The proposed framework employs both real-world objects and virtual environments equipped with humanoids able to provide emotional feedback and to demonstrate empathy. Potential examples and usage scenarios for such environments are also described.
Using affective avatars and rich multimedia content for education of children with autism
Autism is a communication disorder that mandates early and continuous educational interventions on various levels, such as everyday social, communication and reasoning skills. Computer-aided education has recently been considered a likely intervention method for such cases, and therefore different systems have been proposed and developed worldwide. In more recent years, affective computing applications for the aforementioned interventions have also been proposed to shed light on this problem.
In this paper, we examine the technological and educational needs of affective interventions for autistic persons. Enabling affective technologies are visited and a number of possible exploitation scenarios are illustrated. Emphasis is placed on covering the continuous and long-term needs of autistic persons through unobtrusive and ubiquitous technologies with the engagement of an affective speaking avatar. A personalised prototype system facilitating these scenarios is described. In addition, feedback from educators of autistic persons on the system's usefulness, efficiency and the envisaged reaction of the autistic persons, collected by means of an anonymous questionnaire, is provided. Results illustrate the clear potential of this effort in facilitating a very promising autism intervention.
Improvement of automatic speech recognition skills of linguistics students through using Ukrainian-English and Ukrainian-German subtitles in publicistic movies
The world's increased attention to foreign language studies facilitates the development and improvement of language study systems in higher education institutions. Such a system takes into account and promptly responds to the demands of today's multicultural society. All of this should start with the transformation and modernization of the higher education system, including the introduction of innovative technologies in the study of English and German, which should be focused on the modern demands of the world labor market. All this has determined the relevance of the research. This article aims to establish ways for students to gain automatic speech recognition skills through subtitling Ukrainian-English and Ukrainian-German publicistic movies and series. The first assessment of a new language audio and video corpus was developed at the Admiral Makarov National University of Shipbuilding, using an automatic subtitling mechanism to improve linguistics students' recognition and understanding of oral speech. The skills and abilities that improved during the work with the educational movie corpus have been identified.
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an off-the-shelf Automatic Speech Recognition (ASR) system is ineffective, since such systems are pre-trained on voices that differ from children's voices in terms of frequency and amplitude. We constructed an ASR for our cognitive test system to solve this issue using the Wav2Vec 2.0 model with a new pre-training objective called Random Frequency Pitch (RFP). In addition, we used our new dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Our new approach reaches a Word Error Rate (WER) of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.
Comment: 8 pages, 5 figures, 4 tables, 1 algorithm
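As background to the WER figure quoted in this abstract, word error rate is the word-level edit distance (substitutions, insertions, deletions) normalised by the reference length. A minimal sketch, not the authors' implementation:

```python
def wer(reference, hypothesis):
    """Word Error Rate: minimum word-level edit distance divided by
    the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

# Two substitutions over six reference words -> WER of 1/3.
print(wer("the cat sat on the mat", "the cat sat in the hat"))
```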
A Speaker Diarization System for Studying Peer-Led Team Learning Groups
Peer-led team learning (PLTL) is a model for teaching STEM courses in which small student groups meet periodically to collaboratively discuss coursework. Automatic analysis of PLTL sessions would help education researchers gain insight into how learning outcomes are impacted by individual participation, group behavior, team dynamics, etc. Towards this, speech and language technology can help, and speaker diarization technology lays the foundation for analysis. In this study, a new corpus called CRSS-PLTL is established, containing speech data from 5 PLTL teams over a semester (10 sessions per team, with 5 to 8 participants in each team). In CRSS-PLTL, every participant wears a LENA device (a portable audio recorder) that provides multiple audio recordings of the event. Our proposed solution is unsupervised and contains a new online speaker change detection algorithm, termed the G3 algorithm, in conjunction with Hausdorff-distance-based clustering to provide improved detection accuracy. Additionally, we exploit cross-channel information to refine our diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher-level analysis such as the number of conversational turns taken in a session and the speaking-time duration (participation) of each speaker.
Comment: 5 pages, 2 figures, 2 tables, Proceedings of INTERSPEECH 2016, San Francisco, US
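The DER metric mentioned in this abstract is conventionally defined as the sum of missed speech, false alarm, and speaker-confusion durations over the total reference speech time. A minimal sketch with hypothetical durations, unrelated to the paper's actual results:

```python
def der(missed, false_alarm, confusion, total_speech):
    """Diarization Error Rate: total duration of missed speech, false
    alarms and speaker-confusion errors, divided by the total reference
    speech time (all values in seconds)."""
    return (missed + false_alarm + confusion) / total_speech

# Hypothetical session: 600 s of speech, of which 12 s are missed,
# 18 s are false alarms, and 30 s are attributed to the wrong speaker.
print(f"DER = {der(12, 18, 30, 600):.1%}")  # DER = 10.0%
```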