108 research outputs found
Playing with NeMo for building an automatic speech recogniser for Italian
This paper presents work in progress on the creation of a Large Vocabulary Automatic Speech Recogniser for Italian using NVIDIA NeMo. Thanks to this package, we were able to build a reliable recogniser for adults' speech by fine-tuning the English model provided by NVIDIA and rescoring it with powerful neural language models, obtaining very good performance. The lack of a standard, reliable and publicly available baseline for Italian motivated this work.
Automatic recognition of children’s read speech for stuttering application
Stuttering is a common speech disfluency that may persist into adulthood if not treated in its early stages. Techniques from spoken language understanding may be applied to provide automated diagnoses of stuttering from voice recordings; however, there are several difficulties, including the lack of training data involving young children and the high dimensionality of these data. This study investigates how automatic speech recognition (ASR) could help clinicians by providing a tool that automatically recognises stuttering events and provides a useful written transcription of what was said. In addition, to enhance the performance of the ASR and to alleviate the lack of stuttering data, this study examines the effect of augmenting the language model with artificially generated data. The performance of the ASR tool with and without language model augmentation is compared. Following language model augmentation, the ASR tool's recall improved from 38% to 62.2% and its precision from 56.58% to 71%. When mis-recognised events are more coarsely classified as stuttering/non-stuttering events, performance improves to 73% recall and 84% precision. Although the obtained results are not perfect, they map to fairly robust stutter/non-stutter decision boundaries.
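For readers unfamiliar with the recall and precision figures quoted above, they are computed from raw detection counts. A minimal sketch (the counts below are hypothetical, not taken from the paper):

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw detection counts.

    tp: events correctly detected, fp: false alarms, fn: missed events.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts only: 62 stuttering events correctly detected,
# 25 false alarms, 38 events missed.
p, r = precision_recall(tp=62, fp=25, fn=38)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.71 recall=0.62
```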
A proposed framework of an interactive semi-virtual environment for enhanced education of children with autism spectrum disorders
Education of people with special needs has recently been considered a key element in the field of medical education. Recent developments in the area of information and communication technologies may enable the development of collaborative interactive environments which facilitate early-stage education and provide specialists with robust tools indicating the person's autism spectrum disorder level. Towards the goal of establishing an enhanced learning environment for children with autism, this paper attempts to provide a framework for a semi-controlled real-world environment used for the daily education of an autistic person according to scenarios selected by specialists. The proposed framework employs both real-world objects and virtual environments equipped with humanoids able to provide emotional feedback and to demonstrate empathy. Potential examples and usage scenarios for such environments are also described.
Using affective avatars and rich multimedia content for education of children with autism
Autism is a communication disorder that mandates early and continuous educational interventions on various levels, such as everyday social, communication and reasoning skills. Computer-aided education has recently been considered a likely intervention method for such cases, and therefore different systems have been proposed and developed worldwide. In more recent years, affective computing applications for the aforementioned interventions have also been proposed to shed light on this problem.
In this paper, we examine the technological and educational needs of affective interventions for autistic persons. Enabling affective technologies are visited and a number of possible exploitation scenarios are illustrated. Emphasis is placed on covering the continuous and long-term needs of autistic persons through unobtrusive and ubiquitous technologies with the engagement of an affective speaking avatar. A personalised prototype system facilitating these scenarios is described. In addition, feedback from educators of autistic persons on the system's usefulness, efficiency and the envisaged reaction of the autistic persons, collected by means of an anonymous questionnaire, is provided. Results illustrate the clear potential of this effort in facilitating a very promising autism intervention.
Improvement of automatic speech recognition skills of linguistics students through using Ukrainian-English and Ukrainian-German subtitles in publicistic movies
The world's increased attention to foreign language studies facilitates the development and improvement of language study systems in higher education institutions. Such a system takes into account and promptly responds to the demands of today's multicultural society. All of this should start with the transformation and modernization of the higher education system, including the introduction of innovative technologies in the study of English and German, which should be focused on the modern demands of the world labor market. All this has determined the relevance of the research. This article aims to establish ways for students to gain automatic speech recognition skills through subtitling Ukrainian-English and Ukrainian-German publicistic movies and series. The first assessment of a new language audio and video corpus was developed at the Admiral Makarov National University of Shipbuilding, using an automatic subtitling mechanism to improve linguistics students' recognition and understanding of oral speech. The skills and abilities that improved during the work with the educational movie corpus have been identified.
Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an off-the-shelf Automatic Speech Recognition (ASR) system is ineffective, since such systems are pre-trained on voices that differ from children's voices in terms of frequency and amplitude. We constructed an ASR for our cognitive test system to solve this issue using the Wav2Vec 2.0 model with a new pre-training objective called Random Frequency Pitch (RFP). In addition, we used our new dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Our new approach reaches a Word Error Rate (WER) of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.
Comment: 8 pages, 5 figures, 4 tables, 1 algorithm
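As background to the WER figure quoted in this abstract, word error rate is the word-level edit distance (substitutions, insertions, deletions) normalised by the reference length. A minimal sketch, not the authors' implementation:

```python
def wer(reference, hypothesis):
    """Word Error Rate: minimum word-level edit distance divided by
    the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

# Two substitutions over six reference words -> WER of 1/3.
print(wer("the cat sat on the mat", "the cat sat in the hat"))
```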
A Speaker Diarization System for Studying Peer-Led Team Learning Groups
Peer-led team learning (PLTL) is a model for teaching STEM courses in which small student groups meet periodically to collaboratively discuss coursework. Automatic analysis of PLTL sessions would help education researchers gain insight into how learning outcomes are impacted by individual participation, group behavior, team dynamics, etc. Towards this, speech and language technology can help, and speaker diarization technology lays the foundation for analysis. In this study, a new corpus called CRSS-PLTL is established, containing speech data from 5 PLTL teams over a semester (10 sessions per team, with 5 to 8 participants in each team). In CRSS-PLTL, every participant wears a LENA device (a portable audio recorder) that provides multiple audio recordings of the event. Our proposed solution is unsupervised and contains a new online speaker change detection algorithm, termed the G3 algorithm, in conjunction with Hausdorff-distance-based clustering to provide improved detection accuracy. Additionally, we exploit cross-channel information to refine our diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher-level analysis such as the number of conversational turns taken in a session and the speaking-time duration (participation) of each speaker.
Comment: 5 pages, 2 figures, 2 tables, Proceedings of INTERSPEECH 2016, San Francisco, US
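The DER metric mentioned in this abstract is conventionally defined as the sum of missed speech, false alarm, and speaker-confusion durations over the total reference speech time. A minimal sketch with hypothetical durations, unrelated to the paper's actual results:

```python
def der(missed, false_alarm, confusion, total_speech):
    """Diarization Error Rate: total duration of missed speech, false
    alarms and speaker-confusion errors, divided by the total reference
    speech time (all values in seconds)."""
    return (missed + false_alarm + confusion) / total_speech

# Hypothetical session: 600 s of speech, of which 12 s are missed,
# 18 s are false alarms, and 30 s are attributed to the wrong speaker.
print(f"DER = {der(12, 18, 30, 600):.1%}")  # DER = 10.0%
```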