Search CORE

4 research outputs found

Applying Voice Conversion to Improve the Speech Quality of Patients with Dysarthria (Hangkonverzió alkalmazása dysarthriás betegek beszédminőségének javítására)

Authors: Ivaskó Lívia, Terbe Dániel, Tóth László
Publication date: 01/01/2022

Dysarthria is an umbrella term for speech disorders arising from impaired articulation, which can be caused by a number of underlying diseases. The quality and intelligibility of dysarthric speakers' speech deteriorate, which can adversely affect their social relationships and thus their quality of life. With advances in voice conversion technology, the question arises whether the quality and intelligibility of these patients' recordings can be improved by automated means, and whether their spoken communication can be supported by a device operating on this principle. In this paper we review the main variants of (neural network-based) voice conversion algorithms, then present our experience from experiments on recordings of dysarthric patients, and on that basis discuss the main advantages and drawbacks of each method.

University of Szeged

Improving Dysarthric Speech Recognition by Enriching Training Datasets

Author: Cullen Sophie
Publication venue: Technological University Dublin
Publication date: 01/01/2022

Dysarthria is a motor speech disorder that results from disruptions in the neuro-motor interface. It is characterised by poor articulation of phonemes and hypernasality, and differs markedly from normal speech. Many modern automatic speech recognition (ASR) systems are trained on a narrow range of speech diversity and, as a consequence, exclude groups of speakers who deviate in gender, race, age or speech impairment. This study attempts to develop an ASR system that handles dysarthric speech when only limited dysarthric speech data is available. Speech utterances from the TORGO database are used to run experiments on a wav2vec2.0 model trained only on the Librispeech 960h dataset, giving a baseline word error rate (WER) for recognising dysarthric speech. A version of the Librispeech model fine-tuned on multi-language datasets was tested to see whether it would improve accuracy; it achieved a top reduction of 24.15% in the WER for one of the male dysarthric speakers in the dataset. Transfer learning with speech recognition models, and preprocessing dysarthric speech with generative adversarial networks to improve its intelligibility, were both limited in their potential by the lack of a dysarthric speech dataset large enough to support these techniques. The main conclusion drawn from this study is that improving performance requires a large, diverse dysarthric speech dataset, comparable in size to datasets such as Librispeech that are used to train machine learning ASR systems, and covering different types of speech, both scripted and unscripted.
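
For readers unfamiliar with the metric named in this abstract, the word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recogniser's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the study's evaluation code; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenises on whitespace and applies no text
    normalisation (casing, punctuation), which a real evaluation would need.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.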

Arrow@TUDublin

XVIII. Magyar Számítógépes Nyelvészeti Konferencia (18th Hungarian Conference on Computational Linguistics)

Publication venue: Szegedi Tudományegyetem TTIK Informatikai Intézet
Publication date: 01/01/2022

University of Szeged