
Simulating dysarthric speech for training data augmentation in clinical speech applications

Author: Berisha Visar
Jiao Yishan
Liss Julie
Tu Ming
Publication date: 26/04/2018

Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications, where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that using the simulated speech samples to balance an existing data set improves classification accuracy by about 10%.

Comment: Will appear in Proc. of ICASSP 201
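The balancing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function `balance_with_simulated`, the stand-in `placeholder_simulate`, and the toy records are all hypothetical, and the sketch assumes that the dysarthric class is the smaller one. In the paper the transform is an adversarially trained model; here it is only a stub that relabels a healthy sample.

```python
import random


def balance_with_simulated(healthy, dysarthric, simulate, seed=0):
    """Pad the (assumed smaller) dysarthric class with simulated samples
    until both classes have the same number of items."""
    rng = random.Random(seed)
    augmented = list(dysarthric)
    while len(augmented) < len(healthy):
        source = rng.choice(healthy)        # pick a healthy sample to transform
        augmented.append(simulate(source))  # add its simulated counterpart
    return list(healthy), augmented


def placeholder_simulate(sample):
    # Hypothetical stand-in for a healthy-to-dysarthric speech transform.
    return {"id": sample["id"] + "_sim", "label": "dysarthric", "simulated": True}


# Toy, made-up data set: 10 healthy speakers, 3 dysarthric speakers.
healthy = [{"id": f"h{i}", "label": "healthy"} for i in range(10)]
dysarthric = [{"id": f"d{i}", "label": "dysarthric"} for i in range(3)]

healthy_out, dysarthric_out = balance_with_simulated(
    healthy, dysarthric, placeholder_simulate
)
print(len(healthy_out), len(dysarthric_out))
```

With the toy data above, both classes end up with 10 items, 7 of them simulated in the dysarthric class.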

arXiv.org e-Print Archive

Crossref

A critical analysis of self-supervision, or what we can learn from a single image

Author: Asano Yuki M.
Rupprecht Christian
Vedaldi Andrea
Publication date: 01/01/2020

We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset.

Comment: Accepted paper at the International Conference on Learning Representations (ICLR) 202

arXiv.org e-Print Archive

Oxford University Research Archive

FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

Author: Andreasson Kajsa
Balakrishnan Rajarathnam
Bjurek Kalle
Davidsson Ebba
Eriksson Colin
Hagman Victor
Li Ying
Muppirisetty L. Srikar
Nunez Carlos
Perez Maria Jesus
Piccoli Francesco
Raj Ria Dass
Roychowdhury Sohini
Sachdeo Moraldeepsingh
Sjoberg Jonas
Tang Matthew
Publication date: 01/01/2020

Pedestrian intention recognition is very important for developing robust and safe autonomous driving (AD) and advanced driver assistance systems (ADAS) functionalities for urban driving. In this work, we develop an end-to-end pedestrian intention framework that performs well in day- and night-time scenarios. Our framework relies on object detection bounding boxes combined with skeletal features of human pose. We study early, late, and combined (early and late) fusion mechanisms to exploit the skeletal features and reduce false positives, as well as to improve the intention prediction performance. The early fusion mechanism results in an AP of 0.89 and precision/recall of 0.79/0.89 for pedestrian intention classification. Furthermore, we propose three new metrics to properly evaluate pedestrian intention systems. Under these new evaluation metrics for intention prediction, the proposed end-to-end network offers accurate pedestrian intention predictions up to half a second ahead of the actual risky maneuver.

Comment: 5 pages, 6 figures, 5 tables, IEEE Asilomar SS
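For readers unfamiliar with the precision/recall figures quoted in the abstract, the following is a minimal sketch of how the two quantities are computed for a binary classifier. The function name `precision_recall` and the label lists are illustrative assumptions; they are made-up toy values and do not reproduce the paper's data or results.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical): 1 = pedestrian intends to cross, 0 = does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of predicted positives that are correct, while recall is the share of actual positives that are found, so a system with higher recall than precision, as reported in the abstract, misses fewer crossing pedestrians at the cost of more false alarms.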

arXiv.org e-Print Archive

Chalmers Research