Audio feedback design: principles and emerging practice
This paper considers the design of audio feedback as experienced in several faculties of a UK university and as identified in the literature. Several adaptable models are presented, including: 'personal tutor monologue' recorded at the PC by the tutor as part of the marking process; 'personal feedback conversations', recorded by the tutor or student(s) in the lab or studio to capture project discussions or studio 'crits'; 'broadcast feedback' targeted at large groups; 'peer audio feedback', in which students learn as they assess each other's work; 'tutor conversations', a 'common room conversation' approach designed to model critical thinking; and 'personal audio interventions', targeted at individuals to address emerging issues. The methods are introduced and evaluated according to their potential to formatively affect learning. Audio feedback design factors are outlined and practical recommendations are offered. The paper concludes that the use of audio feedback can promote a culture of dialogic engagement
AUTOMATIC DUBBING OF VIDEOS WITH MULTIPLE SPEAKERS
A machine-learning model that automatically converts the audio streams of audio-visual content from a source language to a destination language is described. In response to determining that an audio stream should be translated, a machine-learning-based dubbing model is invoked for a specific destination language. In the case of multiple speakers, voice embedding techniques are used to match dubbed audio streams to the corresponding speakers. The sentiment in the original speaker’s voice is preserved by training the model with a targeted data set in the destination language
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
We construct targeted audio adversarial examples on automatic speech
recognition. Given any audio waveform, we can produce another that is over
99.9% similar, but transcribes as any phrase we choose (recognizing up to 50
characters per second of audio). We apply our white-box iterative
optimization-based attack to Mozilla's DeepSpeech implementation end-to-end,
and show it has a 100% success rate. The feasibility of this attack introduces a
new domain to study adversarial examples
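The idea of a white-box iterative attack under a perturbation bound can be illustrated on a toy model. The sketch below is not the paper's method, which targets a full speech-to-text network: it assumes a hypothetical two-class linear scorer (weights `w`, bias `b`) and repeatedly steps the input along the sign of the gradient until the score for the target class becomes positive, clipping each coordinate of the perturbation to at most `eps`. All names and parameters are illustrative assumptions.

```python
from typing import List, Optional


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def score(w: List[float], b: float, x: List[float]) -> float:
    """Toy linear model: a positive score means the target class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def targeted_attack(
    w: List[float],
    b: float,
    x: List[float],
    eps: float,
    step: float,
    max_iters: int = 100,
) -> Optional[List[float]]:
    """Search for x + delta with |delta_i| <= eps that the model scores as the target.

    White-box assumption: the attacker knows w, so the gradient of the
    score with respect to the input is simply w.
    """
    delta = [0.0] * len(x)
    for _ in range(max_iters):
        # Take a signed gradient step, then clip back into the eps-box.
        delta = [
            max(-eps, min(eps, di + step * sign(wi)))
            for di, wi in zip(delta, w)
        ]
        adversarial = [xi + di for xi, di in zip(x, delta)]
        if score(w, b, adversarial) > 0:
            return adversarial
    return None  # No example found within the perturbation budget.
```

With a generous budget the loop finds an input close to the original that flips the toy model's decision; with a budget that is too small it returns `None`, which mirrors the trade-off between distortion and attack success.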
Adversarial Black-Box Attacks on Automatic Speech Recognition Systems using Multi-Objective Evolutionary Optimization
Fooling deep neural networks with adversarial input has exposed a
significant vulnerability in the current state-of-the-art systems in multiple
domains. Both black-box and white-box approaches have been used to either
replicate the model itself or to craft examples which cause the model to fail.
In this work, we propose a framework which uses multi-objective evolutionary
optimization to perform both targeted and un-targeted black-box attacks on
Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR
systems, Deepspeech and Kaldi-ASR, and increase the Word Error Rates (WER)
of these systems by up to 980%, indicating the potency of our approach. During
both un-targeted and targeted attacks, the adversarial samples maintain a high
acoustic similarity of 0.98 and 0.97 with the original audio.
Comment: Published in Interspeech 201
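For readers unfamiliar with the metric, Word Error Rate is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below shows that standard definition, together with a helper for a relative increase, on the assumption that the 980% figure above is a relative increase over a baseline WER. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


def relative_increase_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Under the relative-increase reading, a hypothetical baseline WER of 0.05 rising to 0.54 would correspond to an increase of 980%.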