64,886 research outputs found
DolphinAtack: Inaudible Voice Commands
Speech recognition (SR) systems such as Siri or Google Now have become an
increasingly popular human-computer interaction method, and have turned various
systems into voice controllable systems(VCS). Prior work on attacking VCS shows
that the hidden voice commands that are incomprehensible to people can control
the systems. Hidden voice commands, though hidden, are nonetheless audible. In
this work, we design a completely inaudible attack, DolphinAttack, that
modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve
inaudibility. By leveraging the nonlinearity of the microphone circuits, the
modulated low frequency audio commands can be successfully demodulated,
recovered, and more importantly interpreted by the speech recognition systems.
We validate DolphinAttack on popular speech recognition systems, including
Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana and Alexa. By
injecting a sequence of inaudible voice commands, we show a few
proof-of-concept attacks, which include activating Siri to initiate a FaceTime
call on iPhone, activating Google Now to switch the phone to the airplane mode,
and even manipulating the navigation system in an Audi automobile. We propose
hardware and software defense solutions. We validate that it is feasible to
detect DolphinAttack by classifying the audios using supported vector machine
(SVM), and suggest to re-design voice controllable systems to be resilient to
inaudible voice command attacks.Comment: 15 pages, 17 figure
Hemispheric processing of memory is affected by sleep
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Sleep is known to affect learning and memory, but the extent to which it influences behavioural processing in the left and right hemispheres of the brain is as yet unknown. We tested two hypotheses about lateralised effects of sleep on recognition memory for words: whether sleep reactivated recent experiences of words promoting access to the long-term store in the left hemisphere (LH), and whether sleep enhanced spreading activation differentially in semantic networks in the hemispheres. In Experiment 1, participants viewed lists of semantically related words, then slept or stayed awake for 12 h before being tested on seen, unseen but related, or unrelated words presented to the left or the right hemisphere. Sleep was found to promote word recognition in the LH, and to spread activation equally within semantic networks in both hemispheres. Experiment 2 ensured that the results were not due to time of day effects influencing cognitive performance
Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
An automatic word classification system has been designed which processes
word unigram and bigram frequency statistics extracted from a corpus of natural
language utterances. The system implements a binary top-down form of word
clustering which employs an average class mutual information metric. Resulting
classifications are hierarchical, allowing variable class granularity. Words
are represented as structural tags --- unique -bit numbers the most
significant bit-patterns of which incorporate class information. Access to a
structural tag immediately provides access to all classification levels for the
corresponding word. The classification system has successfully revealed some of
the structure of English, from the phonemic to the semantic level. The system
has been compared --- directly and indirectly --- with other recent word
classification systems. Class based interpolated language models have been
constructed to exploit the extra information supplied by the classifications
and some experiments have shown that the new models improve model performance.Comment: 17 Page Paper. Self-extracting PostScript Fil
Recommended from our members
uC: Ubiquitous Collaboration Platform for Multimodal Team Interaction Support
A human-centered computing platform that improves teamwork and transforms the “human- computer interaction experience” for distributed teams is presented. This Ubiquitous Collaboration, or uC (“you see”), platform\u27s objective is to transform distributed teamwork (i.e., work occurring when teams of workers and learners are geographically dispersed and often interacting at different times). It achieves this goal through a multimodal team interaction interface realized through a reconfigurable open architecture. The approach taken is to integrate: (1) an intuitive speech- and video-centric multi-modal interface to augment more conventional methods (e.g., mouse, stylus and touch), (2) an open and reconfigurable architecture supporting information gathering, and (3) a machine intelligent approach to analysis and management of heterogeneous live and stored sensor data to support collaboration. The system will transform how teams of people interact with computers by drawing on both the virtual and physical environment
TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants
Wake-up word spotting in noisy environments is a critical task for an excellent user experience with voice assistants. Unwanted activation of the device is often due to the presence of noises coming from background conversations, TVs, or other domestic appliances. In this work, we propose the use of a speech enhancement convolutional autoencoder, coupled with on-device keyword spotting, aimed at improving the trigger word detection in noisy environments. The end-to-end system learns by optimizing a linear combination of losses: a reconstruction-based loss, both at the log-mel spectrogram and at the waveform level, as well as a specific task loss that accounts for the cross-entropy error reported along the keyword spotting detection. We experiment with several neural network classifiers and report that deeply coupling the speech enhancement together with a wake-up word detector, e.g., by jointly training them, significantly improves the performance in the noisiest conditions. Additionally, we introduce a new publicly available speech database recorded for the TelefĂłnica's voice assistant, Aura. The OK Aura Wake-up Word Dataset incorporates rich metadata, such as speaker demographics or room conditions, and comprises hard negative examples that were studiously selected to present different levels of phonetic similarity with respect to the trigger words 'OK Aura'. Keywords: speech enhancement; wake-up word; keyword spotting; deep learning; convolutional neural networ
- …