Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues
Over the last few years, a rapidly increasing number of Internet-of-Things
(IoT) systems that adopt voice as the primary user input have emerged. These
systems have been shown to be vulnerable to various types of voice spoofing
attacks. Existing defense techniques can usually only protect from a specific
type of attack or require an additional authentication step that involves
another device. Such defense strategies are either not strong enough or lower
the usability of the system. Based on the fact that legitimate voice commands
should only come from humans rather than a playback device, we propose a novel
defense strategy that is able to detect the sound source of a voice command
based on its acoustic features. The proposed defense strategy does not require
any information other than the voice command itself and can protect a system
from multiple types of spoofing attacks. Our proof-of-concept experiments
verify the feasibility and effectiveness of this defense strategy.
Comment: Proceedings of the 27th International Conference on Computer
Communications and Networks (ICCCN), Hangzhou, China, July-August 2018. arXiv
admin note: text overlap with arXiv:1803.0915
Spoof detection using time-delay shallow neural network and feature switching
Detecting spoofed utterances is a fundamental problem in voice-based
biometrics. Spoofing can be performed either through logical access, such as
speech synthesis or voice conversion, or through physical access, such as
replaying a pre-recorded utterance. Inspired by the state-of-the-art x-vector based
speaker verification approach, this paper proposes a time-delay shallow neural
network (TD-SNN) for spoof detection for both logical and physical access. The
novelty of the proposed TD-SNN system vis-a-vis conventional DNN systems is
that it can handle variable length utterances during testing. Performance of
the proposed TD-SNN systems and the baseline Gaussian mixture models (GMMs) is
analyzed on the ASVspoof 2019 dataset. The performance of the systems is
measured in terms of the minimum normalized tandem detection cost function
(min-t-DCF). When studied with individual features, the TD-SNN system
consistently outperforms the GMM system for physical access. For logical
access, GMM surpasses TD-SNN systems for certain individual features. When
combined with the decision-level feature switching (DLFS) paradigm, the best
TD-SNN system outperforms the best baseline GMM system on evaluation data with
a relative improvement of 48.03% and 49.47% for logical and physical
access, respectively.
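The abstract above does not describe the TD-SNN architecture itself. As a rough illustration of how x-vector-style networks in general accept utterances of any length, the sketch below applies one time-delay layer and then pools mean and standard deviation over time, which yields a fixed-size vector regardless of the number of frames. The layer sizes, the context offsets, the random weights, and the function names are all hypothetical and are not taken from the paper.

```python
import numpy as np

def time_delay_layer(frames, weights, bias, context=(-2, -1, 0, 1, 2)):
    """Apply one time-delay (1-D convolution over time) layer with ReLU.

    frames:  array of shape (num_frames, feat_dim)
    weights: array of shape (len(context) * feat_dim, out_dim)
    """
    left, right = -min(context), max(context)
    outputs = []
    for t in range(left, frames.shape[0] - right):
        # Splice the frames at the chosen temporal offsets into one vector.
        spliced = np.concatenate([frames[t + c] for c in context])
        outputs.append(np.maximum(spliced @ weights + bias, 0.0))
    return np.stack(outputs)

def statistics_pooling(hidden):
    """Collapse the time axis into a fixed-size mean-and-std vector."""
    return np.concatenate([hidden.mean(axis=0), hidden.std(axis=0)])

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
feat_dim, out_dim = 20, 8
weights = rng.standard_normal((5 * feat_dim, out_dim)) * 0.1
bias = np.zeros(out_dim)

def embed(utterance):
    """Map an utterance of any length to a fixed-size vector."""
    return statistics_pooling(time_delay_layer(utterance, weights, bias))

short_vec = embed(rng.standard_normal((50, feat_dim)))    # 50 frames
long_vec = embed(rng.standard_normal((400, feat_dim)))    # 400 frames
```

Both `short_vec` and `long_vec` have the same dimensionality (twice `out_dim`), so a downstream classifier can score them without padding or truncation.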
ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements
The now-acknowledged vulnerabilities of automatic speaker verification (ASV) technology to spoofing attacks have spurred interest in developing so-called spoofing countermeasures. By providing common databases, protocols and metrics for their assessment, the ASVspoof initiative was born to spearhead research in this area. The first competitive ASVspoof challenge, held in 2015, focused on the assessment of countermeasures to protect ASV technology from voice conversion and speech synthesis spoofing attacks. The second challenge switched focus to the consideration of replay spoofing attacks and countermeasures. This paper describes Version 2.0 of the ASVspoof 2017 database, which was released to correct data anomalies detected post-evaluation. The paper contains as-yet unpublished meta-data which describes recording and playback devices and acoustic environments. These support the analysis of replay detection performance and limits. Also described are new results for the official ASVspoof baseline system, which is based upon a constant Q cepstral coefficient frontend and a Gaussian mixture model backend. Reported are enhancements to the baseline system in the form of log-energy coefficients and cepstral mean and variance normalisation, in addition to an alternative i-vector backend. The best results correspond to a 48% relative reduction in equal error rate when compared to the original baseline system.
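For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an equal error rate (EER) and a relative reduction can be computed from countermeasure scores. It is a simple threshold sweep, not the official ASVspoof scoring tool, and it assumes that higher scores indicate genuine speech. The function names and the example error rates are hypothetical; the abstract reports only the 48% relative figure, not the absolute values.

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores mean "more likely genuine".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is where the two error rates cross; take the closest point.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical EER values (in percent), chosen only to illustrate the arithmetic.
example = relative_reduction(baseline=10.0, improved=5.2)
```

With these illustrative inputs, `example` is about 48, meaning the improved system removes roughly 48% of the baseline's error rather than lowering the EER by 48 percentage points.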