Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Voice Processing Systems (VPSes), now widely deployed, have been made
significantly more accurate through the application of recent advances in
machine learning. However, adversarial machine learning has similarly advanced
and has been used to demonstrate that VPSes are vulnerable to the injection of
hidden commands: audio obscured by noise that is correctly recognized by a VPS
but not by human beings. Such attacks, though, are often highly dependent on
white-box knowledge of a specific machine learning model and tied to
specific microphones and speakers, which limits their use across different
acoustic hardware platforms (and thus their practicality). In this paper, we
break these dependencies and make hidden command attacks more practical through
model-agnostic (black-box) attacks, which exploit knowledge of the signal
processing algorithms commonly used by VPSes to generate the data fed into
machine learning systems. Specifically, we exploit the fact that multiple
source audio samples have similar feature vectors when transformed by acoustic
feature extraction algorithms (e.g., FFTs). We develop four classes of
perturbations that create unintelligible audio and test them against 12 machine
learning models, including 7 proprietary models (e.g., Google Speech API, Bing
Speech API, IBM Speech API, Azure Speaker API, etc.), and demonstrate successful
attacks against all targets. Moreover, we successfully use our maliciously
generated audio samples in multiple hardware configurations, demonstrating
effectiveness across both models and real systems. In so doing, we demonstrate
that domain-specific knowledge of audio signal processing represents a
practical means of generating successful hidden voice command attacks.
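The property the attack relies on, that different waveforms can share the same magnitude spectrum, can be checked with a short sketch. This is an illustration under stated assumptions, not the authors' implementation: it reverses the samples inside each non-overlapping window (a hypothetical `invert_windows` helper) and confirms that every window's DFT magnitude is unchanged, because reversing a real frame only conjugates its spectrum and adds a linear phase term. A naive O(N^2) DFT keeps the example to the standard library.

```python
import cmath
import math
import random


def dft_magnitude(frame):
    """Naive O(N^2) DFT of a real frame; returns |X[k]| for k = 0..N-1."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n)
    ]


def invert_windows(signal, window):
    """Hypothetical perturbation: reverse the samples inside each non-overlapping window."""
    out = []
    for start in range(0, len(signal), window):
        out.extend(reversed(signal[start:start + window]))
    return out


def max_spectral_gap(a, b, window):
    """Largest per-bin magnitude difference between matching windows of a and b."""
    gap = 0.0
    for start in range(0, len(a), window):
        ma = dft_magnitude(a[start:start + window])
        mb = dft_magnitude(b[start:start + window])
        gap = max(gap, max(abs(x - y) for x, y in zip(ma, mb)))
    return gap


if __name__ == "__main__":
    random.seed(0)
    audio = [random.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for real samples
    perturbed = invert_windows(audio, window=16)
    print("waveforms identical:", audio == perturbed)
    print("max magnitude difference:", max_spectral_gap(audio, perturbed, 16))
```

The equality is exact only when the perturbation windows line up with the analysis frames; real front-ends typically use overlapping, tapered frames, so in practice the spectra would match only approximately.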
Robust Object-Based Watermarking Using SURF Feature Matching and DFT Domain
In this paper we propose a robust object-based watermarking method in which the watermark is embedded into the middle-frequency band of the Discrete Fourier Transform (DFT) magnitude of the selected object region, together with the Speeded Up Robust Features (SURF) algorithm, which allows correct watermark detection even if the watermarked image has been distorted. To recognize the selected object region after geometric distortions, the SURF features are estimated during the embedding process and stored in advance for use during detection. In the detection stage, the SURF features of the distorted image are estimated and matched with the stored ones. From the matching result, the SURF features are used to compute the affine transformation parameters, and the object region is recovered. The quality of the watermarked image is measured using the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and Visual Information Fidelity (VIF). The experimental results show that the proposed method is robust against several geometric distortions, signal processing operations, and combined distortions. The receiver operating characteristic (ROC) curves also show the desirable detection performance of the proposed method. A comparison with previously reported methods based on different techniques is also provided.
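Recovering the object region depends on estimating affine parameters from matched keypoints. The sketch below assumes that SURF matching has already produced pairs of (x, y) coordinates free of outliers, and fits the six affine parameters by ordinary least squares via the normal equations. The helper names are hypothetical; the abstract does not say which estimator the authors use, and a practical detector would likely add outlier rejection (for example RANSAC) before the fit.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration (e.g. collinear points)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def estimate_affine(src, dst):
    """Least-squares fit of u = a*x + b*y + tx, v = c*x + d*y + ty from matched pairs."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matched point pairs")
    # Accumulate the normal equations A^T A p = A^T b, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    a, b, tx = solve3(ata, atu)
    c, d, ty = solve3(ata, atv)
    return (a, b, tx, c, d, ty)


def apply_affine(params, pts):
    """Map each (x, y) through the affine transform given by params."""
    a, b, tx, c, d, ty = params
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in pts]
```

Fitting from the distorted coordinates to the stored ones gives the mapping that brings the region back directly, so no explicit matrix inversion is needed.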
Watermarking Using Decimal Sequences
This paper introduces the use of decimal sequences (d-sequences) in a code
division multiple access (CDMA) based watermarking system to hide information
for authentication in black-and-white images. MATLAB version 6.5 was used to
implement the algorithms discussed in this paper. The advantage of using
d-sequences over pseudo-noise (PN) sequences is that one can choose from a
variety of prime numbers, which makes the system more flexible.
Comment: 8 pages, 9 figures
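To make the d-sequence idea concrete, the sketch below generates binary d-sequence bits from a prime p with the recurrence a(i) = (2^i mod p) mod 2, which yields the binary digits of 1/p, and uses them as a +/-1 spreading code in a simplified additive spread-spectrum embedder. This is a hedged illustration, not the paper's MATLAB system: the function names, the strength `alpha`, the one-bit-per-block layout, real-valued pixel intensities, and the non-blind detector (which subtracts the original image) are all assumptions chosen to keep the example short.

```python
def d_sequence(p, length):
    """Binary d-sequence bits a(i) = (2**i mod p) mod 2 for i = 1..length.

    For prime p these are the binary digits of 1/p; the period is p - 1
    when 2 is a primitive root of p.
    """
    bits = []
    r = 1
    for _ in range(length):
        r = (2 * r) % p
        bits.append(r % 2)
    return bits


def spreading_code(p, length):
    """Map d-sequence bits {0, 1} to chips {-1, +1}."""
    return [1 if b else -1 for b in d_sequence(p, length)]


def embed(pixels, message_bits, p, alpha=2.0):
    """Hypothetical embedder: spread each message bit over its own block of pixels."""
    chips = spreading_code(p, p - 1)  # one period of the code per message bit
    block = len(chips)
    if len(message_bits) * block > len(pixels):
        raise ValueError("image too small for this message")
    marked = list(pixels)
    for j, bit in enumerate(message_bits):
        sign = 1 if bit else -1
        for i, c in enumerate(chips):
            marked[j * block + i] += alpha * sign * c
    return marked


def detect(marked, original, n_bits, p):
    """Non-blind detector: correlate the embedding residual with the code."""
    chips = spreading_code(p, p - 1)
    block = len(chips)
    bits = []
    for j in range(n_bits):
        corr = sum(
            (marked[j * block + i] - original[j * block + i]) * c
            for i, c in enumerate(chips)
        )
        bits.append(1 if corr > 0 else 0)
    return bits
```

Changing the prime changes the code length p - 1, which is one way to read the flexibility the abstract attributes to d-sequences; a scheme for truly binary images would also need to keep marked pixels in {0, 1}, which this sketch does not attempt.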
Database of audio records
Diploma thesis with a practical part
Human abnormal behavior impact on speaker verification systems
Human behavior plays a major role in improving human-machine communication. Because systems are trained using normal utterances, their performance is bound to be affected by abnormal behavior. Abnormal behavior is often associated with a change in the human emotional state. Different emotional states cause physiological changes in the human body that affect the vocal tract. We recognize fear, anger, or even happiness as deviations from normal behavior. The whole spectrum of human-machine applications is susceptible to behavioral changes. Abnormal behavior is a major factor, especially for security applications such as verification systems. Face, fingerprint, iris, and speaker verification are among the most common approaches to biometric authentication today. This paper discusses normal and abnormal human behavior and its impact on the accuracy and effectiveness of automatic speaker verification (ASV). The inputs to the support vector machine (SVM) classifier are Mel-frequency cepstral coefficients (MFCCs) and their dynamic changes. For this purpose, the Berlin Database of Emotional Speech was used. The research has shown that abnormal behavior has a major impact on verification accuracy, with the equal error rate (EER) increasing to 37%. This paper also describes a new design and application of an ASV system that is much less prone to rejecting a target user who exhibits abnormal behavior.
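Since the headline result is reported as an equal error rate, a short sketch of how an EER can be estimated from verifier scores may help. It assumes that higher scores mean "same speaker" and simply sweeps the observed scores as decision thresholds; the function names are hypothetical, and the paper's own SVM scoring and evaluation procedure is not reproduced here.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # targets wrongly rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: try every observed score as a threshold and return
    (mean of FAR and FRR, threshold) where the two rates are closest."""
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

With finite score lists, FAR and FRR rarely cross exactly, so the sweep returns the midpoint at the closest threshold; interpolating between neighboring thresholds is a common refinement.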
Effectiveness in the Realisation of Speaker Authentication
© 2019 IEEE. An important consideration for the deployment of speaker recognition in authentication applications is the approach to forming the training and testing utterances. Whilst this choice for a specific scenario is influenced by the associated requirements and conditions, it can be further guided by establishing the relative usefulness of alternative frameworks for composing the training and testing material. In this regard, the present paper analyses the effects of various bases for forming the training and testing data on speaker recognition accuracy. The experimental investigations are based on digit utterances taken from the XM2VTS database. The paper presents a detailed description of each approach considered and discusses the experimental results obtained in different cases.
- …