10,477 research outputs found

    Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

    Full text link
    Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (blackbox) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks

    Robust Object-Based Watermarking Using SURF Feature Matching and DFT Domain

    Get PDF
    In this paper we propose a robust object-based watermarking method, in which the watermark is embedded into the middle frequencies band of the Discrete Fourier Transform (DFT) magnitude of the selected object region, altogether with the Speeded Up Robust Feature (SURF) algorithm to allow the correct watermark detection, even if the watermarked image has been distorted. To recognize the selected object region after geometric distortions, during the embedding process the SURF features are estimated and stored in advance to be used during the detection process. In the detection stage, the SURF features of the distorted image are estimated and match them with the stored ones. From the matching result, SURF features are used to compute the Affine-transformation parameters and the object region is recovered. The quality of the watermarked image is measured using the Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM) and the Visual Information Fidelity (VIF). The experimental results show the proposed method provides robustness against several geometric distortions, signal processing operations and combined distortions. The receiver operating characteristics (ROC) curves also show the desirable detection performance of the proposed method. The comparison with a previously reported methods based on different techniques is also provided

    Watermarking Using Decimal Sequences

    Full text link
    This paper introduces the use of decimal sequences in a code division multiple access (CDMA) based watermarking system to hide information for authentication in black and white images. Matlab version 6.5 was used to implement the algorithms discussed in this paper. The advantage of using d-sequences over PN sequences is that one can choose from a variety of prime numbers which provides a more flexible system.Comment: 8 pages, 9 figure

    Database of audio records

    Get PDF
    Diplomka a prakticky castDiplome with partical part

    Human abnormal behavior impact on speaker verification systems

    Get PDF
    Human behavior plays a major role in improving human-machine communication. The performance must be affected by abnormal behavior as systems are trained using normal utterances. The abnormal behavior is often associated with a change in the human emotional state. Different emotional states cause physiological changes in the human body that affect the vocal tract. Fear, anger, or even happiness we recognize as a deviation from a normal behavior. The whole spectrum of human-machine application is susceptible to behavioral changes. Abnormal behavior is a major factor, especially for security applications such as verification systems. Face, fingerprint, iris, or speaker verification is a group of the most common approaches to biometric authentication today. This paper discusses human normal and abnormal behavior and its impact on the accuracy and effectiveness of automatic speaker verification (ASV). The support vector machines classifier inputs are Mel-frequency cepstral coefficients and their dynamic changes. For this purpose, the Berlin Database of Emotional Speech was used. Research has shown that abnormal behavior has a major impact on the accuracy of verification, where the equal error rate increase to 37 %. This paper also describes a new design and application of the ASV system that is much more immune to the rejection of a target user with abnormal behavior.Web of Science6401274012

    Effectiveness in the Realisation of Speaker Authentication

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.An important consideration for the deployment of speaker recognition in authentication applications is the approach to the formation of training and testing utterances . Whilst defining this for a specific scenario is influenced by the associated requirements and conditions, the process can be further guided through the establishment of the relative usefulness of alternative frameworks for composing the training and testing material. In this regard, the present paper provides an analysis of the effects, on the speaker recognition accuracy, of various bases for the formation of the training and testing data. The experimental investigations are conducted based on the use of digit utterances taken from the XM2VTS database. The paper presents a detailed description of the individual approaches considered and discusses the experimental results obtained in different cases
    corecore