Self-Authentication of Audio Signals by Chirp Coding
This paper discusses a new approach to ‘watermarking’ digital signals using linear frequency modulated or ‘chirp’ coding. The approach is based on using a matched filter to reconstruct a chirped code, a reconstruction that is uniquely robust for signals with very low signal-to-noise ratios. Chirp coding for authenticating data is generic in the sense that it can be used for a range of data types and applications (for example, the authentication of speech and audio signals). The theoretical and computational aspects of the matched filter and the properties of a chirp are revisited to provide the essential background to the method. Signal code generating schemes are then addressed, and the coding and decoding techniques are considered in detail. Finally, the paper briefly describes an example application, available online, for readers interested in using the approach to authenticate audio data in either WAV or MP3 format.
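As a rough illustration of why matched filtering suits low signal-to-noise conditions, the sketch below generates a linear frequency-modulated chirp, buries a weak copy of it in stronger Gaussian noise, and locates it by correlating the noisy signal against the known chirp. It is a minimal sketch under our own assumptions (8 kHz sample rate, a 512-sample sweep from 500 Hz to 3000 Hz, the noise level and the offset are all illustrative); it does not reproduce the paper's code-generation scheme or its WAV/MP3 embedding.

```python
import math
import random


def linear_chirp(n: int, f0: float, f1: float, fs: float) -> list[float]:
    """Linear FM ('chirp') code: instantaneous frequency sweeps f0 -> f1 Hz over n samples."""
    duration = n / fs
    rate = (f1 - f0) / duration  # sweep rate in Hz per second
    out = []
    for i in range(n):
        t = i / fs
        # Phase is the integral of the instantaneous frequency f0 + rate * t
        out.append(math.sin(2 * math.pi * (f0 * t + 0.5 * rate * t * t)))
    return out


def matched_filter(signal: list[float], template: list[float]) -> list[float]:
    """Matched filter for a real template: sliding correlation of the signal with the template."""
    m = len(template)
    return [
        sum(signal[i + j] * template[j] for j in range(m))
        for i in range(len(signal) - m + 1)
    ]


fs = 8000.0
code = linear_chirp(512, 500.0, 3000.0, fs)
rng = random.Random(1)

# Hide a half-amplitude copy of the chirp at sample 700 in noise whose power exceeds the chirp's
offset = 700
noisy = [rng.gauss(0.0, 0.7) for _ in range(2048)]
for j, c in enumerate(code):
    noisy[offset + j] += 0.5 * c

response = matched_filter(noisy, code)
detected = max(range(len(response)), key=lambda i: abs(response[i]))
print(detected)
```

The naive correlation keeps the sketch dependency-free; a practical decoder would compute the same correlation with an FFT.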
A HIGH SPEED VLSI ARCHITECTURE FOR DIGITAL SPEECH WATERMARKING WITH COMPRESSION
The need to provide copyright protection for multimedia data such as speech, images and video through digital watermarking is increasing rapidly as applications in these areas intensify. Digital watermarking has received a lot of attention in the past few years. A hardware system based solely on DSP processors is fast but may require more area, cost or power if the target application needs a large amount of parallel processing. An FPGA co-processor can provide as many as 550 parallel multiply-accumulate operations on a single device; however, although FPGAs excel at processing large amounts of data in parallel, they are not optimized, as processors are, for tasks such as periodic coefficient updates and decision-making control. Combining an FPGA with a DSP processor therefore delivers an attractive solution for a wide range of applications. This paper presents a hardware implementation of digital speech watermarking combined with speech compression and encryption on such a heterogeneous platform. The proposed architecture is observed to attain high speed while using area resources optimally.
RADIC Voice Authentication: Replay Attack Detection using Image Classification for Voice Authentication Systems
Systems like Google Home, Alexa and Siri that use voice-based authentication to verify their users’ identities are vulnerable to voice replay attacks. These attacks gain unauthorized access to voice-controlled devices or systems by replaying recordings of passphrases and voice commands. This shows the need for more resilient voice-based authentication systems that can detect voice replay attacks.
This thesis implements a system that detects voice-based replay attacks by using deep learning to classify voice spectrograms as images, differentiating live speech from recorded speech. Tests of this system indicate that the approach is a promising direction for detecting voice-based replay attacks.
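The abstract does not describe how the spectrogram images are produced. As a hedged illustration, the sketch below shows one common way to turn audio into a spectrogram "image": Hann-windowed frames, a DFT per frame, log magnitude in dB, then min-max scaling to 0..255 grayscale. The frame length, hop size, window and scaling are our assumptions, a pure tone stands in for speech, and the deep-learning classifier itself is not shown.

```python
import cmath
import math


def spectrogram_db(x: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Log-magnitude spectrogram: rows are time frames, columns are DFT bins 0..frame_len/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    # Precomputed DFT twiddle factors exp(-2*pi*i*m/N)
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            s = sum(seg[n] * twiddle[(k * n) % frame_len] for n in range(frame_len))
            row.append(20 * math.log10(abs(s) + 1e-12))  # small floor avoids log(0)
        rows.append(row)
    return rows


def to_grayscale(spec: list[list[float]]) -> list[list[int]]:
    """Min-max scale dB values to 0..255 so the spectrogram can be fed to an image classifier."""
    lo = min(min(r) for r in spec)
    hi = max(max(r) for r in spec)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in r] for r in spec]


fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(2048)]  # stand-in for speech
image = to_grayscale(spectrogram_db(tone))
print(len(image), len(image[0]))  # 15 frames x 129 frequency bins
```

A real pipeline would use an FFT library and longer recordings; the naive DFT only keeps the example self-contained.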
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid momentum of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent identification), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, namely a Probabilistic Neural Network (PNN), a General Regression Neural Network (GRNN) and a Radial Basis Function Neural Network (RBF NN), combined with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared with classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared with a Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
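The AND voting scheme can be pictured with the minimal sketch below: each network produces a speaker label, and an identity is returned only when all of them agree. The abstract does not say what happens on disagreement, so rejecting (returning None) is our assumption, and the three classifiers are hypothetical stand-ins rather than trained PNN, GRNN and RBF NN models.

```python
from typing import Callable, Optional, Sequence

# A classifier maps a feature vector to a speaker label
Classifier = Callable[[Sequence[float]], str]


def and_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> Optional[str]:
    """AND voting: return a speaker only if every classifier agrees; otherwise reject with None."""
    labels = [clf(features) for clf in classifiers]
    return labels[0] if all(label == labels[0] for label in labels) else None


# Hypothetical stand-ins for trained PNN, GRNN and RBF NN models
def pnn(features: Sequence[float]) -> str:
    return "speaker_1"


def grnn(features: Sequence[float]) -> str:
    return "speaker_1"


def rbf(features: Sequence[float]) -> str:
    return "speaker_2" if features[0] > 0.5 else "speaker_1"


print(and_vote([pnn, grnn, rbf], [0.2, 0.7]))  # speaker_1 (all three agree)
print(and_vote([pnn, grnn, rbf], [0.9, 0.7]))  # None (the RBF stand-in disagrees)
```

Requiring unanimous agreement trades some recall for fewer false identifications, which is the usual motivation for AND-style combination.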
Another novel approach, based on vowel formant analysis, is implemented using Linear Discriminant Analysis (LDA). Vowel-formant-based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage- and time-efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training time beyond creating a small database of vowel formants, it is also faster. Furthermore, as the number of speakers increases, BPNN and GMM find it difficult to sustain their accuracy, whereas the proposed score-based methodology stays almost linear.
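The abstract says LPC is used to extract vowel formants but gives no parameters. One standard way to do this is sketched below: compute LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, then pick peaks of the LPC spectral envelope as formant candidates (instead of solving for polynomial roots). The model order of 8, the 400-sample Hamming-windowed frame and the synthetic two-tone frame standing in for a vowel are all assumptions, and the LDA matching stage is not shown.

```python
import cmath
import math
import random


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> list[float]:
    """LPC coefficients a[0..order-1] such that x[n] ~ sum_k a[k] * x[n-1-k]."""
    a: list[float] = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err  # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1 - k * k  # prediction error shrinks as the order grows
    return a


def lpc_peak_frequencies(a: list[float], fs: float, n_points: int = 512) -> list[float]:
    """Local maxima of the LPC envelope 1/|A(e^jw)| in Hz, strongest first (formant candidates)."""
    mags = []
    for m in range(n_points):
        w = math.pi * m / n_points
        denom = 1 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        mags.append(1 / abs(denom))
    peaks = [m for m in range(1, n_points - 1) if mags[m] > mags[m - 1] and mags[m] >= mags[m + 1]]
    peaks.sort(key=lambda m: mags[m], reverse=True)
    return [m * fs / (2 * n_points) for m in peaks]


fs = 8000.0
rng = random.Random(0)
n = 400
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
# Synthetic stand-in for a vowel frame: energy concentrated near 700 Hz and 1200 Hz
frame = [
    hamming[i] * (math.sin(2 * math.pi * 700 * i / fs)
                  + 0.8 * math.sin(2 * math.pi * 1200 * i / fs)
                  + rng.gauss(0.0, 0.05))
    for i in range(n)
]
coeffs = levinson_durbin(autocorrelation(frame, 8), 8)
print(sorted(lpc_peak_frequencies(coeffs, fs)[:2]))
```

The two strongest envelope peaks should fall near the synthetic frequencies; for real speech, the first two or three peaks per vowel would form the compact per-speaker feature the abstract describes.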
Finally, a novel audio-visual fusion-based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both score-level and decision-level fusion (with OR voting) were shown to outperform feature-level fusion in terms of accuracy and error resilience. This is consistent with the distinct nature of the two modalities, which lose their individual characteristics when combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice, owing to its low computational time and high recognition accuracy.
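Score-level and decision-level fusion can be contrasted with the small sketch below. The min-max normalisation, the equal weights, the acceptance threshold and the framing of OR voting as verification of a claimed identity are our assumptions; the score values are illustrative and are not results from the thesis.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one modality's scores to [0, 1] so voice and face scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}  # no information to separate candidates
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def score_level_fusion(voice: dict[str, float], face: dict[str, float], w_voice: float = 0.5) -> str:
    """Weighted sum of normalised scores; returns the identity with the highest fused score."""
    v, f = min_max(voice), min_max(face)
    fused = {k: w_voice * v[k] + (1 - w_voice) * f[k] for k in v}
    return max(fused, key=fused.get)


def decision_level_or(claim: str, voice: dict[str, float], face: dict[str, float], threshold: float) -> bool:
    """OR voting: accept a claimed identity if either modality accepts it on its own."""
    return min_max(voice)[claim] >= threshold or min_max(face)[claim] >= threshold


voice_scores = {"a": -50.0, "b": -42.0, "c": -40.0}  # e.g. GMM log-likelihoods (illustrative)
face_scores = {"a": 0.9, "b": 0.8, "c": 0.2}  # e.g. similarity to PCA face templates (illustrative)
print(score_level_fusion(voice_scores, face_scores))  # b
print(decision_level_or("a", voice_scores, face_scores, 0.9))  # True (face accepts)
print(decision_level_or("b", voice_scores, face_scores, 0.9))  # False
```

In this toy case voice alone would pick "c" and face alone "a", while the fused score picks "b", which is strong in both modalities; each modality keeps its own scale until the scores are combined.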
Text Hiding in Coded Image Based on Quantization Level Modification and Chaotic Function
This paper presents a text-hiding method for coded images based on quantization level modification. The image is transformed into the wavelet domain by the DWT, and the transform coefficients are partitioned into blocks of a predefined size. A threshold is used to classify these blocks into two types, smooth and complex, and each type has its own method of hiding the text (as binary data). In smooth blocks, the secret bits representing the text replace the bitmap, and the quantization levels are modified to reduce distortion; to increase the embedding payload, a quantization level can carry two extra bits depending on a second threshold. Each complex block carries one data bit, and its quantization levels are swapped, with the bitmap flipped, to reduce distortion. The results of the proposed method show a high signal-to-noise ratio, and embedding capacity is studied as an important aspect of this work.
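The block-level mechanics resemble classic block truncation coding (BTC), where each block is stored as two quantization levels plus a bitmap. The sketch below is a simplified illustration under that assumption: blocks whose level gap is below a threshold are "smooth" and have their bitmap overwritten by secret bits, while a "complex" block encodes one bit by the order of its two levels, with the bitmap flipped so the decoded block is unchanged. The paper's level modification in smooth blocks, the two-extra-bit payload, the DWT step and the chaotic function are not reproduced, and the coefficient values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class BTCBlock:
    first: float       # level reconstructed where the bitmap bit is 0
    second: float      # level reconstructed where the bitmap bit is 1
    bitmap: list[int]  # one 0/1 flag per coefficient in the block


def btc_encode(block: list[float]) -> BTCBlock:
    """Two-level BTC: split at the block mean and keep the mean of each group."""
    mean = sum(block) / len(block)
    bitmap = [1 if v >= mean else 0 for v in block]
    high = [v for v, b in zip(block, bitmap) if b]
    low = [v for v, b in zip(block, bitmap) if not b]
    hi_level = sum(high) / len(high)
    lo_level = sum(low) / len(low) if low else hi_level  # uniform block: one level only
    return BTCBlock(lo_level, hi_level, bitmap)


def btc_decode(blk: BTCBlock) -> list[float]:
    return [blk.second if b else blk.first for b in blk.bitmap]


def embed(blocks: list[BTCBlock], bits: list[int], threshold: float) -> int:
    """Hide bits in place; returns how many bits were embedded."""
    assert threshold > 0, "complex blocks need distinct levels to encode a bit by their order"
    pos = 0
    for blk in blocks:
        if pos >= len(bits):
            break
        if abs(blk.second - blk.first) < threshold:
            # Smooth block: secret bits overwrite the bitmap
            chunk = bits[pos:pos + len(blk.bitmap)]
            blk.bitmap = chunk + blk.bitmap[len(chunk):]
            pos += len(chunk)
        else:
            # Complex block: swapping levels and flipping the bitmap signals a 1 losslessly
            if bits[pos] == 1:
                blk.first, blk.second = blk.second, blk.first
                blk.bitmap = [1 - b for b in blk.bitmap]
            pos += 1
    return pos


def extract(blocks: list[BTCBlock], n_bits: int, threshold: float) -> list[int]:
    out: list[int] = []
    for blk in blocks:
        if len(out) >= n_bits:
            break
        if abs(blk.second - blk.first) < threshold:
            out.extend(blk.bitmap)
        else:
            out.append(1 if blk.first > blk.second else 0)
    return out[:n_bits]


coeffs = [[10, 11, 10, 12], [0, 100, 0, 100], [50, 52, 51, 50], [5, 90, 40, 200]]
blocks = [btc_encode(b) for b in coeffs]
secret = [1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
embedded = embed(blocks, secret, threshold=10)
print(embedded, extract(blocks, len(secret), threshold=10) == secret)  # 10 True
```

Because the swap preserves the absolute level gap, the decoder can reclassify every block without side information.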
Digital Image Watermarking in Wavelet Domain
The Internet allows individuals to share information such as text, image, audio and video files. This sharing gives rise to problems such as copyright violation and unauthorized use of documents, which can be addressed with a technique called digital watermarking. This paper presents different aspects of watermarking and shows how it is useful for intellectual property protection on the Internet. DOI: http://dx.doi.org/10.11591/ijece.v3i1.174