31 research outputs found
Semifragile Speech Watermarking Based on Least Significant Bit Replacement of Line Spectral Frequencies
There are various techniques for speech watermarking based on modifying the linear prediction coefficients (LPCs); however, the estimated and modified LPCs vary from each other even without attacks. Because line spectral frequency (LSF) has less sensitivity to watermarking than LPC, watermark bits are embedded into the maximum number of LSFs by applying the least significant bit replacement (LSBR) method. To reduce the differences between estimated and modified LPCs, a checking loop is added to minimize the watermark extraction error. Experimental results show that the proposed semifragile speech watermarking method can provide high imperceptibility and that any manipulation of the watermark signal destroys the watermark bits since manipulation changes it to a random stream of bits
Optimization of a Blind Speech Watermarking Technique against Amplitude Scaling
This paper presents a gain invariant speech watermarking technique based on quantization of the Lp-norm. In this scheme, first, the original speech signal is divided into different frames. Second, each frame is divided into two vectors based on odd and even indices. Third, quantization index modulation (QIM) is used to embed the watermark bits into the ratio of the Lp-norm between the odd and even indices. Finally, the Lagrange optimization technique is applied to minimize the embedding distortion. By applying a statistical analytical approach, the embedding distortion and error probability are estimated. Experimental results not only confirm the accuracy of the driven statistical analytical approach but also prove the robustness of the proposed technique against common signal processing attacks
Multilevel Steganography to Improve Secret Communication
This chapter presents multilevel audio steganography, which describes a new model for hidden communication in secret communication technology. At least two embedding methods are used in such a way that the second method will use the first method as a carrier. The proposed method has several potential benefits in hidden communication. This method can be used to increase the level of security while transmitting the confidential information over public channels or internet and also can be used to provide two or more information hiding solutions simultaneously. The performance of the proposed method in terms of imperceptibility, capacity & security is measured through different experiments
Fast Blind Audio Copy-Move Detection and Localization Using Local Feature Tensors in Noise
The increasing availability of audio editing software altering digital audios
and their ease of use allows create forgeries at low cost. A copy-move forgery
(CMF) is one of easiest and popular audio forgeries, which created by copying
and pasting audio segments within the same audio, and potentially
post-processing it. Three main approaches to audio copy-move detection exist
nowadays: samples/frames comparison, acoustic features coherence searching and
dynamic time warping. But these approaches will suffer from computational
complexity and/or sensitive to noise and post-processing. In this paper, we
propose a new local feature tensors-based copy-move detection algorithm that
can be applied to transformed duplicates detection and localization problem to
a special locality sensitive hash like procedure. The experimental results with
massive online real-time audios datasets reveal that the proposed technique
effectively determines and locating copy-move forgeries even on a forged speech
segment are as short as fractional second. This method is also computational
efficient and robust against the audios processed with severe nonlinear
transformation, such as resampling, filtering, jsittering, compression and
cropping, even contaminated with background noise and music. Hence, the
proposed technique provides an efficient and reliable way of copy-move forgery
detection that increases the credibility of audio in practical forensics
application
Recent Advances in Signal Processing
The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity
New Digital Audio Watermarking Algorithms for Copyright Protection
This thesis investigates the development of digital audio watermarking in addressing issues such as copyright protection. Over the past two decades, many digital watermarking algorithms have been developed, each with its own advantages and disadvantages. The main aim of this thesis was to develop a new watermarking algorithm within an existing Fast Fourier Transform framework. This resulted in the development of a Complex Spectrum Phase Evolution based watermarking algorithm. In this new implementation, the embedding positions were generated dynamically thereby rendering it more difficult for an attacker to remove, and watermark information was embedded by manipulation of the spectral components in the time domain thereby reducing any audible distortion. Further improvements were attained when the embedding criteria was based on bin location comparison instead of magnitude, thereby rendering it more robust against those attacks that interfere with the spectral magnitudes.
However, it was discovered that this new audio watermarking algorithm has some disadvantages such as a relatively low capacity and a non-consistent robustness for different audio files. Therefore, a further aim of this thesis was to improve the algorithm from a different perspective.
Improvements were investigated using an Singular Value Decomposition framework wherein a novel observation was discovered. Furthermore, a psychoacoustic model was incorporated to suppress any audible distortion. This resulted in a watermarking algorithm which achieved a higher capacity and a more consistent robustness.
The overall result was that two new digital audio watermarking algorithms were developed which were complementary in their performance thereby opening more opportunities for further research
Blind Detection of Copy-Move Forgery in Digital Audio Forensics
Although copy-move forgery is one of the most common fabrication techniques, blind detection of such tampering in digital audio is mostly unexplored. Unlike active techniques, blind forgery detection is challenging, because it does not embed a watermark or signature in an audio that is unknown in most of the real-life scenarios. Therefore, forgery localization becomes more challenging, especially when using blind methods. In this paper, we propose a novel method for blind detection and localization of copy-move forgery. One of the most crucial steps in the proposed method is a voice activity detection (VAD) module for investigating audio recordings to detect and localize the forgery. The VAD module is equally vital for the development of the copy-move forgery database, wherein audio samples are generated by using the recordings of various types of microphones. We employ a chaotic theory to copy and move the text in generated forged recordings to ensure forgery localization at any place in a recording. The VAD module is responsible for the extraction of words in a forged audio, and these words are analyzed by applying a 1-D local binary pattern operator. This operator provides the patterns of extracted words in the form of histograms. The forged parts (copy and move text) have similar histograms. An accuracy of 96.59% is achieved, and the proposed method is deemed robust against noise
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
when using deep learning (DL). It requires large-scale training datasets and
high computational and storage resources. Moreover, DL techniques and machine
learning (ML) approaches in general, hypothesize that training and testing data
come from the same domain, with the same input feature space and data
distribution characteristics. This assumption, however, is not applicable in
some real-world artificial intelligence (AI) applications. Moreover, there are
situations where gathering real data is challenging, expensive, or rarely
occurring, which can not meet the data requirements of DL models. deep transfer
learning (DTL) has been introduced to overcome these issues, which helps
develop high-performing models using real datasets that are small or slightly
different but related to the training data. This paper presents a comprehensive
survey of DTL-based ASR frameworks to shed light on the latest developments and
helps academics and professionals understand current challenges. Specifically,
after presenting the DTL background, a well-designed taxonomy is adopted to
inform the state-of-the-art. A critical analysis is then conducted to identify
the limitations and advantages of each framework. Moving on, a comparative
study is introduced to highlight the current challenges before deriving
opportunities for future research
Audio Deepfake Detection: A Survey
Audio deepfake detection is an emerging active topic. A growing number of
literatures have aimed to study deepfake detection algorithms and achieved
effective performance, the problem of which is far from being solved. Although
there are some review literatures, there has been no comprehensive survey that
provides researchers with a systematic overview of these developments with a
unified evaluation. Accordingly, in this survey paper, we first highlight the
key differences across various types of deepfake audio, then outline and
analyse competitions, datasets, features, classifications, and evaluation of
state-of-the-art approaches. For each aspect, the basic techniques, advanced
developments and major challenges are discussed. In addition, we perform a
unified comparison of representative features and classifiers on ASVspoof 2021,
ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively.
The survey shows that future research should address the lack of large scale
datasets in the wild, poor generalization of existing detection methods to
unknown fake attacks, as well as interpretability of detection results