323 research outputs found
THInImg: Cross-modal Steganography for Presenting Talking Heads in Images
Cross-modal Steganography is the practice of concealing secret signals in
publicly available cover signals (distinct from the modality of the secret
signals) unobtrusively. While previous approaches primarily concentrated on
concealing a relatively small amount of information, we propose THInImg, which
manages to hide lengthy audio data (and subsequently decode talking head video)
inside an identity image by leveraging the properties of the human face, which can
be effectively utilized for covert communication, transmission and copyright
protection. THInImg consists of two parts: the encoder and decoder. Inside the
encoder-decoder pipeline, we introduce a novel architecture that substantially
increases the capacity of hiding audio in images. Moreover, our framework can be
extended to iteratively hide multiple audio clips into an identity image,
offering multiple levels of control over permissions. We conduct extensive
experiments to prove the effectiveness of our method, demonstrating that
THInImg can present up to 80 seconds of high-quality talking-head video
(including audio) in an identity image with 160x160 resolution.
Comment: Accepted at WACV 202
InvVis: Large-Scale Data Embedding for Invertible Visualization
We present InvVis, a new approach for invertible visualization, that is,
reconstructing or further modifying a visualization from an image. InvVis
allows the embedding of a significant amount of data, such as chart data, chart
information, source code, etc., into visualization images. The encoded image is
perceptually indistinguishable from the original one. We propose a new method
to efficiently express chart data in the form of images, enabling
large-capacity data embedding. We also outline a model based on the invertible
neural network to achieve high-quality data concealing and revealing. We
explore and implement a variety of application scenarios of InvVis.
Additionally, we conduct a series of evaluation experiments to assess our
method from multiple perspectives, including data embedding quality, data
restoration accuracy, data encoding capacity, etc. The results of our
experiments demonstrate the great potential of InvVis in invertible
visualization.
Comment: IEEE VIS 202
WavMark: Watermarking for Audio Generation
Recent breakthroughs in zero-shot voice synthesis have enabled imitating a
speaker's voice using just a few seconds of recording while maintaining a high
level of realism. Alongside its potential benefits, this powerful technology
introduces notable risks, including voice fraud and speaker impersonation.
Unlike the conventional approach of solely relying on passive methods for
detecting synthetic data, watermarking presents a proactive and robust defence
mechanism against these looming risks. This paper introduces an innovative
audio watermarking framework that encodes up to 32 bits of watermark within a
mere 1-second audio snippet. The watermark is imperceptible to human senses and
exhibits strong resilience against various attacks. It can serve as an
effective identifier for synthesized voices and holds potential for broader
applications in audio copyright protection. Moreover, this framework boasts
high flexibility, allowing for the combination of multiple watermark segments
to achieve heightened robustness and expanded capacity. Utilizing 10 to
20-second audio as the host, our approach demonstrates an average Bit Error
Rate (BER) of 0.48% across ten common attacks, a remarkable reduction of over
2800% in BER compared to the state-of-the-art watermarking tool. See
https://aka.ms/wavmark for demos of our work.
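The Bit Error Rate quoted above is the fraction of watermark bits that decode incorrectly. A minimal sketch of that metric follows; the function name, the 32-bit payload pattern, and the single flipped bit are illustrative assumptions, not taken from the WavMark code or its results.

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the decoded bits differ from the embedded bits."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have equal length")
    if not sent:
        raise ValueError("bit sequences must be non-empty")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


# Hypothetical 32-bit watermark; one bit is flipped to mimic an attack.
embedded = [1, 0] * 16
decoded = list(embedded)
decoded[5] ^= 1

ber = bit_error_rate(embedded, decoded)
print(f"BER = {ber:.4%}")
```

One wrong bit in a 32-bit payload gives a BER of 1/32, so a figure well below 1% implies that most decoded segments contain no errors at all.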
Using Transcoding for Hidden Communication in IP Telephony
The paper presents a new steganographic method for IP telephony called
TranSteg (Transcoding Steganography). Typically, in steganographic
communication it is advised for covert data to be compressed in order to limit
its size. In TranSteg it is the overt data that is compressed to make space for
the steganogram. The main innovation of TranSteg is to, for a chosen voice
stream, find a codec that will result in a similar voice quality but smaller
voice payload size than the originally selected. Then, the voice stream is
transcoded. At this step the original voice payload size is intentionally
unaltered and the change of the codec is not indicated. Instead, after placing
the transcoded voice payload, the remaining free space is filled with hidden
data. A proof-of-concept implementation of TranSteg was designed and developed. The
experimental results obtained are included in this paper. They show that the
proposed method is feasible and offers a high steganographic bandwidth.
TranSteg is difficult to detect when inspection is performed at a
single network location.
Comment: 17 pages, 16 figures, 4 tables
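The space-freeing step described above can be sketched as simple arithmetic on packet payloads. The example below assumes, purely for illustration, a 64 kbit/s overt codec transcoded to a 32 kbit/s codec with 20 ms packets; the abstract does not state which codecs the authors used, and the function names are hypothetical.

```python
def payload_bytes(bitrate_bps, packet_ms):
    """Voice payload size in bytes for one packet at a constant codec bitrate."""
    return bitrate_bps * packet_ms // (8 * 1000)


def transteg_capacity(overt_bps, transcoded_bps, packet_ms=20):
    """Return (free bytes per packet, hidden-channel bit/s) after transcoding."""
    original = payload_bytes(overt_bps, packet_ms)
    transcoded = payload_bytes(transcoded_bps, packet_ms)
    if transcoded > original:
        raise ValueError("transcoded payload must not exceed the original size")
    free = original - transcoded
    packets_per_second = 1000 / packet_ms
    return free, free * 8 * packets_per_second


def fill_payload(transcoded, hidden, original_size):
    """Keep the payload at its original size: transcoded voice, then hidden data."""
    free = original_size - len(transcoded)
    if free < 0:
        raise ValueError("transcoded voice does not fit in the original payload")
    chunk = hidden[:free]
    padding = bytes(free - len(chunk))  # zero-fill if the hidden data runs out
    return transcoded + chunk + padding


free, hidden_bps = transteg_capacity(64_000, 32_000)
packet = fill_payload(bytes(80), b"secret", 160)
print(free, hidden_bps, len(packet))
```

Because the payload length is unchanged and the codec change is not signalled, an observer looking only at packet sizes and headers sees nothing unusual; in this sketch the hidden channel gets half of the original voice bitrate.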
Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography
Data hiding is the process of embedding information into a noise-tolerant
signal such as a piece of audio, video, or image. Digital watermarking is a
form of data hiding where identifying data is robustly embedded so that it can
resist tampering and be used to identify the original owners of the media.
Steganography, another form of data hiding, embeds data for the purpose of
secure and secret communication. This survey summarises recent developments in
deep learning techniques for data hiding for the purposes of watermarking and
steganography, categorising them based on model architectures and noise
injection methods. The objective functions, evaluation metrics, and datasets
used for training these data hiding models are comprehensively summarised.
Finally, we propose and discuss possible future directions for research into
deep data hiding techniques.
From Covert Hiding to Visual Editing: Robust Generative Video Steganography
Traditional video steganography methods are based on modifying the covert
space for embedding, whereas we propose an innovative approach that embeds
a secret message within semantic features for steganography during the video
editing process. Although existing traditional video steganography methods
display a certain level of security and embedding capacity, they lack adequate
robustness against common distortions in online social networks (OSNs). In this
paper, we introduce an end-to-end robust generative video steganography network
(RoGVS), which achieves visual editing by modifying the semantic features of videos
to embed a secret message. We employ a face-swapping scenario to showcase the
visual editing effects. We first design a secret message embedding module to
adaptively hide the secret message in the semantic features of videos. Extensive
experiments on facial video datasets show that the proposed RoGVS method
outperforms existing video and image steganography techniques in terms of both
robustness and capacity.
Comment: Under Review
Augmented watermarking
This thesis provides an augmented watermarking technique wherein noise based on the watermark is added to the watermarked image so that only the end user who has the key for embedding the watermark can remove both the noise and the watermark to get a final clear image. The recovery for different values of noise is observed. This system may be implemented as a basic digital rights management system by defining a regime of partial rights using overlaid watermarks, together with respectively added layers of noise, in which the rights of the users define the precision with which the signals may be viewed.
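The idea of key-removable noise can be illustrated with a small sketch. It assumes, as a stand-in for the thesis's watermark-derived noise, a pseudo-random sequence seeded by the key and added modulo 256 so that it is exactly reversible; the function names and the amplitude parameter are hypothetical and not taken from the thesis.

```python
import random


def keyed_noise(key, length, amplitude):
    """Pseudo-random integer noise that anyone holding the key can regenerate."""
    rng = random.Random(key)
    return [rng.randint(-amplitude, amplitude) for _ in range(length)]


def add_noise(pixels, key, amplitude):
    """Degrade 8-bit pixel values; modulo 256 keeps the operation invertible."""
    noise = keyed_noise(key, len(pixels), amplitude)
    return [(p + n) % 256 for p, n in zip(pixels, noise)]


def remove_noise(noisy, key, amplitude):
    """Undo add_noise by subtracting the same key-derived sequence."""
    noise = keyed_noise(key, len(noisy), amplitude)
    return [(p - n) % 256 for p, n in zip(noisy, noise)]


pixels = [0, 17, 128, 200, 255, 64, 90, 33]
noisy = add_noise(pixels, key=42, amplitude=20)
restored = remove_noise(noisy, key=42, amplitude=20)
print(restored == pixels)
```

Under this assumption, a larger amplitude corresponds to a coarser view for users without the key, which is one way the layered partial-rights regime in the abstract could be realised.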
On the data hiding theory and multimedia content security applications
This dissertation is a comprehensive study of digital steganography for multimedia content protection. With the increasing development of Internet technology, protection and enforcement of multimedia property rights has become a great concern to multimedia authors and distributors. Watermarking technologies provide a possible solution for this problem.
The dissertation first briefly introduces the current watermarking schemes, including their applications in video, image and audio. Most available embedding schemes are based on direct Spread Sequence (SS) modulation. A small-valued pseudo-random signature sequence is embedded into the host signal and the information is extracted via correlation. The correlation detection problem is discussed at the beginning. It is concluded that the correlator is not optimum in oblivious detection. The Maximum Likelihood detector is derived and some feasible suboptimal detectors are also analyzed. Through the calculation of the extraction Bit Error Rate (BER), it is revealed that the SS scheme is not very efficient due to its poor host noise suppression. The watermark domain selection problem is addressed subsequently. Some implications for hiding capacity and reliability are also studied. The last topic in the SS modulation scheme is sequence selection. The relationship between sequence bandwidth and synchronization requirement is detailed in the work. It is demonstrated that the white sequence commonly used in watermarking may not really boost watermark security.
To address the host noise suppression problem, the hidden communication is modeled as a general hypothesis testing problem and a set partitioning scheme is proposed. Simulation studies and mathematical analysis confirm that it outperforms the SS schemes in host noise suppression. The proposed scheme demonstrates improvement over the existing embedding schemes.
Data hiding in audio signals is explored next. Audio data hiding is believed to be a more challenging task due to human sensitivity to audio artifacts and the advanced features of current compression techniques. The human psychoacoustic model and human music understanding are also covered in the work. Then, as a typical audio perceptual compression scheme, the popular MP3 compression is examined at some length. Several schemes (amplitude modulation, phase modulation and noise substitution) are presented together with some experimental results. As a case study, a music bitstream encryption scheme is proposed. In all these applications, the human psychoacoustic model plays a very important role. A more advanced audio analysis model is introduced to reveal implications for music understanding. In the last part, conclusions and future research are presented.
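The SS embedding and correlation extraction described in this abstract can be sketched as follows. This is a minimal textbook-style illustration, not the dissertation's detector: the key, the embedding strength `alpha`, the Gaussian host signal, and all function names are assumptions chosen for the example.

```python
import random


def pn_sequence(key, length):
    """Key-dependent pseudo-random +/-1 signature sequence."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed_bit(host, bit, key, alpha):
    """Add the signature, scaled by alpha, with its sign carrying one bit."""
    pn = pn_sequence(key, len(host))
    sign = 1 if bit else -1
    return [h + sign * alpha * p for h, p in zip(host, pn)]


def detect_bit(signal, key):
    """Blind correlation detector: the host is unknown, only the key is used."""
    pn = pn_sequence(key, len(signal))
    corr = sum(s * p for s, p in zip(signal, pn)) / len(signal)
    return (1 if corr > 0 else 0), corr


# Hypothetical host signal: zero-mean Gaussian samples.
host_rng = random.Random(1)
host = [host_rng.gauss(0, 10) for _ in range(4096)]

marked_one = embed_bit(host, 1, key=1234, alpha=2.0)
marked_zero = embed_bit(host, 0, key=1234, alpha=2.0)
print(detect_bit(marked_one, 1234)[0], detect_bit(marked_zero, 1234)[0])
```

The correlation equals plus or minus `alpha` plus the average of host times signature. That second term is the host noise: the correlator cannot cancel it, so reliable extraction needs either a long sequence or a stronger watermark, which is the inefficiency the abstract attributes to SS schemes.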