ICStega: Image Captioning-based Semantically Controllable Linguistic Steganography
Nowadays, social media has become the preferred communication platform for
web users, but it has also brought security threats. Linguistic steganography hides secret
data into text and sends it to the intended recipient to realize covert
communication. Compared to edit-based linguistic steganography,
generation-based approaches largely improve the payload capacity. However,
existing methods generate stego text alone. Another common behavior in
social media is sending semantically related image-text pairs. In this paper,
we put forward a novel image captioning-based stegosystem, where the secret
messages are embedded into the generated captions. Thus, the semantics of the
stego text can be controlled and the secret data can be transmitted by sending
semantically related image-text pairs. To balance the conflict between payload
capacity and semantic preservation, we propose a new sampling method called
Two-Parameter Semantic Control Sampling to cut off low-probability words.
Experimental results have shown that our method can control diversity, payload
capacity, security, and semantic accuracy at the same time.
Comment: 5 pages, 5 tables, 3 figures. Accepted by ICASSP 202
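The abstract names the sampler but does not give its exact rule. A minimal sketch, assuming the two parameters are a cumulative-mass cap (`p_top`) and a per-word probability floor (`p_min`), and that secret bits index into the surviving candidate pool (both assumptions, not the paper's published construction):

```python
import math

def embed_bits_in_step(probs, bits, p_top=0.9, p_min=0.05):
    """Hypothetical sketch of two-parameter truncated sampling:
    keep words until the cumulative mass reaches p_top, dropping
    any word below the p_min floor, then spend secret bits to
    choose among the survivors. `probs` maps words to model
    probabilities; `bits` is the remaining secret bit string."""
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    pool, cum = [], 0.0
    for word, p in ranked:
        if p < p_min or cum >= p_top:
            break  # cut off low-probability words
        pool.append(word)
        cum += p
    if not pool:
        pool = [ranked[0][0]]  # always emit something
    # Capacity this step: floor(log2(|pool|)) bits.
    k = max(0, int(math.log2(len(pool))))
    if k == 0:
        return pool[0], bits  # no capacity at this step
    index = int(bits[:k], 2)
    return pool[index], bits[k:]
```

Tightening `p_top`/`p_min` preserves semantics at the cost of per-step payload capacity, which is exactly the trade-off the abstract describes.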
Digital watermarking: a state-of-the-art review
Digital watermarking is the art of embedding data, called a
watermark, into a multimedia object such that the watermark can be detected or
extracted later without impairing the object. Concealment of secret messages inside
natural language, known as steganography, has existed since at least the 16th
century. However, the increase in electronic/digital information transmission and
distribution has resulted in the spread of watermarking from ordinary text to
multimedia transmission. In this paper, we review various approaches and methods
that have been used to conceal and preserve messages. Examples of real-world
applications are also discussed.
SocialStegDisc: Application of steganography in social networks to create a file system
The concept named SocialStegDisc was introduced as an application of the
original StegHash method. This new kind of mass storage is characterized by
theoretically unlimited space. The design also improves the operation of
StegHash through a trade-off between memory requirements and computation time.
Applying a linked-list mechanism provides the set of file operations:
creation, reading, deletion, and modification. Features, limitations, and
opportunities are discussed.
Comment: 5 pages, 5 figures
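The linked-list idea above can be sketched in a few lines. Here `store` stands in for the set of stego posts and `addr_of` for a StegHash-style addressing function; both names are illustrative stand-ins, not the paper's API:

```python
class StegNode:
    """One hidden record: a chunk of file data plus the 'address'
    (e.g., a hashtag permutation in StegHash) of the next chunk."""
    def __init__(self, data, next_addr=None):
        self.data = data
        self.next_addr = next_addr

def write_file(store, chunks, addr_of):
    """Chain file chunks as a singly linked list keyed by addresses.
    Returning only the head address mirrors the SocialStegDisc idea
    that a reader needs just one entry point to recover the file."""
    addrs = [addr_of(i) for i in range(len(chunks))]
    for i, chunk in enumerate(chunks):
        nxt = addrs[i + 1] if i + 1 < len(chunks) else None
        store[addrs[i]] = StegNode(chunk, nxt)
    return addrs[0]

def read_file(store, head):
    """Follow next_addr pointers from the head and reassemble."""
    out, addr = [], head
    while addr is not None:
        node = store[addr]
        out.append(node.data)
        addr = node.next_addr
    return b"".join(out)
```

Deletion and modification then reduce to standard linked-list pointer surgery on `next_addr`, which is what makes the memory/computation trade-off possible.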
Publicly Detectable Watermarking for Language Models
We construct the first provable watermarking scheme for language models with
public detectability or verifiability: we use a private key for watermarking
and a public key for watermark detection. Our protocol is the first
watermarking scheme that does not embed a statistical signal in generated text.
Rather, we directly embed a publicly-verifiable cryptographic signature using a
form of rejection sampling. We show that our construction meets strong formal
security guarantees and preserves many desirable properties found in schemes in
the private-key watermarking setting. In particular, our watermarking scheme
retains distortion-freeness and model agnosticity. We implement our scheme and
make empirical measurements over open models in the 7B parameter range. Our
experiments suggest that our watermarking scheme meets our formal claims while
preserving text quality.
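The rejection-sampling idea can be illustrated with a toy: resample the model at each step until the chosen token's publicly computable hash bit equals the next payload bit. This is a sketch only; the actual scheme embeds a public-key signature and has formal security guarantees, whereas `bits` here is just a stand-in bit string:

```python
import hashlib
import random

def bit_of(token):
    """Public hash-to-bit function: anyone can recompute it,
    which is what makes detection public."""
    return hashlib.sha256(token.encode()).digest()[0] & 1

def embed_bits(sample_token, bits, max_tries=1000):
    """Rejection-sampling sketch: at each step, draw tokens from the
    model (sample_token) until the token's hash bit matches the next
    payload bit. Because every emitted token is still a model sample,
    the text distribution is only lightly distorted."""
    out = []
    for b in bits:
        for _ in range(max_tries):
            tok = sample_token()
            if bit_of(tok) == int(b):
                out.append(tok)
                break
        else:
            raise RuntimeError("low-entropy channel: bit not embeddable")
    return out

def extract_bits(tokens):
    """Anyone can recover the payload without a secret key."""
    return "".join(str(bit_of(t)) for t in tokens)
```

In the real construction the extracted bits are a cryptographic signature, so a public verifier checks the signature rather than comparing raw bits.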
Perfectly Secure Steganography Using Minimum Entropy Coupling
Steganography is the practice of encoding secret information into innocuous
content in such a manner that an adversarial third party would not realize that
there is hidden meaning. While this problem has classically been studied in
security literature, recent advances in generative models have led to a shared
interest among security and machine learning researchers in developing scalable
steganography techniques. In this work, we show that a steganography procedure
is perfectly secure under Cachin (1998)'s information-theoretic model of
steganography if and only if it is induced by a coupling. Furthermore, we show
that, among perfectly secure procedures, a procedure maximizes information
throughput if and only if it is induced by a minimum entropy coupling. These
insights yield what are, to the best of our knowledge, the first steganography
algorithms to achieve perfect security guarantees for arbitrary covertext
distributions. To provide empirical validation, we compare a minimum entropy
coupling-based approach to three modern baselines -- arithmetic coding, Meteor,
and adaptive dynamic grouping -- using GPT-2, WaveRNN, and Image Transformer as
communication channels. We find that the minimum entropy coupling-based
approach achieves superior encoding efficiency, despite its stronger security
constraints. In aggregate, these results suggest that it may be natural to view
information-theoretic steganography through the lens of minimum entropy
coupling.
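Exact minimum entropy coupling is intractable in general, so practical systems approximate it. A minimal sketch of the standard greedy heuristic (repeatedly pair the largest remaining probability masses), which is an approximation and not necessarily the paper's exact construction:

```python
def greedy_coupling(p, q):
    """Greedy approximation to a minimum entropy coupling of two
    distributions p and q (lists of probabilities summing to 1).
    Returns a sparse joint table {(i, j): mass} whose marginals are
    p and q; pairing the largest remaining masses concentrates the
    joint distribution and so keeps its entropy low."""
    p = list(p)
    q = list(q)
    joint = {}
    while max(p) > 1e-12 and max(q) > 1e-12:
        i = max(range(len(p)), key=lambda k: p[k])
        j = max(range(len(q)), key=lambda k: q[k])
        m = min(p[i], q[j])  # transfer as much mass as both allow
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        p[i] -= m
        q[j] -= m
    return joint
```

In a coupling-based stegosystem, one marginal is the covertext distribution and the other the (encrypted) message distribution; sampling covertext from the coupling conditioned on the message yields the perfect-security property the abstract describes.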
Unbiased Watermark for Large Language Models
The recent advancements in large language models (LLMs) have sparked growing
apprehension regarding their potential misuse. One approach to mitigating
this risk is to incorporate watermarking techniques into LLMs, allowing for the
tracking and attribution of model outputs. This study examines a crucial aspect
of watermarking: how significantly watermarks impact the quality of
model-generated outputs. Previous studies have suggested a trade-off between
watermark strength and output quality. However, our research demonstrates that
it is possible to integrate watermarks without affecting the output probability
distribution, given an appropriate implementation. We refer to this type of
watermark as an unbiased watermark. This has significant implications for the
use of LLMs, as it becomes impossible for users to discern whether a service
provider has incorporated watermarks or not. Furthermore, the presence of
watermarks does not compromise the performance of the model in downstream
tasks, ensuring that the overall utility of the language model is preserved.
Our findings contribute to the ongoing discussion around responsible AI
development, suggesting that unbiased watermarks can serve as an effective
means of tracking and attributing model outputs without sacrificing output
quality.
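One standard way to obtain an unbiased watermark is keyed Gumbel-max (exponential-race) sampling: pseudorandomness derived from a secret key replaces the sampler's coin flips, and over a uniformly random key the output is an exact sample from the model distribution. This is a generic sketch of that idea, not necessarily this paper's specific reweighting scheme:

```python
import hashlib
import math

def keyed_uniform(key, context, token):
    """Pseudorandom uniform in (0, 1) derived from the secret key,
    the generation context, and a candidate token."""
    h = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    return (int.from_bytes(h[:8], "big") + 0.5) / 2**64

def watermarked_sample(probs, key, context):
    """Exponential-race sampling driven by a keyed PRF: the token
    maximizing log(u)/p is an exact sample from `probs` when u is
    uniform, so the watermark does not bias the output distribution.
    A detector who knows the key checks whether the winning scores
    are suspiciously large across many tokens."""
    best, best_score = None, -math.inf
    for token, p in probs.items():
        if p <= 0:
            continue
        u = keyed_uniform(key, context, token)
        score = math.log(u) / p  # argmax log(u)/p == argmax u**(1/p)
        if score > best_score:
            best, best_score = token, score
    return best
```

Because the marginal distribution over keys equals the model's own, a user without the key cannot distinguish watermarked from unwatermarked output, matching the abstract's claim.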
Robust Distortion-free Watermarks for Language Models
We propose a methodology for planting watermarks in text from an
autoregressive language model that are robust to perturbations without changing
the distribution over text up to a certain maximum generation budget. We
generate watermarked text by mapping a sequence of random numbers -- which we
compute using a randomized watermark key -- to a sample from the language
model. To detect watermarked text, any party who knows the key can align the
text to the random number sequence. We instantiate our watermark methodology
with two sampling schemes: inverse transform sampling and exponential minimum
sampling. We apply these watermarks to three language models -- OPT-1.3B,
LLaMA-7B and Alpaca-7B -- to experimentally validate their statistical power
and robustness to various paraphrasing attacks. Notably, for both the OPT-1.3B
and LLaMA-7B models, we find we can reliably detect watermarked text even
after corrupting a substantial fraction of the tokens
via random edits (i.e., substitutions, insertions or deletions). For the
Alpaca-7B model, we conduct a case study on the feasibility of watermarking
responses to typical user instructions. Due to the lower entropy of the
responses, detection is more difficult: only a fraction of the responses are
detectable, and
the watermark is also less robust to certain automated paraphrasing attacks we
implement.
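The inverse transform scheme can be sketched end to end: generation maps a key-derived uniform sequence through the inverse CDF of the model's next-token distribution, and a toy detector recomputes the sequence and counts agreements. Real detection aligns the text against the key stream and computes a p-value; the match-rate detector below is a deliberate simplification:

```python
import hashlib

def key_stream(key, n):
    """Pseudorandom uniform sequence derived from the watermark key."""
    out = []
    for i in range(n):
        h = hashlib.sha256(f"{key}:{i}".encode()).digest()
        out.append((int.from_bytes(h[:8], "big") + 0.5) / 2**64)
    return out

def inverse_transform_sample(probs, u):
    """Map a uniform variate u to a token via the inverse CDF, so a
    uniformly random u yields an exact (distortion-free) sample."""
    cum = 0.0
    for token, p in sorted(probs.items()):
        cum += p
        if u <= cum:
            return token
    return token  # guard against floating-point shortfall

def detect(tokens, probs_per_step, key):
    """Toy detector: recompute the key stream and measure how often
    the observed token matches the keyed sample at each step."""
    us = key_stream(key, len(tokens))
    hits = sum(inverse_transform_sample(p, u) == t
               for t, p, u in zip(tokens, probs_per_step, us))
    return hits / len(tokens)
```

Robustness to edits comes from the alignment step omitted here: after insertions or deletions, the detector searches for the offset that best matches the key stream instead of assuming positions line up.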