Mark My Words: Analyzing and Evaluating Language Model Watermarks
The capabilities of large language models have grown significantly in recent
years and so too have concerns about their misuse. In this context, the ability
to distinguish machine-generated text from human-authored content becomes
important. Prior works have proposed numerous schemes to watermark text, which
would benefit from a systematic evaluation framework. This work focuses on text
watermarking techniques - as opposed to image watermarks - and proposes
MARKMYWORDS, a comprehensive benchmark for them under different tasks as well
as practical attacks. We focus on three main metrics: quality, size (e.g. the
number of tokens needed to detect a watermark), and tamper-resistance. Current
watermarking techniques are good enough to be deployed: Kirchenbauer et al. [1]
can watermark Llama2-7B-chat with no perceivable loss in quality, the watermark
can be detected with fewer than 100 tokens, and the scheme offers good
tamper-resistance to simple attacks. We argue that watermark
indistinguishability, a criterion emphasized in some prior works, is too strong
a requirement: schemes that slightly modify logit distributions outperform
their indistinguishable counterparts with no noticeable loss in generation
quality. We publicly release our benchmark
(https://github.com/wagner-group/MarkMyWords).
Comment: 18 pages, 11 figures
A review and open issues of diverse text watermarking techniques in spatial domain
Information hiding has attracted growing attention as internet use has expanded; it is used to send secret information through a variety of techniques. Watermarking is one of the most important of these techniques. It hides secret data in a carrier medium to protect the privacy and integrity of the information, so that no one except the sender and receiver can recognize or detect it. Many carrier formats can be used, including image, video, audio, and text. Text is the most widely used carrier because of its prevalence on the internet. Many text watermarking techniques are available, each with its own strengths and weaknesses. In this study, we review text watermarking in the spatial domain by collecting, synthesizing, and analyzing the challenges reported in studies of this area published from 2013 to 2018. The paper aims to provide an overview of text watermarking and a comparison of the reviewed studies in terms of Arabic text characters, payload capacity, imperceptibility, authentication, and embedding technique, and to identify open research issues for future work toward a robust method.