200 research outputs found
A review and open issues of diverse text watermarking techniques in spatial domain
Nowadays, information hiding is becoming a helpful technique and fetches more attention due to the fast growth of using the internet; it is applied for sending secret information by using different techniques. Watermarking is one of major important technique in information hiding. Watermarking is of hiding secret data into a carrier media to provide the privacy and integrity of information so that no one can recognize and detect it's accepted the sender and receiver. In watermarking, many various carrier formats can be used such as an image, video, audio, and text. The text is most popular used as a carrier files due to its frequency on the internet. There are many techniques variables for the text watermarking; each one has its own robust and susceptible points. In this study, we conducted a review of text watermarking in the spatial domain to explore the term text watermarking by reviewing, collecting, synthesizing and analyze the challenges of different studies which related to this area published from 2013 to 2018. The aims of this paper are to provide an overview of text watermarking and comparison between approved studies as discussed according to the Arabic text characters, payload capacity, Imperceptibility, authentication, and embedding technique to open important research issues in the future work to obtain a robust method
A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions
The powerful ability to understand, follow, and generate complex language
emerging from large language models (LLMs) makes LLM-generated text flood many
areas of our daily lives at an incredible speed and is widely accepted by
humans. As LLMs continue to expand, there is an imperative need to develop
detectors that can detect LLM-generated text. This is crucial to mitigate
potential misuse of LLMs and safeguard realms like artistic expression and
social networks from harmful influence of LLM-generated content. The
LLM-generated text detection aims to discern if a piece of text was produced by
an LLM, which is essentially a binary classification task. The detector
techniques have witnessed notable advancements recently, propelled by
innovations in watermarking techniques, zero-shot methods, fine-turning LMs
methods, adversarial learning methods, LLMs as detectors, and human-assisted
methods. In this survey, we collate recent research breakthroughs in this area
and underscore the pressing need to bolster detector research. We also delve
into prevalent datasets, elucidating their limitations and developmental
requirements. Furthermore, we analyze various LLM-generated text detection
paradigms, shedding light on challenges like out-of-distribution problems,
potential attacks, and data ambiguity. Conclusively, we highlight interesting
directions for future research in LLM-generated text detection to advance the
implementation of responsible artificial intelligence (AI). Our aim with this
survey is to provide a clear and comprehensive introduction for newcomers while
also offering seasoned researchers a valuable update in the field of
LLM-generated text detection. The useful resources are publicly available at:
https://github.com/NLP2CT/LLM-generated-Text-Detection
On the Reliability of Watermarks for Large Language Models
As LLMs become commonplace, machine-generated text has the potential to flood
the internet with spam, social media bots, and valueless content. Watermarking
is a simple and effective strategy for mitigating such harms by enabling the
detection and documentation of LLM-generated text. Yet a crucial question
remains: How reliable is watermarking in realistic settings in the wild? There,
watermarked text may be modified to suit a user's needs, or entirely rewritten
to avoid detection.
We study the robustness of watermarked text after it is re-written by humans,
paraphrased by a non-watermarked LLM, or mixed into a longer hand-written
document. We find that watermarks remain detectable even after human and
machine paraphrasing. While these attacks dilute the strength of the watermark,
paraphrases are statistically likely to leak n-grams or even longer fragments
of the original text, resulting in high-confidence detections when enough
tokens are observed. For example, after strong human paraphrasing the watermark
is detectable after observing 800 tokens on average, when setting a 1e-5 false
positive rate. We also consider a range of new detection schemes that are
sensitive to short spans of watermarked text embedded inside a large document,
and we compare the robustness of watermarking to other kinds of detectors.Comment: 14 pages in the main body. Code is available at
https://github.com/jwkirchenbauer/lm-watermarkin
A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions
With the widespread use of large artificial intelligence (AI) models such as
ChatGPT, AI-generated content (AIGC) has garnered increasing attention and is
leading a paradigm shift in content creation and knowledge representation. AIGC
uses generative large AI algorithms to assist or replace humans in creating
massive, high-quality, and human-like content at a faster pace and lower cost,
based on user-provided prompts. Despite the recent significant progress in
AIGC, security, privacy, ethical, and legal challenges still need to be
addressed. This paper presents an in-depth survey of working principles,
security and privacy threats, state-of-the-art solutions, and future challenges
of the AIGC paradigm. Specifically, we first explore the enabling technologies,
general architecture of AIGC, and discuss its working modes and key
characteristics. Then, we investigate the taxonomy of security and privacy
threats to AIGC and highlight the ethical and societal implications of GPT and
AIGC technologies. Furthermore, we review the state-of-the-art AIGC
watermarking approaches for regulatable AIGC paradigms regarding the AIGC model
and its produced content. Finally, we identify future challenges and open
research directions related to AIGC.Comment: 20 pages, 6 figures, 4 table
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective
Two interlocking research questions of growing interest and importance in
privacy research are Authorship Attribution (AA) and Authorship Obfuscation
(AO). Given an artifact, especially a text t in question, an AA solution aims
to accurately attribute t to its true author out of many candidate authors
while an AO solution aims to modify t to hide its true authorship.
Traditionally, the notion of authorship and its accompanying privacy concern is
only toward human authors. However, in recent years, due to the explosive
advancements in Neural Text Generation (NTG) techniques in NLP, capable of
synthesizing human-quality open-ended texts (so-called "neural texts"), one has
to now consider authorships by humans, machines, or their combination. Due to
the implications and potential threats of neural texts when used maliciously,
it has become critical to understand the limitations of traditional AA/AO
solutions and develop novel AA/AO solutions in dealing with neural texts. In
this survey, therefore, we make a comprehensive review of recent literature on
the attribution and obfuscation of neural text authorship from a Data Mining
perspective, and share our view on their limitations and promising research
directions.Comment: Accepted at ACM SIGKDD Explorations, Vol. 25, June 202
Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods
This Special Issue is a book composed by collecting documents published through peer review on the research of various advanced technologies related to applications and theories of signal processing for multimedia systems using ML or advanced methods. Multimedia signals include image, video, audio, character recognition and optimization of communication channels for networks. The specific contents included in this book are data hiding, encryption, object detection, image classification, and character recognition. Academics and colleagues who are interested in these topics will find it interesting to read
Digital audio watermarking for broadcast monitoring and content identification
Copyright legislation was prompted exactly 300 years ago by a desire to protect authors against exploitation of their work by others. With regard to modern content owners, Digital Rights Management (DRM) issues have become very important since the advent of the Internet. Piracy, or illegal copying, costs content owners billions of dollars every year.
DRM is just one tool that can assist content owners in exercising their rights. Two categories of DRM technologies have evolved in digital signal processing recently, namely
digital fingerprinting and digital watermarking. One area of Copyright that is consistently overlooked in DRM developments is 'Public Performance'.
The research described in this thesis analysed the administration of public performance rights within the music industry in general, with specific focus on the collective rights and broadcasting sectors in Ireland. Limitations in the administration of artists' rights were
identified. The impact of these limitations on the careers of developing artists was evaluated.
A digital audio watermarking scheme is proposed that would meet the requirements of both the broadcast and collective rights sectors. The goal of the scheme is to embed a standard identifier within an audio signal via modification of its spectral properties in such a way that it would be robust and perceptually transparent. Modification of the audio signal spectrum was attempted in a variety of ways. A method based on a super-resolution frequency identification technique was found to be most effective. The watermarking scheme was evaluated for robustness and found to be extremely effective in recovering embedded watermarks in music signals using a semi-blind decoding process. The final digital audio watermarking algorithm proposed facilitates the development of other applications in the domain of broadcast monitoring for the purposes of equitable royalty distribution along with additional applications and extension to other domains
Recent Advances in Signal Processing
The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity
- …