The Ethical Need for Watermarks in Machine-Generated Language
Watermarks should be introduced in the natural language outputs of AI systems
in order to maintain the distinction between human and machine-generated text.
The ethical imperative to not blur this distinction arises from the asemantic
nature of large language models and from human projections of emotional and
cognitive states on machines, possibly leading to manipulation, the spread of
falsehoods, or emotional distress. Enforcing this distinction requires
unobtrusive yet easily accessible marks of machine origin. We propose to
implement a code based on equidistant letter sequences. Since no such code
occurs naturally in human-written texts, its presence in machine-generated
text would reliably signal machine authorship.
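As a sketch of how such an equidistant-letter-sequence (ELS) mark could be verified, the toy function below scans a text for a codeword appearing at a fixed letter skip. The codeword and skip range are illustrative assumptions; the abstract does not specify the concrete coding scheme.

```python
def find_els(text: str, codeword: str, max_skip: int = 50):
    """Return the smallest skip at which `codeword` appears as an
    equidistant letter sequence in `text`, or None if it never does.
    Non-letter characters are ignored, as in classical ELS analyses."""
    letters = [c.lower() for c in text if c.isalpha()]
    n, k = len(letters), len(codeword)
    for skip in range(1, max_skip + 1):
        # Last letter of the sequence must fit inside the text.
        for start in range(n - (k - 1) * skip):
            if all(letters[start + i * skip] == codeword[i] for i in range(k)):
                return skip
    return None

print(find_els("machine", "mcie"))  # letters at indices 0, 2, 4, 6 → skip 2
```

Note that skip-1 matches are ordinary substrings, so a deployed scheme would presumably use larger skips and rarer codewords to keep the false-positive rate in human text negligible.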
Non-Compositional Term Dependence for Information Retrieval
Modelling term dependence in IR aims to identify co-occurring terms that are
too heavily dependent on each other to be treated as a bag of words, and to
adapt the indexing and ranking accordingly. Dependent terms are predominantly
identified using lexical frequency statistics, assuming that (a) if terms
co-occur often enough in some corpus, they are semantically dependent; (b) the
more often they co-occur, the more semantically dependent they are. This
assumption is not always correct: the frequency of co-occurring terms can be
separate from the strength of their semantic dependence. For example, "red
tape" might be less frequent overall than "tape measure" in some corpus, but
this does not mean that "red" + "tape" are less dependent than "tape" +
"measure". This is
especially the case for non-compositional phrases, i.e. phrases whose meaning
cannot be composed from the individual meanings of their terms (such as the
phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction
between the frequency and strength of term dependence in IR, we present a
principled approach for handling term dependence in queries, using both lexical
frequency and semantic evidence. We focus on non-compositional phrases,
extending a recent unsupervised model for their detection [21] to IR. Our
approach, integrated into ranking using Markov Random Fields [31], yields
effectiveness gains over competitive TREC baselines, showing that there is
still room for improvement in the very well-studied area of term dependence in
IR.
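The contrast between frequency and semantic evidence can be illustrated with two toy scores: pointwise mutual information over corpus counts (frequency evidence) and a non-compositionality score comparing a phrase embedding to the average of its word embeddings (semantic evidence). This is only a minimal sketch, not the detection model of [21] or the MRF ranking of [31]; the vectors are assumed to come from some external embedding model.

```python
import math

def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information: frequency-based evidence that two
    terms co-occur more often than chance would predict."""
    return math.log2((count_xy / total) /
                     ((count_x / total) * (count_y / total)))

def non_compositionality(phrase_vec, word_vecs) -> float:
    """Semantic evidence: 1 - cosine similarity between the phrase
    embedding and the average of its word embeddings.
    Higher values mean the phrase meaning is less composable from its
    parts (e.g. "red tape" meaning bureaucracy)."""
    avg = [sum(c) / len(word_vecs) for c in zip(*word_vecs)]
    dot = sum(a * b for a, b in zip(phrase_vec, avg))
    norm = (math.sqrt(sum(a * a for a in phrase_vec)) *
            math.sqrt(sum(a * a for a in avg)))
    return 1.0 - dot / norm
```

A phrase can score high on one measure and low on the other, which is exactly the distinction the abstract argues frequency statistics alone cannot capture.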
Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
Online questionnaires that use crowd-sourcing platforms to recruit
participants have become commonplace, due to their ease of use and low costs.
Artificial Intelligence (AI) based Large Language Models (LLM) have made it
easy for bad actors to automatically fill in online forms, including generating
meaningful text for open-ended tasks. These technological advances threaten the
data quality for studies that use online questionnaires. This study tested whether
text generated by an AI for the purpose of an online study can be detected by
both humans and automatic AI detection systems. While humans were able to
correctly identify authorship of text above chance level (76 percent accuracy),
their performance was still below what would be required to ensure satisfactory
data quality. Researchers currently have to rely on the disinterest of bad
actors to successfully use open-ended responses as a useful tool for ensuring
data quality. Automatic AI detection systems are currently completely unusable.
If AIs become too prevalent in submitting responses then the costs associated
with detecting fraudulent submissions will outweigh the benefits of online
questionnaires. Individual attention checks will no longer be a sufficient tool
to ensure good data quality. This problem can only be systematically addressed
by crowd-sourcing platforms. They cannot rely on automatic AI detection systems
and it is unclear how they can ensure data quality for their paying clients.
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective
Two interlocking research questions of growing interest and importance in
privacy research are Authorship Attribution (AA) and Authorship Obfuscation
(AO). Given an artifact, especially a text t in question, an AA solution aims
to accurately attribute t to its true author out of many candidate authors
while an AO solution aims to modify t to hide its true authorship.
Traditionally, the notion of authorship and its accompanying privacy concerns
applied only to human authors. However, in recent years, due to the explosive
advancements in Neural Text Generation (NTG) techniques in NLP, capable of
synthesizing human-quality open-ended texts (so-called "neural texts"), one has
to now consider authorships by humans, machines, or their combination. Due to
the implications and potential threats of neural texts when used maliciously,
it has become critical to understand the limitations of traditional AA/AO
solutions and develop novel AA/AO solutions in dealing with neural texts. In
this survey, therefore, we make a comprehensive review of recent literature on
the attribution and obfuscation of neural text authorship from a Data Mining
perspective, and share our view on their limitations and promising research
directions.
Comment: Accepted at ACM SIGKDD Explorations, Vol. 25, June 202
Mapping consumer sentiment toward wireless services using geospatial twitter data
Hyper-dense wireless network deployment is one of the popular solutions for meeting the high capacity requirements of 5G delivery. However, current operator understanding of consumer satisfaction
comes from call centers and base station quality-of-service (QoS) reports with poor geographic accuracy. The dramatic increase in geo-tagged social media posts adds a new potential to understand consumer satisfaction towards target-specific quality-of-experience (QoE) topics. In our paper, we focus on evaluating users’ opinions on wireless service-related topics by applying natural language processing (NLP) to geo-tagged Twitter data. Current generalized sentiment detection methods with generalized NLP corpora are not topic specific. Here, we develop a novel wireless service topic-specific sentiment framework, yielding higher targeting accuracy than generalized NLP frameworks. To do so, we first annotate a new sentiment corpus called SignalSentiWord (SSW) and compare its performance with two other popular corpus libraries, AFINN and SentiWordNet. We then apply three established machine learning methods, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network (RNN), to build our topic-specific sentiment classifier. Furthermore, we discuss the capability of SSW to filter noisy and high-frequency irrelevant words to improve the performance of machine learning algorithms. Finally, real-world testing results show that our proposed SSW improves NLP performance significantly.
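A minimal, self-contained sketch of the pipeline described above: a multinomial Naive Bayes classifier whose tokenizer filters an SSW-style list of high-frequency irrelevant words before counting features. The tweets, labels, and stop list are invented placeholders, not the paper's data or corpus.

```python
import math
from collections import Counter

def tokenize(text, stop=frozenset()):
    """Lowercase, strip punctuation, and drop stop-listed words."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w and w not in stop]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""
    def fit(self, docs, labels, stop=frozenset()):
        self.stop = stop
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.priors}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc, stop))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        def log_prob(c):
            total = sum(self.counts[c].values())
            return math.log(self.priors[c]) + sum(
                math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
                for w in tokenize(doc, self.stop))
        return max(self.counts, key=log_prob)

# Placeholder tweets and SSW-style stop list, invented for illustration.
tweets = [
    "no signal again, dropped three calls today",
    "coverage here is terrible, cannot load anything",
    "new tower nearby, speeds are great now",
    "really happy with the 5g upgrade, super fast",
]
labels = ["negative", "negative", "positive", "positive"]
irrelevant = frozenset({"the", "a", "again", "today", "here", "now", "really"})

clf = NaiveBayes().fit(tweets, labels, stop=irrelevant)
print(clf.predict("calls keep dropping, awful coverage"))  # → negative
```

Stop-listing before vectorization mirrors the paper's point that removing noisy high-frequency words improves downstream classifier performance.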
Reverse-Engineering Satire, or "Paper on Computational Humor Accepted Despite Making Serious Advances"
Humor is an essential human trait. Efforts to understand humor have called
out links between humor and the foundations of cognition, as well as the
importance of humor in social engagement. As such, it is a promising and
important subject of study, with relevance for artificial intelligence and
human-computer interaction. Previous computational work on humor has mostly
operated at a coarse level of granularity, e.g., predicting whether an entire
sentence, paragraph, document, etc., is humorous. As a step toward deep
understanding of humor, we seek fine-grained models of attributes that make a
given text humorous. Starting from the observation that satirical news
headlines tend to resemble serious news headlines, we build and analyze a
corpus of satirical headlines paired with nearly identical but serious
headlines. The corpus is constructed via Unfun.me, an online game that
incentivizes players to make minimal edits to satirical headlines with the goal
of making other players believe the results are serious headlines. The edit
operations used to successfully remove humor pinpoint the words and concepts
that play a key role in making the original, satirical headline funny. Our
analysis reveals that the humor tends to reside toward the end of headlines,
and primarily in noun phrases, and that most satirical headlines follow a
certain logical pattern, which we term false analogy. Overall, this paper
deepens our understanding of the syntactic and semantic structure of satirical
news headlines and provides insights for building humor-producing systems.
Comment: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 201
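The corpus construction above can be mimicked in miniature: given a satirical headline and its de-humorized counterpart, a word-level diff pinpoints the humor-bearing edit, just as Unfun.me's minimal edits do at scale. The headline pair below is invented for illustration.

```python
import difflib

def humor_edits(satirical: str, serious: str):
    """Word-level edit operations turning a satirical headline into its
    serious counterpart; the edited spans carry the humor."""
    a, b = satirical.lower().split(), serious.lower().split()
    ops = difflib.SequenceMatcher(a=a, b=b).get_opcodes()
    return [(op, a[i1:i2], b[j1:j2]) for op, i1, i2, j1, j2 in ops
            if op != "equal"]

# Invented pair; note the humor sits at the end, as the paper finds.
print(humor_edits("mayor declares war on potholes and logic",
                  "mayor declares war on potholes"))
# → [('delete', ['and', 'logic'], [])]
```

Aggregating such edit spans over many pairs is what lets the paper localize humor to headline-final noun phrases.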
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index
With the rise of the prolific ChatGPT, the risks and consequences of
AI-generated text have increased alarmingly. To address the inevitable question
of ownership
attribution for AI-generated artifacts, the US Copyright Office released a
statement stating that 'If a work's traditional elements of authorship were
produced by a machine, the work lacks human authorship and the Office will not
register it'. Furthermore, both the US and the EU governments have recently
drafted their initial proposals regarding the regulatory framework for AI.
Given this cynosural spotlight on generative AI, AI-generated text detection
(AGTD) has quickly emerged as an active research topic: initial methods have
been proposed, soon followed by techniques to bypass detection. This paper
introduces the Counter
Turing Test (CT^2), a benchmark consisting of techniques aiming to offer a
comprehensive evaluation of the robustness of existing AGTD techniques. Our
empirical findings unequivocally highlight the fragility of the proposed AGTD
methods under scrutiny. Amidst the extensive deliberations on policy-making for
regulating AI development, it is of utmost importance to assess the
detectability of content generated by LLMs. Thus, to establish a quantifiable
spectrum facilitating the evaluation and ranking of LLMs according to their
detectability levels, we propose the AI Detectability Index (ADI). We conduct a
thorough examination of 15 contemporary LLMs, empirically demonstrating that
larger LLMs tend to have a higher ADI, indicating they are less detectable
compared to smaller LLMs. We firmly believe that ADI holds significant value as
a tool for the wider NLP community, with the potential to serve as a rubric in
AI-related policy-making.
Comment: EMNLP 2023 Mai
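The abstract does not give the ADI formula, but one plausible shape for such an index can be sketched as the mean evasion rate across a battery of detectors, scaled to 0-100. This is purely a hypothetical illustration, not the paper's actual formulation.

```python
def detectability_index(detector_accuracies):
    """Hypothetical ADI-style score: average evasion rate across
    detectors, scaled to 0-100. Higher means the model's text is
    harder to detect, matching the paper's finding that larger
    LLMs score higher."""
    evasion = [1.0 - acc for acc in detector_accuracies]
    return 100.0 * sum(evasion) / len(evasion)

# A model evading more detectors gets a higher index, e.g. a small
# model caught 90%/80% of the time vs. a large one caught 60%/50%.
small = detectability_index([0.9, 0.8])  # ≈ 15.0
large = detectability_index([0.6, 0.5])  # ≈ 45.0
```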
Are Deep Learning-Generated Social Media Profiles Indistinguishable from Real Profiles?
In recent years, deep learning methods have become increasingly capable of generating near-photorealistic pictures and humanlike text, to the point that humans can no longer recognize what is real and what is AI-generated. Concerningly, there is evidence that some of these methods have already been adopted to produce fake social media profiles and content. We hypothesize that these advances have made detecting generated fake social media content in the feed extremely difficult, if not impossible, for the average user of social media. This paper presents the results of an experiment in which 375 participants attempted to label real and generated profiles and posts in a simulated social media feed. The results support our hypothesis and suggest that even fully generated fake profiles with posts written by an advanced text generator are difficult for humans to identify.
ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models
AI-generated content (AIGC) presents a considerable challenge to educators
around the world. Instructors need to be able to detect such text generated by
large language models, either with the naked eye or with the help of tools.
There is also a growing need to understand the lexical, syntactic and
stylistic features of AIGC. To address these challenges in English language
teaching, we first present ArguGPT, a balanced corpus of 4,038 argumentative
essays generated by 7 GPT models in response to essay prompts from three
sources: (1) in-class or homework exercises, (2) TOEFL and (3) GRE writing
tasks. Machine-generated texts are paired with a roughly equal number of
human-written essays at three score levels, matched by essay prompt. We then
hire English instructors to distinguish machine essays from human ones. Results
show that, when first exposed to machine-generated essays, the instructors
detect them with only 61% accuracy; after one round of minimal self-training,
the figure rises to 67%. Next, we perform linguistic analyses of
these essays, which show that machines produce sentences with more complex
syntactic structures while human essays tend to be lexically more complex.
Finally, we test existing AIGC detectors and build our own detectors using SVMs
and RoBERTa. Results suggest that a RoBERTa fine-tuned with the training set of
ArguGPT achieves above 90% accuracy in both essay- and sentence-level
classification. To the best of our knowledge, this is the first comprehensive
analysis of argumentative essays produced by generative large language models.
Machine-authored essays in ArguGPT and our models will be made publicly
available at https://github.com/huhailinguist/ArguGP
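The linguistic contrast reported above (machines produce more complex syntax, humans more complex vocabulary) suggests simple stylometric features one might feed to an SVM-style detector. The two proxies below are illustrative assumptions, not the paper's actual feature set.

```python
def mean_sentence_length(text: str) -> float:
    """Average words per sentence: a crude proxy for syntactic
    complexity (longer sentences tend to have deeper structure)."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

def type_token_ratio(text: str) -> float:
    """Distinct words over total words: a crude proxy for lexical
    complexity (human essays tend to score higher per the paper)."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    words = [w for w in words if w]
    return len(set(words)) / len(words)

print(mean_sentence_length("Hello world. This is a test."))  # → 3.0
```

In practice each essay would yield a feature vector of such scores (plus n-gram features) for a linear classifier, before graduating to a fine-tuned RoBERTa as the paper does.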