Neural Authorship Attribution: Stylometric Analysis on Large Language Models
Large language models (LLMs) such as GPT-4, PaLM, and Llama have
significantly propelled the generation of AI-crafted text. With rising concerns
about their potential misuse, there is a pressing need for AI-generated-text
forensics. Neural authorship attribution is a forensic effort, seeking to trace
AI-generated text back to its originating LLM. The LLM landscape can be divided
into two primary categories: proprietary and open-source. In this work, we
delve into these emerging categories of LLMs, focusing on the nuances of neural
authorship attribution. To enrich our understanding, we carry out an empirical
analysis of LLM writing signatures, highlighting the contrasts between
proprietary and open-source models, and scrutinizing variations within each
group. By integrating stylometric features across lexical, syntactic, and
structural aspects of language, we explore their potential to yield
interpretable results and augment pre-trained language model-based classifiers
utilized in neural authorship attribution. Our findings, based on a range of
state-of-the-art LLMs, provide empirical insights into neural authorship
attribution, paving the way for future investigations aimed at mitigating the
threats posed by AI-generated misinformation.
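A minimal sketch of the kind of stylometric featurization the abstract describes, with lexical, syntactic, and structural cues that could augment a pre-trained classifier (the specific feature set below is an illustrative assumption, not the authors' exact pipeline):

    import re
    from collections import Counter

    def stylometric_features(text: str) -> list[float]:
        # Illustrative lexical, syntactic, and structural cues; the paper's
        # actual feature inventory is richer than this toy set.
        words = re.findall(r"[A-Za-z']+", text.lower())
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        n = max(len(words), 1)
        return [
            len(Counter(words)) / n,         # lexical: type-token ratio
            sum(map(len, words)) / n,        # lexical: mean word length
            text.count(",") / n,             # syntactic proxy: comma rate
            n / max(len(sentences), 1),      # structural: mean sentence length
            len(text.split("\n\n")),         # structural: paragraph count
        ]

Vectors like these could be concatenated with a pre-trained language model's pooled embedding before the attribution head, keeping the stylometric portion of the decision inspectable.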
Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection
Large language models (LLMs) excel in many diverse applications beyond
language generation, e.g., translation, summarization, and sentiment analysis.
One intriguing application is in text classification. This becomes pertinent in
the realm of identifying hateful or toxic speech -- a domain fraught with
challenges and ethical dilemmas. In our study, we have two objectives: first,
to offer a literature review on LLMs as classifiers, emphasizing their role in
detecting and classifying hateful or toxic content; second, to explore the
efficacy of several LLMs in classifying hate speech, identifying which LLMs
excel at this task as well as the underlying attributes and training that
contribute to an LLM's proficiency (or lack thereof) in discerning hateful
content. By combining a comprehensive literature review with an empirical
analysis, our paper strives to shed light on the capabilities and constraints
of LLMs in the crucial domain of hate speech detection.
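A hedged sketch of the LLM-as-classifier setup such a study evaluates; the prompt wording and the injected llm callable are hypothetical stand-ins for whichever model is under test:

    def classify_hate_speech(text: str, llm) -> str:
        # `llm` is any prompt-to-text callable (hypothetical placeholder);
        # real experiments would also vary prompt phrasing and few-shot examples.
        prompt = (
            "Label the following post as HATEFUL or NOT_HATEFUL. "
            "Answer with one word.\n\n"
            f"Post: {text}\nLabel:"
        )
        answer = llm(prompt).strip().upper()
        return "HATEFUL" if answer.startswith("HATEFUL") else "NOT_HATEFUL"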
Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey
Contemporary LLMs are prone to producing hallucinations, stemming mainly from
knowledge gaps within the models. To address this critical limitation,
researchers employ diverse strategies to augment the LLMs by incorporating
external knowledge, aiming to reduce hallucinations and enhance reasoning
accuracy. Among these strategies, leveraging knowledge graphs as a source of
external information has demonstrated promising results. In this survey, we
conduct a comprehensive review of these knowledge-graph-based knowledge
augmentation techniques in LLMs, focusing on their efficacy in mitigating
hallucinations. We systematically categorize these methods into three
overarching groups, offering both methodological comparisons and empirical
evaluations of their performance. Lastly, the paper explores the challenges
associated with these techniques and outlines potential avenues for future
research in this emerging field.
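One representative strategy from this family, sketched under simplifying assumptions (a toy triple store and naive string matching; real systems use entity linking and learned retrieval), is to ground the prompt in retrieved knowledge-graph facts before generation:

    def kg_augmented_prompt(question: str,
                            kg: dict[str, list[tuple[str, str]]]) -> str:
        # Retrieve triples whose subject is mentioned in the question and
        # prepend them as context, constraining the model to grounded answers.
        facts = [
            f"({subj}, {rel}, {obj})"
            for subj, edges in kg.items()
            if subj.lower() in question.lower()
            for rel, obj in edges
        ]
        context = "\n".join(facts) or "(no matching facts)"
        return (
            "Answer using only the facts below; say 'unknown' otherwise.\n"
            f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
        )

    # Example: kg = {"Marie Curie": [("born_in", "Warsaw")]} yields a prompt
    # whose facts section pins the answer to the graph rather than the model.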
PEACE: Cross-Platform Hate Speech Detection - A Causality-guided Framework
Hate speech detection refers to the task of detecting hateful content that
aims at denigrating an individual or a group based on their religion, gender,
sexual orientation, or other characteristics. Due to the different policies of
the platforms, different groups of people express hate in different ways.
Furthermore, the lack of labeled data on some platforms makes it challenging
to build hate speech detection models. To this end, we examine whether we can
learn a generalizable hate speech detection model for the cross-platform
setting, where we train the model on data from one (source) platform and
generalize it across multiple (target) platforms. Existing
generalization models rely on linguistic cues or auxiliary information, making
them biased towards certain tags or certain kinds of words (e.g., abusive
words) on the source platform and thus not applicable to the target platforms.
Inspired by social and psychological theories, we endeavor to explore if there
exist inherent causal cues that can be leveraged to learn generalizable
representations for detecting hate speech across these distribution shifts. To
this end, we propose a causality-guided framework, PEACE, that identifies and
leverages two intrinsic causal cues omnipresent in hateful content: the overall
sentiment and the aggression in the text. We conduct extensive experiments
across multiple platforms (representing the distribution shift), evaluating
whether causal cues can help cross-platform generalization.
Comment: ECML PKDD 2023.
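As a toy illustration of the two causal cues PEACE relies on, one could expose sentiment and aggression as explicit features; note that the actual framework learns these signals rather than reading them off word lists:

    def causal_cue_features(text: str) -> list[float]:
        # Word-list proxies for the two causal cues (sentiment, aggression);
        # hypothetical stand-ins for PEACE's learned representations.
        words = [w.strip(".,!?") for w in text.lower().split()]
        n = max(len(words), 1)
        negative = {"hate", "awful", "disgusting", "worthless"}
        aggressive = {"destroy", "attack", "kill", "crush"}
        return [
            sum(w in negative for w in words) / n,    # overall sentiment proxy
            sum(w in aggressive for w in words) / n,  # aggression proxy
        ]

Because cues like these track how hate is expressed rather than platform-specific vocabulary, they are the kind of signal that can transfer across distribution shifts.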
Detecting Harmful Agendas in News Articles
Manipulated news online is a growing problem that necessitates the use of
automated systems to curtail its spread. We argue that while misinformation and
disinformation detection have been studied, there has been a lack of investment
in the important open challenge of detecting harmful agendas in news articles;
identifying harmful agendas is critical to flag news campaigns with the
greatest potential for real world harm. Moreover, due to real concerns around
censorship, harmful agenda detectors must be interpretable to be effective. In
this work, we propose this new task and release a dataset, NewsAgendas, of
annotated news articles for agenda identification. We show how interpretable
systems can be effective on this task and demonstrate that they can perform
comparably to black-box models.
Comment: Camera-ready for ACL-WASSA 2023.
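An interpretable system in the spirit the abstract describes might be a sparse linear model whose weights map back to visible features; this sketch (standard scikit-learn components, not the paper's actual systems) shows the idea:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # A transparent classifier: every prediction can be traced to weighted
    # n-gram features, unlike a black-box neural detector.
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    # model.fit(train_articles, train_labels)   # e.g., labels from NewsAgendas
    # Inspecting the fitted coefficients then justifies each flagged article.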
J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
The rapid proliferation of AI-generated text online is profoundly reshaping
the information landscape. Among various types of AI-generated text,
AI-generated news presents a significant threat as it can be a prominent source
of misinformation online. While several recent efforts have focused on
detecting AI-generated text in general, these methods require enhanced
reliability, given concerns about their vulnerability to simple adversarial
attacks. Furthermore, due to the eccentricities of news writing, applying these
detection methods for AI-generated news can produce false positives,
potentially damaging the reputation of news organizations. To address these
challenges, we leverage the expertise of an interdisciplinary team to develop a
framework, J-Guard, capable of steering existing supervised AI text detectors
for detecting AI-generated news while boosting adversarial robustness. By
incorporating stylistic cues inspired by unique journalistic attributes,
J-Guard effectively distinguishes between real-world journalism and
AI-generated news articles. Our experiments on news articles generated by a
vast array of AI models, including ChatGPT (GPT3.5), demonstrate the
effectiveness of J-Guard in enhancing detection capabilities while limiting
the average performance drop under adversarial attacks to as little as 7%.
Comment: Accepted to the 13th International Joint Conference on Natural
Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the
Association for Computational Linguistics (IJCNLP-AACL 2023).
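A rough sketch of the journalism-guided cues J-Guard might add on top of a supervised detector; the concrete cue set below is a hypothetical illustration, not the paper's published feature list:

    import re

    def journalism_cues(article: str) -> list[float]:
        # Proxies for journalistic conventions: direct quotation, source
        # attribution, and numeric specificity (illustrative assumptions).
        sentences = [s for s in re.split(r"[.!?]+", article) if s.strip()]
        n = max(len(sentences), 1)
        return [
            article.count('"') / (2 * n),                            # quote density
            len(re.findall(r"\b(said|told|according to)\b",
                           article, re.I)) / n,                      # attribution rate
            len(re.findall(r"\d", article)) / max(len(article), 1),  # numeric detail
        ]

Concatenating such cues with a detector's learned representation steers the decision toward journalistic conventions that are cheap to verify but costly for an adversary to fake convincingly.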