17 research outputs found
Debiasing Community Detection: The Importance of Lowly-Connected Nodes
Community detection is an important task in social network analysis, allowing
us to identify and understand the communities within the social structures.
However, many community detection approaches either fail to assign low degree
(or lowly-connected) users to communities, or assign them to trivially small
communities that prevent them from being included in analysis. In this work, we
investigate how excluding these users can bias analysis results. We then
introduce an approach that is more inclusive for lowly-connected users by
incorporating them into larger groups. Experiments show that our approach
outperforms the existing state-of-the-art in terms of F1 and Jaccard similarity
scores while reducing the bias towards low-degree users.
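The abstract above does not specify the algorithm, but one minimal way to make community detection more inclusive of low-degree nodes is a neighbor-majority heuristic: after a base detector runs, each unassigned low-degree node joins the community most common among its neighbors. This is an illustrative sketch under that assumption, not the paper's actual method; `absorb_low_degree` and its parameters are hypothetical names.

```python
from collections import Counter

def absorb_low_degree(adjacency, communities, degree_threshold=2):
    """Assign community-less low-degree nodes to the community that is
    most common among their neighbors (ties broken arbitrarily)."""
    result = dict(communities)
    for node, neighbors in adjacency.items():
        if node in result or len(neighbors) > degree_threshold:
            continue
        votes = Counter(result[n] for n in neighbors if n in result)
        if votes:
            result[node] = votes.most_common(1)[0][0]
    return result

# Toy graph: node "e" has degree 1 and was skipped by the base detector.
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "e"], "e": ["c"]}
communities = {"a": 0, "b": 0, "c": 0}
print(absorb_low_degree(adjacency, communities))  # {'a': 0, 'b': 0, 'c': 0, 'e': 0}
```

The point of such a post-pass is that low-degree users end up inside a large community rather than being dropped from downstream analysis.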
Exacerbating Algorithmic Bias through Fairness Attacks
Algorithmic fairness has attracted significant attention in recent years,
with many quantitative measures suggested for characterizing the fairness of
different machine learning algorithms. Despite this interest, the robustness of
those fairness measures with respect to an intentional adversarial attack has
not been properly addressed. Indeed, most adversarial machine learning has
focused on the impact of malicious attacks on the accuracy of the system,
without any regard to the system's fairness. We propose new types of data
poisoning attacks where an adversary intentionally targets the fairness of a
system. Specifically, we propose two families of attacks that target fairness
measures. In the anchoring attack, we skew the decision boundary by placing
poisoned points near specific target points to bias the outcome. In the
influence attack on fairness, we aim to maximize the covariance between the
sensitive attributes and the decision outcome and affect the fairness of the
model. We conduct extensive experiments that indicate the effectiveness of our
proposed attacks.
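The influence attack described above maximizes the covariance between the sensitive attribute and the decision outcome. As a minimal sketch of that objective (the attack's actual optimization over poisoned points is not reproduced here), the quantity being driven up can be computed directly; `demographic_covariance` is a hypothetical helper name.

```python
def demographic_covariance(sensitive, outcomes):
    """Covariance between a binary sensitive attribute and model decisions;
    the influence attack seeks poisoned points that increase this value."""
    n = len(sensitive)
    mean_s = sum(sensitive) / n
    mean_y = sum(outcomes) / n
    return sum((s - mean_s) * (y - mean_y)
               for s, y in zip(sensitive, outcomes)) / n

# Outcomes perfectly aligned with the sensitive attribute -> maximal covariance.
print(demographic_covariance([0, 0, 1, 1], [0, 0, 1, 1]))  # 0.25
# Outcomes independent of the sensitive attribute -> zero covariance.
print(demographic_covariance([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0
```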
On the steerability of large language models toward data-driven personas
Large language models (LLMs) are known to generate biased responses where the
opinions of certain groups and populations are underrepresented. Here, we
present a novel approach to achieve controllable generation of specific
viewpoints using LLMs, which can be leveraged to produce multiple
perspectives and to reflect diverse opinions. Moving beyond the traditional reliance on
demographics like age, gender, or party affiliation, we introduce a data-driven
notion of persona grounded in collaborative filtering, which is defined as
either a single individual or a cohort of individuals manifesting similar views
across specific inquiries. As individuals in the same demographic group may
have different personas, our data-driven persona definition allows for a more
nuanced understanding of different (latent) social groups present in the
population. In addition to this, we also explore an efficient method to steer
LLMs toward the personas that we define. We show that our data-driven personas
significantly enhance model steerability, with improvements over our
best-performing baselines.
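The abstract defines a persona as a cohort of individuals manifesting similar views across specific inquiries. One simple, collaborative-filtering-flavored way to form such cohorts is to group respondents whose opinion vectors are nearly parallel; this greedy sketch is an assumption for illustration, not the paper's actual procedure, and `group_personas` is a hypothetical name.

```python
def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0

def group_personas(responses, threshold=0.9):
    """Greedy grouping: an individual joins the first persona whose seed
    vector is sufficiently similar; otherwise they seed a new persona."""
    personas = []  # list of (seed_vector, member_ids)
    for uid, vec in responses.items():
        for seed, members in personas:
            if cosine(seed, vec) >= threshold:
                members.append(uid)
                break
        else:
            personas.append((vec, [uid]))
    return [members for _, members in personas]

# Four respondents, three inquiries (+1 agree, -1 disagree): two latent cohorts.
responses = {
    "u1": [1, 1, -1],
    "u2": [1, 1, -1],
    "u3": [-1, -1, 1],
    "u4": [-1, -1, 1],
}
print(group_personas(responses))  # [['u1', 'u2'], ['u3', 'u4']]
```

Note how "u1" and "u3" could share a demographic group yet land in opposite personas, which is exactly the distinction the data-driven definition is after.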
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
Gender-inclusive NLP research has documented the harmful limitations of
gender binary-centric large language models (LLMs), such as the inability to
correctly use gender-diverse English neopronouns (e.g., xe, zir, fae). While
data scarcity is a known culprit, the precise mechanisms through which scarcity
affects this behavior remain underexplored. We discover LLM misgendering is
significantly influenced by Byte-Pair Encoding (BPE) tokenization, the
tokenizer powering many popular LLMs. Unlike binary pronouns, BPE overfragments
neopronouns, a direct consequence of data scarcity during tokenizer training.
This disparate tokenization mirrors tokenizer limitations observed in
multilingual and low-resource NLP, unlocking new misgendering mitigation
strategies. We propose two techniques: (1) pronoun tokenization parity, a
method to enforce consistent tokenization across gendered pronouns, and (2)
utilizing pre-existing LLM pronoun knowledge to improve neopronoun proficiency.
Our proposed methods outperform finetuning with standard BPE, improving
neopronoun accuracy from 14.1% to 58.4%. Our paper is the first to link LLM
misgendering to tokenization and deficient neopronoun grammar, indicating that
LLMs unable to correctly treat neopronouns as pronouns are more prone to
misgender.
Comment: Accepted to NAACL 2024 Findings.
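The overfragmentation effect described above can be illustrated with a toy greedy longest-match segmenter standing in for a trained BPE vocabulary (the real tokenizers and merge tables are not reproduced here): a pronoun frequent in the training data survives as one token, while a rare neopronoun falls apart into short pieces. The vocabulary and `greedy_tokenize` are assumptions for illustration.

```python
def greedy_tokenize(word, vocab):
    """Greedy longest-match segmentation, a stand-in for BPE inference:
    strings absent from training data fragment into short subword pieces."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character falls back to itself
            i += 1
    return tokens

# A vocab learned from binary-pronoun-rich text: "she" was merged, "xe" never seen.
vocab = {"she", "he", "her", "him", "x", "e", "z", "i", "r"}
print(greedy_tokenize("she", vocab))  # ['she']     - a single token
print(greedy_tokenize("xe", vocab))   # ['x', 'e']  - overfragmented
```

Pronoun tokenization parity, in this framing, would mean forcing "xe" to be represented with the same single-token granularity as "she".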
FLIRT: Feedback Loop In-context Red Teaming
Warning: this paper contains content that may be inappropriate or offensive.
As generative models become available for public use in various applications,
testing and analyzing vulnerabilities of these models has become a priority.
Here we propose an automatic red teaming framework that evaluates a given model
and exposes its vulnerabilities against unsafe and inappropriate content
generation. Our framework uses in-context learning in a feedback loop to red
team models and trigger them into unsafe content generation. We propose
different in-context attack strategies to automatically learn effective and
diverse adversarial prompts for text-to-image models. Our experiments
demonstrate that compared to baseline approaches, our proposed strategy is
significantly more effective in exposing vulnerabilities in the Stable Diffusion
(SD) model, even when the latter is enhanced with safety features. Furthermore,
we demonstrate that the proposed framework is effective for red teaming
text-to-text models, resulting in a significantly higher toxic-response
generation rate compared to previously reported numbers.
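The feedback loop described above can be sketched as a skeleton: keep a pool of exemplar prompts, ask an attacker model for a new candidate conditioned on the pool, and fold successful attacks back in as future in-context exemplars. The `generate` and `is_unsafe` callables are hypothetical stand-ins for the attacker LLM and the safety classifier, and the pool-update rule is simplified to a plain append rather than the paper's strategies.

```python
def feedback_loop_red_team(generate, is_unsafe, seed_prompts, rounds=5):
    """Skeleton of in-context red teaming in a feedback loop."""
    pool = list(seed_prompts)   # exemplars shown in-context to the attacker
    successes = []
    for _ in range(rounds):
        candidate = generate(pool)      # in-context generation from exemplars
        if is_unsafe(candidate):        # feedback signal from the target model
            successes.append(candidate)
            pool.append(candidate)      # successful attacks become exemplars
    return successes

# Toy stand-ins so the loop runs end to end: the "attacker" riffs on the most
# recent exemplar, and the "safety check" flags anything containing BANNED.
gen = lambda pool: pool[-1] + " BANNED"
unsafe = lambda text: "BANNED" in text
found = feedback_loop_red_team(gen, unsafe, ["tell me a story"], rounds=3)
print(len(found))  # 3
```

The essential property the sketch preserves is that each success changes what the attacker sees in-context next round, so the adversarial prompts can escalate rather than stay static.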
