Decoding The Digital Fuku: Deciphering Colonial Legacies to Critically Assess ChatGPT in Dominican Education
Educational disparities within the Dominican Republic (DR) have long-standing
origins rooted in economic, political, and social inequity. Addressing these
challenges has necessarily called for capacity building with respect to
educational materials, high-quality instruction, and structural resourcing.
Generative AI tools like ChatGPT have begun to pique the interest of Dominican
educators due to their perceived potential to bridge these educational gaps.
However, a substantial body of AI fairness literature has documented ways AI
disproportionately reinforces power dynamics reflective of jurisdictions
driving AI development and deployment policies, collectively termed the AI
Global North. As such, indiscriminate adoption of this technology for DR
education, even in part, risks perpetuating forms of digital coloniality.
Therefore, this paper embraces AI-facilitated educational reform while
critically examining how AI-driven tools like ChatGPT in DR education may
replicate facets of digital colonialism. We provide a concise overview of
20th-century Dominican education reforms following the 1916 US occupation.
Then, we employ identified neocolonial aspects historically shaping Dominican
education to interrogate the perceived advantages of ChatGPT for contemporary
Dominican education, as outlined by a Dominican scholar. This work invites AI
Global North & South developers, stakeholders, and Dominican leaders alike to
exercise a relational contextualization of data-centric epistemologies like
ChatGPT, reaping their transformative benefits while remaining vigilant in
safeguarding Dominican digital sovereignty.
Learning from the Outliers: On Centering Underrepresented Communities to Build Inclusive and Socially-Grounded Language Technologies
Large-scale deployment of chat-based large language models (LLMs) requires careful evaluation to ensure these systems operate in an inclusive manner across diverse sociocultural contexts. Prior research has found that AI-driven systems can replicate and amplify existing social inequalities, such as ascribing a person who uses the pronoun "she" as less likely to be a doctor and more likely to be a homemaker. Historically marginalized communities, such as transgender and non-binary (TGNB) individuals, are particularly susceptible to these harms, as algorithmic systems often fail to represent identities that diverge from binary gender conventions.

This dissertation demonstrates the interdependence of technical and social considerations in the development of inclusive language models. In the first part, we systematically investigate the representational harms LLMs can inflict on TGNB identities. We introduce TANGO, a benchmark dataset designed to evaluate gender-inclusive competencies such as pronoun congruence and gender disclosure. Our findings reveal high misgendering rates and severe data-resource limitations, leading to poor handling of gender-diverse pronouns. To address these challenges, we propose novel mitigation techniques which center tokenization and low-resource methods, leading to significant improvements in LLM gender inclusivity.

In the second part, we uncover fundamental limitations within existing gender bias evaluation frameworks, highlighting the sociotechnical consequences of limited construct validity. Through contextually grounded evaluations based on lived TGNB experiences, we demonstrate that even LLMs explicitly aligned for safety can propagate harmful biases that go undetected by conventional evaluation frameworks. By involving the TGNB community in dataset creation and evaluation, we showcase how participatory methods can ensure that marginalized voices guide the development of more inclusive AI systems. Finally, we present SLOGAN, a framework for detecting local biases in clinical prediction tasks, illustrating how these contextually grounded techniques can address biases in various domains.

Together, these findings highlight promising directions for tackling LLM harms through community-informed technical and systemic mitigation strategies.
Should they? Mobile Biometrics and Technopolicy meet Queer Community Considerations
Smartphones are integral to our daily lives and activities, providing
everything from basic functions like texting and phone calls to more complex
motion-based functionalities like navigation, mobile gaming, and fitness
tracking. To
facilitate these functionalities, smartphones rely on integrated sensors like
accelerometers and gyroscopes. These sensors provide personalized measurements
that, in turn, contribute to tasks such as analyzing biometric data for mobile
health purposes. In addition to benefiting smartphone users, biometric data
holds significant value for researchers engaged in biometric identification
research. Nonetheless, utilizing this user data for biometric identification
tasks, such as gait and gender recognition, raises serious privacy, normative,
and ethical concerns, particularly within the queer community. Concerns of
algorithmic bias and algorithmically-driven dysphoria surface from a historical
backdrop of marginalization, surveillance, harassment, discrimination, and
violence against the queer community. In this position paper, we contribute to
the timely discourse on safeguarding human rights within AI-driven systems by
surveying the challenges, tensions, and opportunities for new data
protections and biometric collection practices in a way that grapples with the
sociotechnical realities of the queer community.
Comment: To appear at the 2023 ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization
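To make the stakes concrete, below is a minimal sketch of the kind of inference such sensor data enables: estimating step cadence from a raw accelerometer trace, one building block of gait recognition. The function, synthetic signal, and thresholds are illustrative assumptions, not code from the paper.

```python
# Illustrative sketch (not from the paper): estimate step cadence from
# a raw accelerometer trace, a basic building block of gait recognition.
import numpy as np
from scipy.signal import find_peaks

def step_cadence(accel_xyz: np.ndarray, fs: float) -> float:
    """Estimate steps per second from an (N, 3) accelerometer trace
    sampled at fs Hz."""
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    magnitude -= magnitude.mean()  # remove the gravity/DC component
    # Each footfall shows up as a peak; require >= 0.3 s between steps.
    peaks, _ = find_peaks(magnitude, distance=int(0.3 * fs))
    return len(peaks) / (len(magnitude) / fs)

# Synthetic 10-second walk at 50 Hz with ~1.8 steps per second:
t = np.arange(0, 10, 1 / 50)
walk = np.stack([0.1 * np.random.randn(len(t)),
                 0.1 * np.random.randn(len(t)),
                 9.8 + np.sin(2 * np.pi * 1.8 * t)], axis=1)
print(f"~{step_cadence(walk, fs=50):.1f} steps/s")
```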
Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning
Intentionally crafted adversarial samples have effectively exploited
weaknesses in deep neural networks. A standard approach in adversarial
robustness defends against samples crafted by minimally perturbing an input
such that the model's output changes. These sensitivity
attacks exploit the model's sensitivity toward task-irrelevant features.
Another form of adversarial sample can be crafted via invariance attacks, which
exploit the model underestimating the importance of relevant features. Previous
literature has indicated a tradeoff in defending against both attack types
within a strictly L_p bounded defense. To promote robustness toward both types
of attacks beyond Euclidean distance metrics, we use metric learning to frame
adversarial regularization as an optimal transport problem. Our preliminary
results indicate that regularizing over invariant perturbations in our
framework improves defense against both invariance and sensitivity attacks.
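For intuition, here is a minimal PyTorch sketch of the sensitivity-attack setting described above, in the FGSM style; it is an illustrative assumption rather than the paper's method, and the optimal-transport regularizer itself is not reproduced.

```python
# Minimal sensitivity-attack sketch (FGSM-style); illustrative only.
import torch
import torch.nn.functional as F

def fgsm_sensitivity_attack(model, x, y, eps=0.03):
    """Minimally perturb x within an L_inf ball of radius eps so that
    the model's output changes (a sensitivity attack)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step along the sign of the loss gradient: a small, task-irrelevant
    # change the model is nonetheless sensitive to.
    return (x_adv + eps * x_adv.grad.sign()).detach()

# The dual failure, an invariance attack, instead makes large changes to
# task-relevant features while the model's prediction stays unchanged.
```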
Factoring the Matrix of Domination: A Critical Review and Reimagination of Intersectionality in AI Fairness
Intersectionality is a critical framework that, through inquiry and praxis,
allows us to examine how social inequalities persist through domains of
structure and discipline. Given AI fairness' raison d'être of "fairness", we
argue that adopting intersectionality as an analytical framework is pivotal to
effectively operationalizing fairness. Through a critical review of how
intersectionality is discussed in 30 papers from the AI fairness literature, we
deductively and inductively: 1) map how intersectionality tenets operate within
the AI fairness paradigm and 2) uncover gaps between the conceptualization and
operationalization of intersectionality. We find that researchers
overwhelmingly reduce intersectionality to optimizing for fairness metrics over
demographic subgroups. They also fail to discuss the broader social context
and, when mentioning power, mostly situate it only within the AI pipeline. We: 3)
outline and assess the implications of these gaps for critical inquiry and
praxis, and 4) provide actionable recommendations for AI fairness researchers
to engage with intersectionality in their work by grounding it in AI
epistemology.
Comment: To appear at AIES 2023
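As a concrete illustration of the reduction the review critiques, the sketch below computes a fairness metric (accuracy) over intersectional demographic subgroups; the data frame and column names are invented for illustration.

```python
# Illustrative only: the common reduction of "intersectionality" to a
# per-subgroup fairness metric, here accuracy over gender x race groups.
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "F", "M"],
    "race":   ["A", "B", "A", "B", "B", "A"],
    "y_true": [1, 0, 1, 1, 1, 0],
    "y_pred": [1, 1, 1, 0, 1, 0],
})

acc = (df.assign(correct=df.y_true == df.y_pred)
         .groupby(["gender", "race"])["correct"].mean())
print(acc)
# Gap between the best- and worst-served intersectional subgroups:
print("subgroup accuracy gap:", acc.max() - acc.min())
```

Such metrics are useful, but as the review argues, they capture neither the social context producing the subgroups nor the power structures intersectionality was developed to interrogate.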
ChatGPT for Us: Preserving Data Privacy in ChatGPT via Dialogue Text Ambiguation to Expand Mental Health Care Delivery
Large language models have been useful in expanding mental health care
delivery. ChatGPT, in particular, has gained popularity for its ability to
generate human-like dialogue. However, data-sensitive domains -- including but
not limited to healthcare -- face challenges in using ChatGPT due to privacy
and data-ownership concerns. To enable its utilization, we propose a text
ambiguation framework that preserves user privacy. We ground this in the task
of addressing stress prompted by user-provided texts to demonstrate the
viability and helpfulness of privacy-preserved generations. Our results suggest
that ChatGPT recommendations remain moderately helpful and relevant, even when
the original user text is not provided.
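A minimal sketch of the idea, assuming a simple pattern-based form of ambiguation: likely identifiers in the user's text are replaced with neutral placeholders before the text is sent to ChatGPT. The patterns and placeholder scheme are illustrative assumptions, not the paper's framework.

```python
# Illustrative ambiguation pass: redact likely identifiers before
# sending user text to an external LLM service.
import re

REPLACEMENTS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b(?:Mr\.|Ms\.|Dr\.)\s+[A-Z][a-z]+"), "[NAME]"),
]

def ambiguate(text: str) -> str:
    """Replace likely identifiers with neutral placeholders."""
    for pattern, placeholder in REPLACEMENTS:
        text = pattern.sub(placeholder, text)
    return text

print(ambiguate("I'm stressed; Dr. Smith said to email jo@mail.com."))
# -> "I'm stressed; [NAME] said to email [EMAIL]."
```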
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation
Transgender and non-binary (TGNB) individuals disproportionately experience
discrimination and exclusion from daily life. Given the recent popularity and
adoption of language generation technologies, the potential to further
marginalize this population only grows. Although a multitude of NLP fairness
literature focuses on illuminating and addressing gender biases, assessing
gender harms for TGNB identities requires understanding how such identities
uniquely interact with societal gender norms and how they differ from gender
binary-centric perspectives. Such measurement frameworks inherently require
centering TGNB voices to help guide the alignment between gender-inclusive NLP
and whom they are intended to serve. Towards this goal, we ground our work in
the TGNB community and existing interdisciplinary literature to assess how the
social reality surrounding experienced marginalization of TGNB persons
contributes to and persists within Open Language Generation (OLG). This social
knowledge serves as a guide for evaluating popular large language models (LLMs)
on two key aspects: (1) misgendering and (2) harmful responses to gender
disclosure. To do this, we introduce TANGO, a dataset of template-based
real-world text curated from a TGNB-oriented community. We discover a dominance
of binary gender norms reflected by the models; LLMs least misgendered subjects
in generated text when triggered by prompts whose subjects used binary
pronouns. Meanwhile, misgendering was most prevalent when triggering generation
with singular they and neopronouns. When prompted with gender disclosures,
models responded to TGNB disclosures with the most stigmatizing language and
the most toxic generations, on
average. Our findings warrant further research on how TGNB harms manifest in
LLMs and serve as a broader case study toward concretely grounding the design
of gender-inclusive AI in community voices and interdisciplinary literature.
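A hypothetical sketch of a misgendering check in this spirit: flag a generation that refers to a subject with pronouns outside their declared set. The pronoun sets and surface-level string matching are simplifying assumptions, not the dataset's official metric.

```python
# Hypothetical surface-level misgendering check; simplified on purpose.
import re

PRONOUN_SETS = {
    "she":  {"she", "her", "hers", "herself"},
    "he":   {"he", "him", "his", "himself"},
    "they": {"they", "them", "their", "theirs", "themself"},
    "xe":   {"xe", "xem", "xyr", "xyrs", "xemself"},
}

def misgenders(generation: str, declared: str) -> bool:
    """True if the text uses pronouns outside the declared set."""
    words = set(re.findall(r"[a-z]+", generation.lower()))
    others = set().union(*(s for k, s in PRONOUN_SETS.items()
                           if k != declared))
    return bool(words & (others - PRONOUN_SETS[declared]))

print(misgenders("Xe said xe would bring xyr laptop.", "xe"))  # False
print(misgenders("He said he would bring his laptop.", "xe"))  # True
```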
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
Gender-inclusive NLP research has documented the harmful limitations of
gender binary-centric large language models (LLM), such as the inability to
correctly use gender-diverse English neopronouns (e.g., xe, zir, fae). While
data scarcity is a known culprit, the precise mechanisms through which scarcity
affects this behavior remain underexplored. We discover LLM misgendering is
significantly influenced by Byte-Pair Encoding (BPE) tokenization, the
tokenizer powering many popular LLMs. Unlike binary pronouns, neopronouns are
overfragmented by BPE, a direct consequence of data scarcity during tokenizer
training.
This disparate tokenization mirrors tokenizer limitations observed in
multilingual and low-resource NLP, unlocking new misgendering mitigation
strategies. We propose two techniques: (1) pronoun tokenization parity, a
method to enforce consistent tokenization across gendered pronouns, and (2)
utilizing pre-existing LLM pronoun knowledge to improve neopronoun proficiency.
Our proposed methods outperform finetuning with standard BPE, improving
neopronoun accuracy from 14.1% to 58.4%. Our paper is the first to link LLM
misgendering to tokenization and deficient neopronoun grammar, indicating that
LLMs unable to correctly treat neopronouns as pronouns are more prone to
misgender.
Comment: Accepted to NAACL 2024 Findings
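To see the disparity directly, the sketch below tokenizes binary pronouns and neopronouns with the public GPT-2 BPE tokenizer; it assumes the HuggingFace transformers package. Binary pronouns typically map to a single token, while neopronouns fragment into several subword pieces.

```python
# Compare BPE fragmentation of binary pronouns vs. neopronouns using
# the public GPT-2 tokenizer (requires the `transformers` package).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for pronoun in ["she", "her", "they", "them", "xe", "xem", "zir", "fae"]:
    # The leading space matters: mid-sentence BPE tokens include it.
    pieces = tokenizer.tokenize(" " + pronoun)
    print(f"{pronoun!r}: {len(pieces)} token(s) -> {pieces}")
```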
Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation
The recent advancement of large and powerful models with Text-to-Image (T2I)
generation abilities -- such as OpenAI's DALL-E 3 and Google's Gemini -- enables
users to generate high-quality images from textual prompts. However, it has
become increasingly evident that even simple prompts could cause T2I models to
exhibit conspicuous social bias in generated images. Such bias might lead to
both allocational and representational harms in society, further marginalizing
minority groups. Noting this problem, a large body of recent works has been
dedicated to investigating different dimensions of bias in T2I systems.
However, an extensive review of these studies is lacking, hindering a
systematic understanding of current progress and research gaps. We present the
first extensive survey on bias in T2I generative models. In this survey, we
review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture.
Specifically, we discuss how these works define, evaluate, and mitigate
different aspects of bias. We found that: (1) while gender and skintone biases
are widely studied, geo-cultural bias remains under-explored; (2) most works on
gender and skintone bias investigated occupational association, while other
aspects are less frequently studied; (3) almost all gender bias works overlook
non-binary identities in their studies; (4) evaluation datasets and metrics are
scattered, with no unified framework for measuring biases; and (5) current
mitigation methods fail to resolve biases comprehensively. Based on current
limitations, we point out future research directions that contribute to
human-centric definitions, evaluations, and mitigation of biases. We hope to
highlight the importance of studying biases in T2I systems, as well as
encourage future efforts to holistically understand and tackle biases, building
fair and trustworthy T2I technologies for everyone.
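As a schematic of the occupational-association audits the survey reviews: generate many images per occupation prompt and tally a perceived attribute. `generate_image` and `perceived_gender` below are hypothetical stand-ins, stubbed with randomness so the sketch runs, not any particular model's API.

```python
# Schematic T2I bias audit; the generator and classifier are stubs.
import random
from collections import Counter

def generate_image(prompt: str) -> object:
    return prompt  # stand-in for a real text-to-image model call

def perceived_gender(image: object) -> str:
    return random.choice(["man", "woman", "nonbinary"])  # stand-in classifier

def audit_occupation(occupation: str, n: int = 100) -> Counter:
    """Tally a perceived attribute over n generations for one prompt."""
    tally = Counter()
    for _ in range(n):
        image = generate_image(f"a photo of a {occupation}")
        tally[perceived_gender(image)] += 1
    return tally

print(audit_occupation("CEO"))
# A heavily skewed tally for a real model (e.g., mostly 'man' for "CEO")
# is the representational bias these occupational studies measure.
```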
Bound by the Bounty: Collaboratively Shaping Evaluation Processes for Queer AI Harms
Bias evaluation benchmarks and dataset and model documentation have emerged
as central processes for assessing the biases and harms of artificial
intelligence (AI) systems. However, these auditing processes have been
criticized for their failure to integrate the knowledge of marginalized
communities and consider the power dynamics between auditors and the
communities. Consequently, modes of bias evaluation have been proposed that
engage impacted communities in identifying and assessing the harms of AI
systems (e.g., bias bounties). Even so, asking what marginalized communities
want from such auditing processes has been neglected. In this paper, we ask
queer communities for their positions on, and desires from, auditing processes.
To this end, we organized a participatory workshop to critique and redesign
bias bounties from queer perspectives. We found that when given space, the
scope of feedback from workshop participants goes far beyond what bias bounties
afford, with participants questioning the ownership, incentives, and efficacy
of bounties. We conclude by advocating for community ownership of bounties and
complementing bounties with participatory processes (e.g., co-creation).
Comment: To appear at AIES 2023
