Racial categories in machine learning
Controversies around race and machine learning have sparked debate among
computer scientists over how to design machine learning systems that guarantee
fairness. These debates rarely engage with how racial identity is embedded in
our social experience, a source of sociological and psychological complexity.
This complexity challenges the paradigm of considering fairness to be a formal
property of supervised learning with respect to protected personal attributes.
Racial identity is not simply a personal subjective quality. For people labeled
"Black" it is an ascribed political category that has consequences for social
differentiation embedded in systemic patterns of social inequality achieved
through both social and spatial segregation. In the United States, racial
classification can best be understood as a system of inherently unequal status
categories that places whites as the most privileged category while signifying
the Negro/black category as stigmatized. Social stigma is reinforced through
the unequal distribution of societal rewards and goods along racial lines, a
distribution maintained by state, corporate, and civic institutions and practices. This
creates a dilemma for society and designers: be blind to racial group
disparities and thereby reify racialized social inequality by no longer
measuring systemic inequality, or be conscious of racial categories in a way
that itself reifies race. We propose a third option. By preceding group
fairness interventions with unsupervised learning to dynamically detect
patterns of segregation, machine learning systems can mitigate the root cause
of social disparities, namely social segregation and stratification, without
further anchoring status categories of disadvantage.
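The proposed ordering, unsupervised detection of segregation-like structure first, group fairness intervention second, can be sketched as follows. This is a minimal illustration under invented assumptions: the toy "spatial" features, the choice of k-means, and all variable names are stand-ins, not the authors' method.

```python
# Sketch: detect latent groups without reading any protected attribute,
# then hand the discovered groups to a downstream fairness intervention.
# Toy data and k-means are illustrative assumptions only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs stand in for spatially segregated populations.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 2)),
])

# Step 1: discover groups dynamically from the data itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
learned_groups = kmeans.labels_

# Step 2: a group-fairness intervention (e.g. group-wise calibration)
# would condition on `learned_groups` rather than a fixed racial category.
group_sizes = np.bincount(learned_groups)
```

The point of the sketch is only the ordering: the group variable is learned from observed patterns of separation, so the intervention tracks segregation without hard-coding a status category.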
Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU
Machine learning approaches have been effective in predicting adverse
outcomes in different clinical settings. These models are often developed and
evaluated on datasets with heterogeneous patient populations. However, good
predictive performance on the aggregate population does not imply good
performance for specific groups.
In this work, we present a two-step framework to 1) learn relevant patient
subgroups, and 2) predict an outcome for separate patient populations in a
multi-task framework, where each population is a separate task. We demonstrate
how to discover relevant groups in an unsupervised way with a
sequence-to-sequence autoencoder. We show that using these groups in a
multi-task framework leads to better predictive performance of in-hospital
mortality both across groups and overall. We also highlight the need for more
granular evaluation of performance when dealing with heterogeneous populations.
Comment: KDD 201
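The two-step framework can be sketched in miniature. Here a mean-pooled embedding stands in for the paper's sequence-to-sequence autoencoder, and the toy cohort, cluster count, and classifier choice are all illustrative assumptions rather than the authors' implementation.

```python
# Sketch: (1) discover patient subgroups in an unsupervised way,
# (2) train one predictor per subgroup, treating each as a separate task.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy cohort: 200 patients x 24 hourly measurements x 3 vitals,
# with a second latent population shifted in feature space.
series = rng.normal(size=(200, 24, 3))
series[100:] += 2.0

# Noisy toy outcome loosely tied to the first vital sign.
signal = series[:, :, 0].mean(axis=1)
mortality = (signal + rng.normal(scale=1.0, size=200) > 0.5).astype(int)

# Step 1: embed each time series (stand-in for the autoencoder
# bottleneck) and cluster the embeddings into subgroups.
embeddings = series.mean(axis=1)          # shape (200, 3)
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# Step 2: one task (classifier) per discovered subgroup.
models = {}
for g in np.unique(groups):
    idx = groups == g
    if len(np.unique(mortality[idx])) > 1:   # need both classes to fit
        models[g] = LogisticRegression().fit(embeddings[idx], mortality[idx])
```

In the paper's setting the per-group tasks additionally share structure through multi-task training; the sketch keeps only the split-then-predict shape.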
Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure
Recent research has highlighted the vulnerabilities of modern machine learning based systems to bias, especially for segments of society that are under-represented in training data. In this work, we develop a novel, tunable algorithm for mitigating the hidden, and potentially unknown, biases within training data. Our algorithm fuses the original learning task with a
variational autoencoder to learn the latent structure within the dataset and then adaptively uses the learned latent distributions to re-weight the importance of certain data points while training. While our method is generalizable across various data modalities and learning tasks, in this work we use our algorithm to address the issue of racial and gender bias in facial
detection systems. We evaluate our algorithm on the Pilot Parliaments Benchmark (PPB), a dataset specifically designed to evaluate biases in computer vision systems, and demonstrate increased overall performance as well as decreased categorical bias with our debiasing approach.
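The core re-weighting idea, up-weight examples that fall in rare regions of the learned latent space, can be illustrated without a full VAE. Below, random latent codes stand in for an encoder's output and a histogram stands in for the learned latent distributions; both are assumptions for illustration, not the paper's algorithm.

```python
# Sketch: estimate the density of latent codes, then give rare
# (under-represented) examples larger training weights.
import numpy as np

rng = np.random.default_rng(0)
latents = rng.normal(size=1000)            # stand-in for z = encoder(x)

# Cheap histogram proxy for the learned latent density p(z).
hist, edges = np.histogram(latents, bins=10, density=True)
bin_idx = np.clip(np.digitize(latents, edges) - 1, 0, len(hist) - 1)
density = hist[bin_idx]

# Rare latent regions get larger weights; alpha caps extreme values.
alpha = 0.01
weights = 1.0 / (density + alpha)
weights /= weights.sum()                    # normalized sampling probabilities
```

A training loop would then sample batches according to `weights`, so under-represented faces are seen more often; the smoothing term `alpha` is a tunable knob analogous to the paper's description of a tunable algorithm.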
Steps Towards Value-Aligned Systems
Algorithmic (including AI/ML) decision-making artifacts are an established
and growing part of our decision-making ecosystem. They are indispensable tools
for managing the flood of information needed to make effective decisions in a
complex world. The current literature is full of examples of how individual
artifacts violate societal norms and expectations (e.g. violations of fairness,
privacy, or safety norms). Against this backdrop, this discussion highlights an
under-emphasized perspective in the literature on assessing value misalignment
in AI-equipped sociotechnical systems. The research on value misalignment has a
strong focus on the behavior of individual tech artifacts. This discussion
argues for a more structured systems-level approach for assessing
value-alignment in sociotechnical systems. We rely primarily on the research on
fairness to make our arguments more concrete, and we use the opportunity to
highlight how adopting a systems perspective improves our ability to explain
and address value misalignments. Our discussion ends with an exploration of
priority questions that demand attention if we are to assure the value
alignment of whole systems, not just individual artifacts.
Comment: Original version appeared in Proceedings of the 2020 AAAI/ACM Conference on AI, Ethics, and Society (AIES '20), February 7-8, 2020, New York, NY, USA. 5 pages, 2 figures. Corrected some typos in this version
Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok
TikTok is a video-sharing social networking service, whose popularity is
increasing rapidly. It was the world's second-most downloaded app in 2019.
Although the platform is known for users posting videos of themselves
dancing, lip-syncing, or showcasing other talents, videos expressing
political views have recently surged. This study performs a first
evaluation of political communication on TikTok. We collect a set of US
partisan Republican and Democratic videos to investigate how users communicated
with each other about political issues. With the help of computer vision,
natural language processing, and statistical tools, we illustrate that
political communication on TikTok is much more interactive in comparison to
other social media platforms, with users combining multiple information
channels to spread their messages. We show that political communication takes
place in the form of communication trees since users generate branches of
responses to existing content. In terms of user demographics, we find that
users affiliated with both major US parties are young and behave similarly on the
platform. However, Republican users generated more political content and their
videos received more responses; on the other hand, Democratic users engaged
significantly more in cross-partisan discussions.
Comment: Accepted as a full paper at the 12th International ACM Web Science Conference (WebSci 2020). Please cite the WebSci version; second version includes corrected typos
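The "communication tree" structure described above, original videos as roots and response videos as branches, is straightforward to model. The video IDs and reply edges below are invented for illustration.

```python
# Sketch: threads of response videos form trees rooted at original content.
from collections import defaultdict

# (video_id, parent_id); parent None marks an original (root) video.
replies = [
    ("v1", None), ("v2", "v1"), ("v3", "v1"), ("v4", "v2"), ("v5", None),
]

children = defaultdict(list)
roots = []
for vid, parent in replies:
    if parent is None:
        roots.append(vid)
    else:
        children[parent].append(vid)

def tree_size(root: str) -> int:
    # Count a root video plus all transitive responses to it.
    return 1 + sum(tree_size(c) for c in children[root])
```

Measures such as tree size or depth per root would then quantify how interactive a thread is, which is the kind of comparison the study draws against other platforms.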
Whose Tweets are Surveilled for the Police: An Audit of Social-Media Monitoring Tool via Log Files
Social media monitoring by law enforcement is becoming commonplace, but
little is known about what the software packages used for it actually do. Through public records
requests, we obtained log files from the Corvallis (Oregon) Police Department's
use of social media monitoring software called DigitalStakeout. These log files
include the results of proprietary searches by DigitalStakeout that were
running over a period of 13 months and include 7240 social media posts. In this
paper, we focus on the Tweets logged in this data and consider the racial and
ethnic identity (through manual coding) of the users that are therein flagged
by DigitalStakeout. We observe differences between the demographics of the
users whose Tweets are flagged by DigitalStakeout and those of Twitter users
in the region; however, our sample size is too small to establish statistical
significance. Further, the demographics of Twitter users in the region do
not seem to reflect those of the region's residents, with an apparently
higher representation of Black and Hispanic people. We also reconstruct the
keywords related to a Narcotics report set up by DigitalStakeout for the
Corvallis Police Department and find that these keywords flag Tweets unrelated
to narcotics or flag Tweets related to marijuana, a drug that is legal for
recreational use in Oregon. Almost all of the keywords have common meanings
unrelated to narcotics (e.g. broken, snow, hop, high), which calls into question
the utility that such a keyword-based search could have for law enforcement.
Comment: 21 pages, 2 figures. To be published in FAT* 2020 Proceedings
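Why generic keywords make poor narcotics filters is easy to demonstrate: a naive match flags entirely benign posts. The keyword list echoes the examples above; the tweets and matching rule are invented, since DigitalStakeout's actual rules are proprietary.

```python
# Sketch: token-level keyword matching as a stand-in for a monitoring
# tool's search rules, applied to innocuous example tweets.
keywords = {"broken", "snow", "hop", "high"}

tweets = [
    "my phone screen is broken again",
    "first snow of the season in corvallis",
    "new hop varieties at the brewery",
    "high of 75 degrees today",
]

def flagged(text: str, kws: set[str]) -> bool:
    # Flag a tweet if any token matches a keyword.
    return any(tok in kws for tok in text.lower().split())

hits = [t for t in tweets if flagged(t, keywords)]
```

Every one of these everyday tweets is flagged, mirroring the paper's finding that the reconstructed Narcotics keywords mostly surface content unrelated to drugs.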