An Empirical Analysis of Racial Categories in the Algorithmic Fairness Literature
Recent work in algorithmic fairness has highlighted the challenge of defining
racial categories for the purposes of anti-discrimination. These challenges are
not new but have previously fallen to the state, which enacts race through
government statistics, policies, and evidentiary standards in
anti-discrimination law. Drawing on the history of state race-making, we
examine how longstanding questions about the nature of race and discrimination
appear within the algorithmic fairness literature. Through a content analysis
of 60 papers published at FAccT between 2018 and 2020, we analyze how race is
conceptualized and formalized in algorithmic fairness frameworks. We note that
differing notions of race are adopted inconsistently, at times even within a
single analysis. We also explore the institutional influences and values
associated with these choices. While we find that categories used in
algorithmic fairness work often echo legal frameworks, we demonstrate that
values from academic computer science play an equally important role in the
construction of racial categories. Finally, we examine the reasoning behind
different operationalizations of race, finding that few papers explicitly
describe their choices and even fewer justify them. We argue that the
construction of racial categories is a value-laden process with significant
social and political consequences for the project of algorithmic fairness. The
widespread lack of justification around the operationalization of race reflects
institutional norms that allow these political decisions to remain obscured
within the backstage of knowledge production.
Comment: 13 pages, 2 figures, FAccT '2
Recommended from our members
Our knowledge of knowledge infrastructures: Lessons learned and future directions
The Knowledge Infrastructures Workshop conducted at UCLA in February 2020, and funded by the Alfred P. Sloan Foundation, revisited the goals and findings of the 2012 workshop held at the University of Michigan. Thirty scholars, from a diverse array of disciplines and backgrounds, charted a course for the next decade of KI research. Such infrastructures are increasingly fragile, and often brittle, in the face of open data and open source, the demise of gatekeepers, and shifting public and private boundaries that redistribute power. Participants identified new methods and new opportunities for studying KI. Among the many scholarly products they proposed are publications, grant proposals, conference sessions, and workshops on the role of libraries in data services, the death and afterlives of KI, misinformation and disinformation in KI, KI in the Anthropocene, “N simplish rules” to grow and sustain KI, university capacities for KI, designing sustainable KI, and inclusion of underrepresented groups in the design of KI. The report, position papers, and other materials will be maintained at the KI workshop site, http://knowledgeinfrastructures.org
Jupyter notebooks as discovery mechanisms for open science: Citation practices in the astronomy community
Citing data and software is a means to give scholarly credit and to
facilitate access to research objects. Citation principles encourage authors to
provide full descriptions of objects, with stable links, in their papers. As
Jupyter notebooks aggregate data, software, and other objects, they may
facilitate or hinder citation, credit, and access to data and software. We
report on a study of references to Jupyter notebooks in astronomy over a 5-year
period (2014-2018). References increased rapidly, but fewer than half of the
references led to Jupyter notebooks that could be located and opened. Jupyter
notebooks appear better suited to supporting the research process than to
providing access to research objects. We recommend that authors cite individual
data and software objects, and that they stabilize any notebooks cited in
publications. Publishers should increase the number of citations allowed in
papers and employ descriptive metadata-rich citation styles that facilitate
credit and discovery.
Police Officer-Involved Homicide Database Project
Our project explores un- and under-reported incidents of law enforcement-involved homicides, both justified and unjustified, through an analysis of extant federal and local databases with information pertaining to police officer-involved homicides, combined with mining and analysis of social media data and participatory action research methods to fill gaps in existing government and local databases. The social media information can be used in concert with other publicly available government databases to create a clearer picture of the lived realities of communities encountering police homicides in the United States. We have chosen Los Angeles County as the first community to study.
From Open Data to Knowledge Production: Biomedical Data Sharing and Unpredictable Data Reuses
Using a US consortium for data sharing as the primary field site, this three-year ethnographic research project examines the socio-technical, epistemic, and ethical challenges of making biomedical research data openly available and reusable. Public policy arguments for releasing scientific data for reuse by others include increasing trust in science and leveraging public investments in research. In most types of scientific research, data release occurs in parallel with associated publications, after peer review. In the consortium studied for this project, datasets may also be released independently without an associated publication. Such research datasets are conceptualized as “hypothesis free” resources from which novel knowledge can be extracted indefinitely. Among the findings of this project are that biomedical researchers do not download and re-analyze “hypothesis free” research data from open repositories as a regular practice. Data reuse is a complex, delicate, and often time-consuming process. Metadata and ontology schemas appear to be necessary but not sufficient for data reuse processes. For scientists to test new hypotheses on “old” data, they depend on access to peer-reviewed primary analyses, pre-existing trusted relationships with the data creators, and shared research agendas. Data donors (patients, study participants, etc.), on the other hand, retain little control over how open research data are reused. Findings suggest that, in practice, it is impossible to predict – and consequently to regulate – how datasets might be reused once made openly available. Unintended consequences of reusing this consortium’s open data are already emerging, to the concern of some participants.
Beyond Privacy: The Emerging Ethics of Data Reuse
The workshop will explore the meaning of “Informed Consent” and its implications for reusing human subject open data from and for biomedical research. Patients and volunteers donate their data in the context of research designs that are vetted and approved by ethics committees. When research data are released in open access – especially observational data – these can be reused to explore a number of new hypotheses. As we know from previous studies, biomedical data can be reused in many unpredictable ways – new research communities are formed around pre-existing data, and the free availability of research data increases innovation, knowledge integration, and reproducibility. At the same time, openness of data could also expose donors to surveillance and discriminatory research practices that not only have ethical implications, but also were never agreed upon by the donors at the moment of data collection. In this workshop, cases of both successful and controversial data reuse practices will be presented and discussed. We will further discuss: Can and should we – and if yes, how – filter the kinds of hypotheses that can be tested on a human subject dataset? Can and should we – and if yes, how – permanently include ethical concerns in the provenance records of human subject datasets? How can we keep data donors actively “informed” about unpredictable reuses of research data?