An Empirical Analysis of Racial Categories in the Algorithmic Fairness Literature
Recent work in algorithmic fairness has highlighted the challenge of defining
racial categories for the purposes of anti-discrimination. These challenges are
not new but have previously fallen to the state, which enacts race through
government statistics, policies, and evidentiary standards in
anti-discrimination law. Drawing on the history of state race-making, we
examine how longstanding questions about the nature of race and discrimination
appear within the algorithmic fairness literature. Through a content analysis
of 60 papers published at FAccT between 2018 and 2020, we analyze how race is
conceptualized and formalized in algorithmic fairness frameworks. We note that
differing notions of race are adopted inconsistently, at times even within a
single analysis. We also explore the institutional influences and values
associated with these choices. While we find that categories used in
algorithmic fairness work often echo legal frameworks, we demonstrate that
values from academic computer science play an equally important role in the
construction of racial categories. Finally, we examine the reasoning behind
different operationalizations of race, finding that few papers explicitly
describe their choices and even fewer justify them. We argue that the
construction of racial categories is a value-laden process with significant
social and political consequences for the project of algorithmic fairness. The
widespread lack of justification around the operationalization of race reflects
institutional norms that allow these political decisions to remain obscured
within the backstage of knowledge production.
Comment: 13 pages, 2 figures, FAccT '2
Detecting Friendship Within Dynamic Online Interaction Networks
In many complex social systems, the timing and frequency of interactions
between individuals are observable but friendship ties are hidden. Recovering
these hidden ties, particularly for casual users who are relatively less
active, would enable a wide variety of friendship-aware applications in domains
where labeled data are often unavailable, including online advertising and
national security. Here, we investigate the accuracy of multiple statistical
features, based either purely on temporal interaction patterns or on the
cooperative nature of the interactions, for automatically extracting latent
social ties. Using self-reported friendship and non-friendship labels derived
from an anonymous online survey, we learn highly accurate predictors for
recovering hidden friendships within a massive online data set encompassing 18
billion interactions among 17 million individuals of the popular online game
Halo: Reach. We find that the accuracy of many features improves as more data
accumulates, and cooperative features are generally reliable. However,
periodicities in interaction time series are sufficient to correctly classify
95% of ties, even for casual users. These results clarify the nature of
friendship in online social environments and suggest new opportunities and new
privacy concerns for friendship-aware applications that do not require the
disclosure of private friendship information.
Comment: To Appear at the 7th International AAAI Conference on Weblogs and Social Media (ICWSM '13), 11 pages, 1 table, 6 figures
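The headline result, that interaction periodicities alone can classify ties accurately, can be illustrated with a toy feature. This is a minimal sketch assuming hourly binning and a daily period; the function name and parameters are illustrative and not the paper's actual feature set:

```python
import numpy as np

def periodicity_score(timestamps, bin_size=3600.0, period_bins=24):
    """Toy periodicity feature: autocorrelation of a binned interaction
    time series at a one-day lag. Pairs who interact on a regular daily
    rhythm score high; sporadic pairs score near zero."""
    ts = np.asarray(timestamps, dtype=float)
    if ts.size < 2:
        return 0.0
    # Bin interaction timestamps (seconds) into hourly counts.
    n_bins = int((ts.max() - ts.min()) // bin_size) + 1
    counts = np.zeros(n_bins)
    np.add.at(counts, ((ts - ts.min()) // bin_size).astype(int), 1)
    if n_bins <= period_bins or counts.std() == 0:
        return 0.0
    # Autocorrelation at a lag of one period (24 hourly bins = 1 day).
    x = counts - counts.mean()
    return float(np.dot(x[:-period_bins], x[period_bins:]) / np.dot(x, x))
```

In practice such a feature would be one column among the temporal and cooperative features, feeding a supervised classifier trained on the survey-derived friendship labels.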
Comparative, Population-Level Analysis of Social Networks in Organizations
As social behavior moves increasingly online, the study of social behavior has followed. Online traces of social systems, whether used to study online behavior directly or as records of offline activity, have made possible previously unavailable empirical analyses of people, groups, and organizations. However, practically observing any social system is nontrivial: even if we can directly instrument and measure the social constructs we wish to study, we will still observe them through the lens of the system itself. We inherit effects due to the design and history of the platform, the ecology of other online systems, the measurement tool and pre-processing of our data, and the assumptions of our models. At the same time, organizations represent a fundamental unit of human social behavior. Thus, to understand social behavior, we must understand how the size, boundaries, and context of organizations impact social relationships within them. I focus on this boundary between online systems and offline activity in organizations. We exploit heterogeneities across populations of social networks to explore the boundary of online systems, online social behavior, and offline activity across different organizations. I discuss empirical work exploring how offline behavior is reflected in online systems and, conversely, how an online system relates to offline outcomes. We then turn to the relationship between the measurement of networks from online data and past work on network structure and evolution.
In this dissertation, I develop a comparative structural perspective to tease apart the roles of these exogenous and endogenous processes on network structure. Using populations of comparable networks, I explore the roles of individual social strategies, organizational environments, and network construction on network structure. First, I explore how the unique timing and setting of Facebook's initial expansion to universities afforded a natural experiment, revealing differences in social strategies and network growth, and we explore empirical network scaling in this population of networks. We find that the social strategies employed by students who only interacted online differed from those who had interacted in the offline world. Second, I explore a vaunted tradition of organization theory---relating a firm's informal network structure to firm performance---using a novel email network data set across a population of large firms. In this setting, I explore the previously untested heterogeneity of firms and the relationships between organization size, organization context and social network structure. There, we find a surprising amount of heterogeneity across firm types, and a lack of relationship between network structure and firm performance. We find novel scaling results, including a lack of relationship between the size of a firm and an individual's number of contacts, but find that the formal geographic structure of an organization increases bottlenecks in communication across firms. Finally, reflecting on the challenges of working with social networks drawn from interaction data, I explore the connections between network construction and network evolution. To put these connections in perspective, I visit the theory of weak ties, network stability and network densification using this lens. We find evidence to confirm, reject, and suggest novel hypotheses in this literature.
We find, for example, that network densification can appear as an artifact of total activity within the observed system.
The comparative approach is uncontroversial but novel in the empirical study of networks, organization theory, and computational social science. In this context, the comparative approach allows us to compare empirical scaling properties to results from random graph theory. Using networks bounded by organizations and platforms, we can leverage the boundaries of online systems to relate covariates at the platform-, organization-, or network-
The Hidden Governance in AI
Governments are increasingly using artificial intelligence (AI) systems to support policymaking, deliver public services, and manage internal people and processes. AI systems in public-facing services range from predictive machine-learning systems used in fraud and benefit determinations to chatbots used to communicate with the public about their rights and obligations across a range of settings. The integration of AI into agency decision-making processes that affect the public’s rights poses unique challenges for agencies. System design decisions about training data, model design, thresholds, and interface design can set policy—thereby affecting the public’s rights. Yet today many agencies acquire AI systems through a procurement process that lacks opportunities for public input on system design choices that embed policy, limits agencies’ access to information necessary for meaningful assessment, and lacks validation and other processes for rooting out biases that may unfairly, and at times illegally, affect the public. Even where agencies develop AI systems in house, it is unclear, given the lack of publicly available documentation, whether the policy-relevant design choices are identified and subject to rigorous internal scrutiny, and there are only a few examples of such policy-relevant design choices being subject to public vetting. AI systems can be opaque, making it difficult to fully understand the logic and processes underlying an output, and therefore difficult to meet obligations that attach to individual decisions. Furthermore, automation bias and the interfaces and policies that shape agency use of AI tools can turn systems intended as decision support into decision displacement. Some governments have begun to grapple with the use of AI systems in public service delivery, providing guidance to agencies about how to approach the embedded policy choices within AI.
Canada, for example, adopted new regulations to ensure agency use of AI in service delivery is compatible with core administrative law principles including transparency, rationality, accountability, and procedural fairness. In April 2021, the European Commission unveiled a proposed Artificial Intelligence Act, which is currently wending its way through the complex EU trilogue process. If adopted, the European law will, among other things, set standards and impose an assessment process on AI systems used by governments to allocate public benefits or affect fundamental rights. These efforts are important. Nevertheless, building the capacity of administrative agencies to identify technical choices that are policy—and therefore ought to be subject to the technocratic and democratic requirements of administrative law regardless of whether AI systems are built or bought—requires tools and guidance to assist with assessments of data suitability, model design choices, and validation and monitoring techniques, as well as additional agency expertise. There is a growing set of tools and methods for AI system documentation. Used at appropriate times in the development or procurement of an AI system, these tools can support collaborative interrogation of AI systems by domain experts and system designers. One such method is measurement modeling. Part of routine practice in the quantitative social sciences, measurement modeling is the process of developing a statistical model that links unobservable theoretical constructs (what we would like to model) to data about the world (what we are left with). We have argued elsewhere that measurement modeling provides a useful framework for understanding theoretical constructs such as fairness in computational systems, including AI systems.
Here, we explain how measurement modeling, which requires clarifying the theoretical constructs to be measured and their operationalization, can help agencies understand the implications of AI systems, design models that reflect domain-specific knowledge, and identify discrete design choices that should be subject to public scrutiny. The measurement modeling process makes the assumptions that are baked into models explicit. Too often, the assumptions behind models are not clearly stated, making it difficult to identify how and why systems do not work as intended. But these assumptions describe what is being measured by the system—what the domain-specific understanding of the system is, versus what is actually being implemented. This approach provides a key opportunity for domain experts to inform technical experts about the reasonableness of assumptions—both assumptions about which intended domain-specific understanding of a concept should be used, and assumptions about how that concept is being implemented. Careful attention to the operationalization of the selected concept offers an additional opportunity to surface mismatches between technical and domain experts’ assumptions about the meaning of observable attributes used by the model. The specific tools used to test measurement modeling assumptions are reliability and construct validity. Broadly, this entails asking questions such as: What does an assumption mean? Does the assumption make sense? Does it “work”, and in the way we expect? An easily overlooked yet crucial aspect of validity is consequential validity, which captures the understanding that defining a measure changes its meaning. This phenomenon includes Goodhart’s Law, which holds that once a measure becomes a target, it ceases to be a good measure. In other words, does putting forward a measurement change how we understand the system?
As Ken Alder has written, “measures are more than a creation of society, they create society.” This means that any evaluation of a measurement model cannot occur in isolation. As with policymaking more broadly, effectiveness must be considered in the context of how a model will then be used. AI systems used to allocate benefits and services assign scores for purposes such as predicting a teacher’s or school’s quality, ranking the best nursing homes for clinical care, and determining eligibility for social support programs. Those assigned scores can be used as inputs into a broader decision-making process, such as to allocate resources or decide which teachers to fire. Consider SAS’s Education Value-Added Assessment System (EVAAS), a standardized tool that claims to measure teacher quality and school district quality. Measurement modeling can help break down what EVAAS is doing—that is, what policies are being enforced, what values are being encoded, and what harms may come to pass as a result. The EVAAS tool operationalizes the construct of “teacher quality” from a range of abstract ideals into a specific idea, a latent force that can be measured from differences in student test scores across years. To ensure that a measurement model is capturing what is intended, the designers of specific EVAAS tools need to consider the validity of the design choices involved. For instance, does the operationalization of teacher quality fully capture the ideal (content validity) or match other agreed upon measures (convergent validity)? Cathy O’Neil described examples where EVAAS scores were misaligned with teachers receiving teaching awards and support from the community. We can further ask: Are the EVAAS teacher scores reliable across years? Again, O’Neil has pointed to examples where a teacher could go from scoring six out of 100 to 96 out of 100 within one year. Teacher scores can further penalize students near the lower thresholds. 
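The validity and reliability questions above can be made concrete with a deliberately naive sketch of the general value-added idea, a teacher's students' score gains relative to the average gain. This is an illustration only, not SAS's proprietary EVAAS model; all function and variable names are hypothetical:

```python
import numpy as np

def teacher_scores(prior, current, teacher_ids):
    """Naive value-added measure: each teacher's mean student test-score
    gain, relative to the mean gain across all students. A stand-in for
    the construct 'teacher quality', not the proprietary EVAAS model."""
    prior = np.asarray(prior, dtype=float)
    current = np.asarray(current, dtype=float)
    teacher_ids = np.asarray(teacher_ids)
    gains = current - prior
    overall = gains.mean()
    return {t: float(gains[teacher_ids == t].mean() - overall)
            for t in np.unique(teacher_ids)}

def year_over_year_reliability(scores_y1, scores_y2):
    """The reliability check from the text: correlate the same teachers'
    scores across two years. Large year-to-year swings, like the 6-to-96
    jump O'Neil describes, show up as low correlation."""
    common = sorted(set(scores_y1) & set(scores_y2))
    a = [scores_y1[t] for t in common]
    b = [scores_y2[t] for t in common]
    return float(np.corrcoef(a, b)[0, 1])
```

Even this toy version surfaces the validity questions: the gain-relative-to-average operationalization is one choice among many, and nothing in the code distinguishes teacher effects from district resourcing or other confounds.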
Under-resourced school districts systematically result in lower teacher quality scores, which more likely reflect other social phenomena than the teachers themselves (discriminant validity). In addition, EVAAS tools literally encourage “teaching to the test”—that is, pedagogy that emphasizes test performance—at the expense of other educational priorities. But even AI tools used for discovery are implicitly assigning scores, which are used to allocate agency attention—yet another decision. Consider a federal government-wide comment analysis tool that surfaces relevant regulatory comments, identifies novel information, and suppresses duplicate comments. What are those tools doing? Sorting comments by “relevance”—but that requires finding an implicit ranking, based on some understanding and measurement of what relevance means. A measurement of relevance depends on defining or operationalizing relevance. So any system that sorts by relevance depends on these measurements. And these measurements are used to guide users’ decisions about which comments should be followed up on, or safely ignored, with what urgency, and so on. All this means that the definition and operationalization of relevance—or any other concept—is governance. Even though one person’s understanding of what is relevant might differ from another person’s, there is now one understanding of relevance embedded in the AI model—out of sight and upstream. Human decisions that once informed policy are now tasks defined through design in upstream processes, possibly by third-party vendors rather than expert agency staff. Previously visible and contestable decisions are now masked, and administrators have given this decision-making away. Unless, of course, they have tools that help them retain it. That is where measurement modeling comes in.
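The comment-sorting example can be sketched in a few lines: "relevance" only exists once someone picks an operationalization. Here is one hypothetical choice (the share of a comment's words matching the query terms); the point of the passage is that swapping in a different definition reorders the queue, and that choice is itself a policy decision:

```python
from collections import Counter

def relevance(comment, query_terms):
    """One of many possible operationalizations of 'relevance': the share
    of a comment's words that match the query terms. Choosing this
    definition, not just implementing it, is the hidden policy decision."""
    tokens = Counter(comment.lower().split())
    total = sum(tokens.values()) or 1
    return sum(tokens[t] for t in query_terms) / total

def rank_comments(comments, query_terms):
    # Agency attention flows top-down through this ordering.
    return sorted(comments, key=lambda c: relevance(c, query_terms),
                  reverse=True)
```

A comment that never mentions the query terms scores zero and sinks to the bottom, even if a human reader would judge it highly relevant, which is exactly the kind of embedded judgment the text argues should be surfaced and vetted.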
Although even skilled experts cannot fully understand complex AI systems through code review, measurement modeling provides a way to clarify design goals, concepts to be measured, and their operationalization. Measurement models can facilitate the collaboration between technical and domain experts necessary for AI systems that reflect agency knowledge and policy. The rigor imposed by measurement modeling is essential given that important social and political values that must guide agency action, such as fairness, are often ambiguous and contested and therefore exceedingly complex to operationalize. Moreover, the data that systems train and run on is imbued with historical biases, which makes choices about mappings between concepts and observable facts about the world fraught with possibilities for entrenching undesirable aspects of the past. When the measurement modeling process surfaces the need to formalize concepts that are under-specified in law, it alerts agencies to latent policy choices that must be subject not only to appropriate expert judgment but to the political visibility that is necessary for the legitimate adoption of algorithmic systems. Whether an agency is developing the AI system or procuring it, there are a range of methods for bringing the knowledge of outside experts and the general public into the deliberation about system design. These include notice-and-comment processes, more consultative processes, staged processes of expert review and public feedback, and co-design exercises. Measurement modeling can be used within them all. Issues warranting public participation can include decisions about the specific definition of a concept to be modeled as well as its operationalization. For example, fairness has multiple context-dependent, and sometimes even conflicting, theoretical definitions and each definition is capable of different operationalizations. 
Existing jurisprudence on the setting of formulas and numerical cutoffs, and the choices underlying methodologies, provides useful guidance for identifying aspects of AI systems that warrant public input. Agency decisions that translate ambiguous concepts such as what is classified as “appropriate” into a fixed number, or that establish preferences for false negatives or false positives, are clear candidates. The introduction of AI systems into processes that affect the rights of members of the public demands urgent attention. Agencies need new ways to ensure that policy choices embedded in AI systems are developed through processes that satisfy administrative law’s technocratic demands that policy decisions be the product of reasoned justifications informed by expertise. Agencies also need guidance about how to adhere to transparency, reason giving, and nondiscrimination requirements when individual determinations are informed by AI-driven systems. They likewise need new experts and new tools to validate and monitor AI systems to protect against poor or even illegal outcomes produced by forces such as automation bias, model drift, and strategic human behavior. Without new approaches, the introduction of AI systems will inappropriately deny and award benefits and services to the public, diminish confidence in governments’ ability to use technical tools appropriately, and ultimately undermine the legitimacy of agencies and the market for AI tools more broadly. Measurement modeling offers agencies and the public an opportunity to collectively shape AI tools before they shape society. It can help agencies clarify and justify the assumptions behind the models they choose, expose and vet those assumptions with the public, and ensure that the models are appropriately validated.