Classifying humans: the indirect reverse operativity of machine vision
Classifying is human. Classifying is also what machine vision technologies do. This article analyses the cybernetic loop between human and machine classification by examining artworks that depict instances of bias when machine vision classifies humans and when humans classify visual datasets for machines. I propose the term “indirect reverse operativity”, a concept built upon Ingrid Hoelzl's and Remi Marie's notion of “reverse operativity”, to describe how classifying humans and machine classifiers operate in cybernetic information loops. Indirect reverse operativity is illustrated through two projects I have co-created: the Database of Machine Vision in Art, Games and Narrative and the artwork Suspicious Behavior. Through “artistic audits” of selected artworks, a data analysis of how classification is represented in 500 creative works, and a reflection on my own artistic research in the Suspicious Behavior project, this article confronts and complicates assumptions about when and how bias is introduced into and propagates through machine vision classifiers. By examining cultural conceptions of machine vision bias that exemplify how humans operate machines and how machines operate humans through images, this article contributes fresh perspectives to the emerging field of critical dataset studies.
Intersectional Identities and Machine Learning: Illuminating Language Biases in Twitter Algorithms
Intersectional analysis of social media data is rare. Social media data is ripe for identity and intersectionality analysis: it is widely accessible and its text is easy to parse, yet it poses its own methodological challenges regarding the identification of identities. We aggregate Twitter data that was annotated via crowdsourcing with tags for “abusive,” “hateful,” or “spam” language. Using natural language prediction models, we predict each tweeter's race and gender and investigate whether the tags for abuse, hate, and spam have a meaningful relationship with the gendered and racialized language predictions. Are certain gender and race groups more likely to be predicted if a tweet is labeled as abusive, hateful, or spam? The findings suggest that certain racial and intersectional groups are more likely to have their language identified as non-normal (abusive, hateful, or spam). Language consistent with white identity is most likely to be considered within the norm, while non-white racial groups are more often linked to hateful, abusive, or spam language.
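As a rough illustration of the kind of association the abstract describes (between predicted demographic group and moderation label), here is a minimal sketch in Python. The file name, column names (“predicted_race”, “label”), and the choice of a chi-square test are assumptions for illustration, not details taken from the paper.

```python
# Sketch: testing whether crowdsourced moderation labels are associated with
# predicted demographic groups. All names below are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("annotated_tweets.csv")  # hypothetical dataset

# Contingency table: rows = predicted race group, columns = crowdsourced label
# ("abusive", "hateful", "spam", "normal").
table = pd.crosstab(df["predicted_race"], df["label"])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")

# Standardized residuals show which group/label cells drive the association,
# e.g. a group disproportionately tagged as "hateful".
residuals = (table - expected) / expected ** 0.5
print(residuals.round(2))
```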
Impact of Data Collection on ML Models: Analyzing Differences of Biases Between Low- vs. High-Skilled Annotators
Labeled data is crucial for the success of machine learning-based artificial intelligence. However, companies often face a choice between collecting a limited number of annotations from either high- or low-skilled annotators, groups that may exhibit different biases. This study investigates differences in bias between datasets labeled by these annotator groups and the impact of those differences on machine learning models. To that end, we created high- and low-skilled annotated datasets, measured the contained biases through entropy, and trained different machine learning models to examine bias-inheritance effects. Our findings on text sentiment annotations show that both groups exhibit a considerable amount of bias in their annotations, although there is a significant difference in the error types commonly encountered. Models trained on biased annotations produce significantly different predictions, indicating bias propagation, and tend to make more extreme errors than humans. As partial mitigation, we propose a hybrid approach in which data is labeled by both low-skilled and high-skilled workers, and we show its efficiency.
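The abstract mentions measuring the bias contained in annotations through entropy. A minimal sketch of what such a measurement could look like, assuming each item carries several sentiment labels per annotator pool; the function, data, and pool names are hypothetical:

```python
# Sketch: label entropy as a per-item disagreement/bias signal.
# 0.0 = full annotator agreement; higher = more disagreement.
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of one item's label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sentiment annotations for the same items from two pools.
high_skilled = [["pos", "pos", "pos"], ["neg", "neu", "neg"]]
low_skilled = [["pos", "neg", "pos"], ["neg", "pos", "neu"]]

for name, group in [("high-skilled", high_skilled), ("low-skilled", low_skilled)]:
    mean_h = sum(label_entropy(item) for item in group) / len(group)
    print(f"{name}: mean label entropy = {mean_h:.3f} bits")
```

Comparing the mean entropy of the two pools gives one coarse, distribution-level view of how differently the groups annotate, which is one plausible reading of the entropy measure the abstract refers to.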
Envisioning Identity: The Social Production of Computer Vision
Computer vision technologies have been increasingly scrutinized in recent years for their propensity to cause harm. Computer vision systems designed to interpret visual data about humans for various tasks are perceived as particularly high risk. Broadly, concerns about the harms of computer vision center on demographic biases (favoring one group over another) and categorical injustices (through erasure, stereotyping, or problematic labels). Prior work has focused on both uncovering these harms and mitigating them, for example through better dataset collection practices and guidelines for more contextual data labeling. This research has largely concentrated on discrete computer vision artifacts, such as datasets or model outputs, and their implications for specific identity groups or for privacy. There is an opportunity to further understand how human identity is embedded into computer vision not only across these artifacts, but also across the network of human workers who shape computer vision systems.
This dissertation focuses on understanding how human identity is conceptualized across two different “layers” of computer vision: (1) at the artifact layer, where the classification ontology is deployed, in the form of datasets and model inputs and outputs; and (2) at the development layer, where social decisions are made about how to implement models and annotations by traditional tech workers. Specifically, I examine how identity is represented in artifacts and how those representations are derived from human workers. I demonstrate how human workers rely on their own subjective positionalities—the worldviews they hold as a result of their own identities and experiences.
I present six studies that identify the subjectivity of computer vision. Three studies focus on artifacts, both model outputs and datasets, to discuss how identity is currently implemented and how that implementation is embedded with specific disciplinary values that often clash with more sociocultural lenses on identity. The fourth and fifth studies focus on how human workers shape these artifacts. Through interviews with both traditional tech workers (like engineers and data scientists) and contingent data workers (who apply requirements given to them by traditional tech workers), I uncover how the positionality of human actors shapes identity in computer vision. Finally, in the sixth study, I examine how power operates between these two types of workers, traditional tech workers and data workers. Identity, as a concept, is treated as an infrastructure on which to build products. Workers attempt to uncover some underlying truth about identity and capture it in technical systems. However, in reality, workers reference the nebulous and intangible concept of identity to implement their own positional perspectives. I demonstrate that traditional tech workers have a positional power in the development of identity in computer vision; traditional worker positionalities are viewed as expert perspectives to be solidified into artifacts. Meanwhile, data worker positionalities are viewed as risks to the quality and trustworthiness of those artifacts. Thus, traditional tech workers attempt to control data worker positionalities, instilling in data workers their own positional perspectives.
By synthesizing insights from these six studies, this dissertation contributes a theory of identity in developing technical artifacts. I argue that identity concepts in the process of computer vision development move from open (filled with nuance, complexity, history, and opportunity) to closed (narrowly defined and embedded into artifacts that are deployed to reify a specific worldview of identity). I describe how workers pull from the intangible meta-concept of “Identity” to shape, through the process of development, specific Attributes to embed into technologies. I show how workers transform these Attributes through the development process into narrower and narrower definitions. These definitions of identity thus become Technical Attributes, highly specific implementations of identity which are no longer malleable to different perspectives.