Data Stewardship: Environmental Data Curation and a Web-of-Repositories
Scientific researchers today frequently package measurements and associated metadata as digital datasets in anticipation of storage in data repositories. Through the lens of environmental data stewardship, we consider the data repository as an organizational element central to data curation. One aspect of non-commercial repositories, their distance-from-origin of the data, is explored in terms of near and remote categories. Three idealized repository types are distinguished (local, center, and archive), paralleling research, resource, and reference collection categories respectively. Repository type characteristics such as scope, structure, and goals are discussed. Repository similarities in terms of roles, activities, and responsibilities are also examined. Data stewardship is related to care of research data and responsible scientific communication supported by an infrastructure that coordinates curation activities; data curation is defined as a set of repeated and repeatable activities focusing on tending data and creating data products within a particular arena. The concept of “sphere-of-context” is introduced as an aid to distinguishing repository types. Conceptualizing a “web-of-repositories” accommodates a variety of repository types and represents an ecologically inclusive approach to data curation.
Envisioning Identity: The Social Production of Computer Vision
Computer vision technologies have been increasingly scrutinized in recent years for their propensity to cause harm. Computer vision systems designed to interpret visual data about humans for various tasks are perceived as particularly high risk. Broadly, the harms of computer vision focus on demographic biases (favoring one group over another) and categorical injustices (through erasure, stereotyping, or problematic labels). Prior work has focused on both uncovering these harms and mitigating them, through, for example, better dataset collection practices and guidelines for more contextual data labeling. This research has largely focused on understanding discrete computer vision artifacts, such as datasets or model outputs, and their implications for specific identity groups or for privacy. There is opportunity to further understand how human identity is embedded into computer vision not only across these artifacts, but also across the network of human workers who shape computer vision systems.
This dissertation focuses on understanding how human identity is conceptualized across two different “layers” of computer vision: (1) at the artifact layer, where the classification ontology is deployed, in the form of datasets and model inputs and outputs; and (2) at the development layer, where social decisions are made about how to implement models and annotations by traditional tech workers. Specifically, I examine how identity is represented in artifacts and how those representations are derived from human workers. I demonstrate how human workers rely on their own subjective positionalities—the worldviews they hold as a result of their own identities and experiences.
I present six studies that identify the subjectivity of computer vision. Three studies focus on artifacts, both model outputs and datasets, to discuss how identity is currently implemented and how that implementation is embedded with specific disciplinary values that often clash with more sociocultural lenses on identity. The fourth and fifth studies focus on how human workers shape these artifacts. Through interviews with both traditional tech workers (like engineers and data scientists) and contingent data workers (who apply requirements given to them by traditional tech workers), I uncover how the positionality of human actors shapes identity in computer vision. Finally, in the sixth study, I examine how power operates between these two types of workers, traditional tech workers and data workers. Identity, as a concept, is treated as an infrastructure on which to build products. Workers attempt to uncover some underlying truth about identity and capture it in technical systems. However, in reality, workers reference the nebulous and intangible concept of identity to implement their own positional perspectives. I demonstrate that traditional tech workers have a positional power in the development of identity in computer vision; traditional worker positionalities are viewed as expert perspectives to be solidified into artifacts. Meanwhile, data worker positionalities are viewed as risks to the quality and trustworthiness of those artifacts. Thus, traditional tech workers attempt to control data worker positionalities, instilling in data workers their own positional perspectives.
By synthesizing insights from these six studies, this dissertation contributes a theory on identity in developing technical artifacts. I argue that identity concepts in the process of computer vision development move from open (filled with nuance, complexity, history, and opportunity) to closed (narrowly defined and embedded into artifacts that are deployed to reify a specific worldview of identity). I describe how workers pull from the intangible meta-concept of “Identity” to shape, through the process of development, specific Attributes to embed into technologies. I show how workers transform these Attributes through the development process into narrower and narrower definitions. These definitions of identity thus become Technical Attributes, highly specific implementations of identity which are no longer malleable to different perspectives.