12 research outputs found
The Role of Conceptions of Value in Data Practices: A Multi-Case Study of Three Small Teams of Ecological Scientists.
This dissertation examines the role of conceptions of data's value in data practices. Based on a study of three small teams of scientists carrying out ecological research at a biological station, my study addresses the following main question: How do scientists conceive of the value of their data, and how do scientists enact conceptions of value in their data practices? I relied on interviews and participant observations for my study and analyzed my data through the lens of theories of value and meaning. I found that scientists were primarily concerned with data's value for their team's own, relatively narrow uses: addressing a gap in knowledge and producing the outputs that would garner them credit and prestige. When asked about their data's potential value beyond their studies, scientists regularly cited meta-analysis, cross-site comparison, and time-based studies as worthy secondary uses for data and assessed data's value according to how well they thought the data could serve those ends.
As they collected data and conducted their studies, scientists did not think about data's value beyond whether or not they were good as resources for addressing a gap in knowledge. However, when asked to make their data more openly available, researchers indicated that their decision to share was based strongly on data's value for producing publications for the team. Data that teams were still working with and planned to publish were regarded as too valuable to the team to make widely available. Conversely, when scientists thought data's publication value had been fully exploited for the team, they saw little threat in sharing. In addition to publication potential, scientists also suggested that study type influenced their decision to share data and told me that they felt less compelled to share data from controlled studies because they assumed such data had inherently limited value.
PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/107162/1/dharmrae_1.pd
A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository
Social scientists are producing an ever-expanding volume of data, leading to questions about appraisal and selection of content given finite resources to process data for reuse. We analyze users’ search activity in an established social science data repository to better understand demand for data and more effectively guide collection development. By applying a data-driven approach, we aim to ensure curation resources are applied to make the most valuable data findable, understandable, accessible, and usable. We analyze data from a domain repository for the social sciences that includes over 500,000 annual searches in 2014 and 2015 to better understand trends in user search behavior. Using a newly created search-to-study ratio technique, we identified gaps in the domain data repository’s holdings and leveraged this analysis to inform our collection and curation practices and policies. The evaluative technique we propose in this paper will serve as a baseline for future studies looking at trends in user demand over time at the domain data repository being studied with broader implications for other data repositories.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/145607/1/document.pd
Only with Your Permission: How Rights Holders Respond (or Don’t Respond) to Requests to Display Archival Materials Online
Archival repositories are increasingly considering mass digitization as a means of meeting user expectations that materials be available online, remotely. Copyright is frequently noted as a significant obstacle to these efforts, but little empirical data exist on the copyright permissions process in archives. This article reports the findings of a study of the copyright permissions process for the Jon Cohen AIDS Research Collection at the University of Michigan. Specifically, the study sought to reveal how much effort is required to seek copyright permissions, what the results of those efforts would be, and whether or not there were traits of documents or copyright holders that were associated with acceptance or denial. The study found that significant time is required to contact and negotiate with rights holders and that the biggest obstacle to getting permission is non-response. Of those requests that get a response, the vast majority are to grant permission. While few of the requests were met with denial, the data suggest that commercial copyright holders are much more likely to deny permission than other types of copyright holders. The data also show that adherence to the common policy of only displaying online those documents with explicit permission will likely result in substantially incomplete online collections.
John D. Evans Foundation
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/77412/1/DAkmonOnlyWithYour-Permission_DAFinal.pdf
How do properties of data, their curation, and their funding relate to reuse?
Despite large public investments in facilitating the secondary use of data, there is little information about the specific factors that predict data’s reuse. Using data download logs from the Inter-university Consortium for Political and Social Research (ICPSR), this study examines how data properties, curation decisions, and repository funding models relate to data reuse. We find that datasets deposited by institutions, subject to many curatorial tasks, and whose access and preservation are funded externally are used more often. Our findings confirm that investments in data collection, curation, and preservation are associated with more data reuse.
National Science Foundation grant 1930645 (LH, AP, DA)
Institute of Museum and Library Services grant LG-37-19-0134-19 (LH, DA)
National Institute of Drug Abuse contract number N01DA-14-5576 (AP)
http://deepblue.lib.umich.edu/bitstream/2027.42/168212/5/Hemphill et al Data downloads.pdf
Building Tools to Support Active Curation: Lessons Learned from SEAD
SEAD – a project funded by the US National Science Foundation’s DataNet program –
has spent the last five years designing, building, and deploying an integrated set of
services to better connect scientists’ research workflows to data publication and
preservation activities. Throughout the project, SEAD has promoted the concept and
practice of “active curation,” which consists of capturing data and metadata early and
refining it throughout the data life cycle. In promoting active curation, our team saw an
opportunity to develop tools that would help scientists better manage data for their own
use, improve team coordination around data, implement practices that would serve the
data better over time, and seamlessly connect with data repositories to ease the burden
of sharing and publishing.
SEAD has worked with 30 projects, dozens of researchers, and hundreds of thousands
of files, providing us with ample opportunities to learn about data and metadata,
integrating with researchers’ workflows, and building tools and services for data. In this
paper, we discuss the lessons we have learned and suggest how this might guide future
data infrastructure development efforts.
National Science Foundation #OCI0940824
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/140714/1/document.pdf
The Application of Archival Concepts to a Data-Intensive Environment: Working with Scientists to Understand Data Management and Preservation Needs
The collection, organization, and long-term preservation of resources are the raison d’être of archives and archivists. The archival community, however, has largely neglected science data, assuming they were outside the bounds of their professional concerns. Scientists, on the other hand, increasingly recognize that they lack the skills and expertise needed to meet the demands being placed on them with regard to data curation and are seeking the help of “data archivists” and “data curators.” This represents a significant opportunity for archivists and archival scholars but one that can only be realized if they better understand the scientific context.
National Science Foundation under Grant No. 0724300
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/86738/1/Akmonetal2011.pd
NSF datanet partners update
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/111154/1/bult1720400608.pd
Restricting data’s use: A spectrum of concerns in need of flexible approaches
As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of the needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs.
https://deepblue.lib.umich.edu/bitstream/2027.42/151246/1/941-Article Text-299-2-10-20190930 (1).pd
Leveraging Machine Learning to Detect Data Curation Activities
This paper describes a machine learning approach for annotating and analyzing
data curation work logs at ICPSR, a large social sciences data archive. The
systems we studied track curation work and coordinate team decision-making at
ICPSR. Repository staff use these systems to organize, prioritize, and document
curation work done on datasets, making them promising resources for studying
curation work and its impact on data reuse, especially in combination with data
usage analytics. A key challenge, however, is classifying similar activities so
that they can be measured and associated with impact metrics. This paper
contributes: 1) a schema of data curation activities; 2) a computational model
for identifying curation actions in work log descriptions; and 3) an analysis
of frequent data curation activities at ICPSR over time. We first propose a
schema of data curation actions to help us analyze the impact of curation work.
We then use this schema to annotate a set of data curation logs, which contain
records of data transformations and project management decisions completed by
repository staff. Finally, we train a text classifier to detect the frequency
of curation actions in a large set of work logs. Our approach supports the
analysis of curation work documented in work log systems as an important step
toward studying the relationship between research data curation and data reuse.
Comment: 10 pages, 4 figures. This work has been submitted to the IEEE for
possible publication. Copyright may be transferred without notice, after
which this version may no longer be accessible.
Measuring and Improving the Efficacy of Curation Activities in Data Archives
It is well known that digital curation is critically important to ensuring the preservation, accessibility, and usability of digital collections. However, we know relatively little about the impact of specific curatorial actions (that is, discrete steps taken to improve a digital object) on the usability or accessibility of digital collections and datasets. The IMLS- and NSF-funded Measuring Impacts of Curatorial Actions (MICA) project is developing curatorial metrics using the Inter-university Consortium for Political and Social Research (ICPSR) data archive to evaluate the impact and efficacy of specific data curation processes. Curatorial metrics are statistical measures similar to bibliometrics but used to assess the impact of curatorial work on the use of collections. Using curation logs and other records, we develop and analyze a range of metrics from the last five years of data curation at the ICPSR. ICPSR is a highly impactful social science data repository, and provides us with a unique site and dataset for this study.
We are additionally conducting interviews with ICPSR stakeholders to better understand how they see the impact of their work, and their data collections. Stakeholders include archive managers, data curators, data reusers, and more. These interviews provide insights into how we should prioritize curatorial actions to achieve impact, and return on our investment in data curation.
In this poster we will present preliminary work identifying curatorial activities from interviews with ICPSR stakeholders and curator work tickets and discuss next steps in our research to develop metrics.
Institute of Museum and Library Services (LG-37-19-0134)
National Science Foundation (#1930645)
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/163501/1/AkmonetalRDA16Poster.pdf