13 research outputs found

The Diffusion of Scholarship Across Disciplinary Boundaries through Data Sharing

Author: Bleckley David
Jekielek Susan
Publication date: 26/05/2017

An original data collection effort is often conducted by a scientist or group of scientists representing a single discipline. While secondary analysis of those data may occur within the same field, researchers from other disciplines may also become interested in the data, creating a diffusion of the data across disciplinary boundaries. This paper investigates this idea using datasets archived in the Civic Learning, Engagement, and Action Data Sharing project at ICPSR. We compare the disciplines of the original researcher(s) involved in a data collection to the disciplines of researchers who have published findings based on analyses of these same datasets. Our analysis shows how some data come to be used by diverse disciplines over time. The paper also describes the extent to which researchers collaborate across disciplines in producing and analyzing data. Finally, we examine whether characteristics of the data (such as the breadth of the data) lead to greater diffusion across disciplinary boundaries. We conclude by discussing the value of sharing and using archival data across disciplinary boundaries.

Funding: Spencer Foundation (Grant # 201500037)
Full text: http://deepblue.lib.umich.edu/bitstream/2027.42/167197/1/The Diffusion of Scholarship Across Disciplinary Boundaries through Data Sharing.pdf (Slide deck)

Deep Blue Documents at the University of Michigan

How and Why do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles

Author: Bleckley David
Hemphill Libby
Lafia Sara
Moss Elizabeth
Thomer Andrea
Publication date: 16/02/2023

Data reuse is a common practice in the social sciences. While published data play an essential role in the production of social science research, they are not consistently cited, which makes it difficult to assess their full scholarly impact and give credit to the original data producers. Furthermore, it can be challenging to understand researchers' motivations for referencing data. Like references to academic literature, data references perform various rhetorical functions, such as paying homage, signaling disagreement, or drawing comparisons. This paper studies how and why researchers reference social science data in their academic writing. We develop a typology to model relationships between the entities that anchor data references, along with their features (access, actions, locations, styles, types) and functions (critique, describe, illustrate, interact, legitimize). We illustrate the use of the typology by coding multidisciplinary research articles (n=30) referencing social science data archived at the Inter-university Consortium for Political and Social Research (ICPSR). We show how our typology captures researchers' interactions with data and purposes for referencing data. Our typology provides a systematic way to document and analyze researchers' narratives about data use, extending our ability to give credit to data that support research.

Comment: 35 pages, 2 appendices, 1 table

arXiv.org e-Print Archive

How do properties of data, their curation, and their funding relate to reuse?

Author: Akmon Dharma
Bleckley David
Hemphill Libby
Lafia Sara
Pienta Amy
Publication date: 17/06/2021

Despite large public investments in facilitating the secondary use of data, there is little information about the specific factors that predict data’s reuse. Using data download logs from the Inter-university Consortium for Political and Social Research (ICPSR), this study examines how data properties, curation decisions, and repository funding models relate to data reuse. We find that datasets deposited by institutions, subject to many curatorial tasks, and whose access and preservation are funded externally are used more often. Our findings confirm that investments in data collection, curation, and preservation are associated with more data reuse.

Funding: National Science Foundation grant 1930645 (LH, AP, DA); Institute of Museum and Library Services grant LG-37-19-0134-19 (LH, DA); National Institute of Drug Abuse contract number N01DA-14-5576 (AP)
Full text: http://deepblue.lib.umich.edu/bitstream/2027.42/168212/5/Hemphill et al Data downloads.pdf

PubMed Central

Deep Blue Documents at the University of Michigan

How and Why Do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles

Author: Bleckley David
Hemphill Libby
Lafia Sara
Moss Elizabeth
Thomer Andrea
Publication venue: Informatic and Data Science Journal STMIK Muhammadiyah Banten

Data reuse is a common practice in the social sciences. It can be difficult to understand researchers' motivations for referencing data. This article studies how and why researchers reference scientific data in their academic writing. We illustrate the use of the typology by coding multidisciplinary research articles. The typology offers a systematic way to document and analyze researchers' narratives.

Bibliothèque numérique de l'enssib

Supporting the Identification, Monitoring and Preservation of Government Data Resources: Findings from DataLumos Outreach Efforts

Author: Alexander J. Trent
Bleckley David A.
Jekielek Susan M.
Monzon Bianca
Publication date: 30/04/2019

This report documents the findings of “Identification, Monitoring, and Preservation of Government Data Resources”, an 18-month project involving outreach to government data producers, users, and intermediaries. Through this project, the Inter-university Consortium for Political and Social Research (ICPSR) sought to identify stakeholders’ most-used government datasets that they perceive to be potentially less accessible in the future, among other goals. Interviews and less formal interactions with data advocates and intermediaries, government data producers, and a variety of data users provided insights into the use of government data and perceptions of these data’s future accessibility. The most important source of data to these stakeholders is the Census Bureau, and several of its products were identified as being critical to stakeholders’ work. Data from other major statistical agencies, non-statistical federal agencies, and state and local data sources were also cited. The federal government data most used by stakeholders—and specifically the data of greatest importance to AECF-funded work—are perceived as accessible for future use. All of the federal datasets that stakeholders perceived to be potentially at risk were assessed and added to the DataLumos archive. A noteworthy finding from these interactions is that data created or collected by KIDS COUNT grantees, National Neighborhood Indicators Partnership (NNIP) participants, and other data intermediaries may not have a long-term data archiving or sharing plan. The analysts at these organizations spend significant effort gathering, aggregating, and analyzing data for their products, but they generally have no mechanism to archive or share these data. 
Given the investment in this work and the potential value of these data to community organizations, researchers, and even local and regional government agencies, there is a real opportunity for data intermediaries to store and share these data in a secure manner for the long term. Recommendations based on the project’s findings can be grouped into two major categories: advocacy and data sharing. Data users, intermediaries, and funders should continue to advocate that the Census Bureau and other principal statistical agencies provide access to the data products needed to successfully complete their work. Advocacy is also needed at the state and local levels, with the goals of targeting the creation of transparency laws and sunshine clauses, budget line items for data sharing, and infrastructural investments like open data portals and data application programming interfaces (APIs). Beyond traditional advocacy work, sustained and increased collaboration between government data producers and data users, intermediaries, and advocates is needed. As for data sharing, we recommend that data creators and intermediaries like KIDS COUNT grantees and NNIP partners work with data repositories like ICPSR to make their data available to others now and in the future. The archiving of these data would require both the infrastructure of a secure data repository and specialized curation and technical assistance related to sharing these types of data. The creation of an archive for data intermediaries’ data would extend the value of intermediaries’ important work, creating new resources for community members, institutions, and researchers.

Funding: Annie E. Casey Foundation
Full text: https://deepblue.lib.umich.edu/bitstream/2027.42/148837/1/Supporting the Identification, Monitoring and Preservation of Government Data Resources.pdf (Report)

Deep Blue Documents at the University of Michigan

Assessing Participatory Development Processes Through Knowledge Building

Author: Bleckley David
Publication venue: ScholarWorks@GVSU
Publication date: 01/01/2008

Participatory development is seen by many as the answer to the ineffectiveness and unsustainability that plague externally imposed international community development. Critics discount this, questioning the inclusivity and sustainability of participatory methods. This paper argues that stakeholders undertaking truly participatory development must balance power to create a discourse surrounding the development effort. The effect of this dialog is knowledge building. It is hypothesized that the overall effectiveness of participatory development efforts can be assessed by evaluating the knowledge building that occurs throughout the efforts. A model, based upon Bessette (2004), is presented as a means of framing such an assessment. The knowledge building associated with four participatory development case studies is analyzed using this framework. The results show that development efforts with increased knowledge building have greater overall success and sustainability.

Scholarworks@GVSU

Describing Data Transformation Work in a Changing Data Curation Community

Author: David Bleckley
Libby Hemphill
Sara Lafia
Publication venue: Center for Open Science
Publication date: 20/03/2023

OSF Preprints

Digitizing and parsing semi-structured historical administrative documents from the G.I. Bill mortgage guarantee program

Author: Alexander J. Trent
Bleckley David A.
Lafia Sara
Publication venue: Journal of Documentation
Publication date: 13/06/2023

Many libraries and archives maintain collections of research documents, such as administrative records, with paper-based formats that limit their access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes our approach using digital scanning, optical character recognition (OCR), and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen’s Readjustment Act of 1944, also known as the G.I. Bill. We used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill Mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers, and the name and location of the bank handling the loan. We extracted structured information from these scanned historical records in order to create a tabular data file and link them to other authoritative individual-level data sources. We compared the flexible character accuracy of five OCR methods. We then compared the character error rate of three text extraction approaches (regular expressions, document image analysis, and named entity recognition). We were able to obtain the highest quality structured text output using DIA with the Layout Parser toolkit by post-processing with regular expressions. Through this project, we demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public. 
Our workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR, and DIA processes, we created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from loans, the impacts on the communities built by the loans, and the institutions that implemented them.

Funding: Michigan Institute for Data Science (MIDAS) Propelling Original Data Science (PODS) Grant
Full text: http://deepblue.lib.umich.edu/bitstream/2027.42/176363/1/GI Bill digitization technical paper.pdf

Deep Blue Documents at the University of Michigan

Opportunities for Moral Education Researchers to Use Archived Civic Education Data with a Social Justice Emphasis

Author: Bleckley David
Sandoval-Hernandez Andres
Torney-Purta Judith
Publication date: 03/11/2021

In the peri-pandemic period many researchers can anticipate challenges in collecting data to study issues of social justice as they relate to moral education. Even after strict COVID restrictions are loosened, many school authorities will be unwilling to allow students to spend class time filling out research surveys because of the need to make up for lost months of schooling. Archived and freely-available datasets from survey studies of civic- and social-justice-related topics can be accessed and analyzed by moral education researchers during this transitional time. Data about social justice issues can be accessed from the CivicLEADS.org data archive at ICPSR (University of Michigan) and from national assessment agencies in Latin America. These data can be used by researchers in many types of analysis and to address a variety of research problems.

Files:
http://deepblue.lib.umich.edu/bitstream/2027.42/170908/1/Opportunities for Moral Education Researchers - FINAL2.pdf (PDF version of poster)
http://deepblue.lib.umich.edu/bitstream/2027.42/170908/2/Opportunities for Moral Education Researchers - FINAL2.png (Image version of the poster)
http://deepblue.lib.umich.edu/bitstream/2027.42/170908/3/CivicLEADS Social Justice variables.pdf (Variable example list handout from poster session)
http://deepblue.lib.umich.edu/bitstream/2027.42/170908/4/Opportunities for Moral Education Researchers - Handout.pdf (Handout from poster session)

Deep Blue Documents at the University of Michigan

Leveraging Machine Learning to Detect Data Curation Activities

Author: Akmon Dharma
Bleckley David
Hemphill Libby
Lafia Sara
Thomer Andrea
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/04/2021

This paper describes a machine learning approach for annotating and analyzing data curation work logs at ICPSR, a large social sciences data archive. The systems we studied track curation work and coordinate team decision-making at ICPSR. Repository staff use these systems to organize, prioritize, and document curation work done on datasets, making them promising resources for studying curation work and its impact on data reuse, especially in combination with data usage analytics. A key challenge, however, is classifying similar activities so that they can be measured and associated with impact metrics. This paper contributes: 1) a schema of data curation activities; 2) a computational model for identifying curation actions in work log descriptions; and 3) an analysis of frequent data curation activities at ICPSR over time. We first propose a schema of data curation actions to help us analyze the impact of curation work. We then use this schema to annotate a set of data curation logs, which contain records of data transformations and project management decisions completed by repository staff. Finally, we train a text classifier to detect the frequency of curation actions in a large set of work logs. Our approach supports the analysis of curation work documented in work log systems as an important step toward studying the relationship between research data curation and data reuse.

Comment: 10 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv.org e-Print Archive