International Journal of Digital Curation
Privacy Impact Assessments for Digital Repositories
Trustworthy data repositories ensure the security of their collections. We argue they should also ensure the privacy of researcher and research subject data. We demonstrate the use of a privacy impact assessment (PIA) to evaluate potential privacy risks to researchers using the ICPSR’s Researcher Passport as a case study. We present our workflow and discuss potential privacy risks and mitigations for those risks.
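The abstract describes the PIA workflow in prose rather than code. Purely as an illustration of what a PIA-style risk register might look like in practice, the following is a minimal Python sketch: the field names, the two example risks, and the likelihood-times-impact scoring rule are all assumptions made for illustration, not taken from the ICPSR Researcher Passport assessment.

```python
# Illustrative sketch only: a minimal privacy-risk register of the kind a PIA
# might produce. Fields, risks, and scoring are assumptions, not ICPSR's workflow.
from dataclasses import dataclass

@dataclass
class PrivacyRisk:
    description: str
    likelihood: int   # 1 (rare) .. 5 (almost certain)
    impact: int       # 1 (negligible) .. 5 (severe)
    mitigation: str

    @property
    def score(self) -> int:
        # A common likelihood x impact heuristic; real PIAs may weight differently.
        return self.likelihood * self.impact

register = [
    PrivacyRisk("Re-identification of research subjects from linked records",
                likelihood=2, impact=5,
                mitigation="Restrict linkage keys; require secure enclave access"),
    PrivacyRisk("Researcher credentials exposed in transit",
                likelihood=1, impact=4,
                mitigation="Enforce TLS and short-lived access tokens"),
]

# Review highest-scoring risks first.
for risk in sorted(register, key=lambda r: r.score, reverse=True):
    print(f"[{risk.score:>2}] {risk.description} -> {risk.mitigation}")
```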
[A previous version of this article is available as an IDCC2020 Conference Paper] 
The Data Life Aquatic
This paper assesses data consumers’ perspectives on the interoperable and re-usable aspects of the FAIR Data Principles. Taking a domain-specific informatics approach, ten oceanographers were asked to think of a recent search for data and describe their process of discovery, evaluation, and use. The interview schedule, derived from the FAIR Data Principles, included questions about the interoperability and re-usability of data. Through this critical incident technique, findings on data interoperability and re-usability give data curators valuable insights into how real-world users access, evaluate, and use data. Results from this study show that oceanographers utilize tools that make re-use simple, with interoperability seamless within the systems used. The processes employed by oceanographers present a good baseline for other domains adopting the FAIR Data Principles. 
Data Management Planning for an Eight-Institution, Multi-Year Research Project
While data management planning for grant applications has become commonplace, alongside articles providing guidance for such plans, examples of data management plans as they have been created, implemented, and used for specific projects are only beginning to appear in the scholarly record. This article describes data management planning for an eight-institution, multi-year research project. The project leveraged four data management plans (DMPs) in total: one for the funding application and one for each of the three distinct project phases. By examining researcher roles, the development and content of each DMP, the team's internal and external challenges, and the overall benefits of creating and using the plans, this article demonstrates the utility of DMPs as a project management tool.
Cross-tier Web Programming for Curated Databases: a Case Study
Curated databases have become important sources of information across several scientific disciplines and, as the product of the manual work of experts, often become important reference works. Provenance tracking, archiving, and data citation are widely regarded as important features for curated databases, but implementing such features is challenging, and small database projects often lack the resources to do so.
A scientific database application is not just the relational database itself, but also an ecosystem of web applications that display the data and applications that support data curation. Supporting advanced curation features requires changing all of these components, and there is currently no way to provide such capabilities in a reusable way.
Cross-tier programming languages allow developers to write a web application in a single, uniform language. Consequently, database queries and updates can be written in the same language as the rest of the program, and it should be possible to provide curation features via program transformations. As a step towards this goal, it is important to establish that realistic curated databases can be implemented in a cross-tier programming language.
In this article, we describe such a case study: reimplementing the web frontend of a real-world scientific database, the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb), in the Links cross-tier programming language. We show how programming language features such as language-integrated query simplify the development process and rule out common errors. Through an automated functional correctness evaluation, we show that the Links implementation correctly implements the functionality of the official version. Through a comparative performance evaluation, we show that the Links implementation performs fewer database queries, while the time needed to handle the queries is comparable to the official Java version. Furthermore, while there is some overhead to using Links because of its relative immaturity compared to Java, the Links version is usable as a proof-of-concept case study of cross-tier programming for curated databases.
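The Links sources for the GtoPdb frontend are not reproduced here. As a rough analogue of what language-integrated query buys the developer, the sketch below uses Python with SQLAlchemy (2.x assumed): the query is an expression in the host language, so tables and columns are checked objects rather than strings of raw SQL, and joins compose into a single query. The two-table ligand/target schema is a simplification invented for illustration, not the actual GtoPdb schema.

```python
# Minimal analogue of language-integrated query (SQLAlchemy 2.x assumed).
# The schema is invented for illustration; it is not the GtoPdb schema.
from sqlalchemy import create_engine, select, Integer, String, ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session

class Base(DeclarativeBase):
    pass

class Target(Base):
    __tablename__ = "target"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String)

class Ligand(Base):
    __tablename__ = "ligand"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String)
    target_id: Mapped[int] = mapped_column(ForeignKey("target.id"))

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Target(id=1, name="GABA-A receptor"),
                     Ligand(id=1, name="diazepam", target_id=1)])
    session.commit()

    # Composed in the host language and compiled to one SQL query, which helps
    # avoid the query-per-row patterns that inflate query counts.
    stmt = select(Ligand.name, Target.name).join(Target, Ligand.target_id == Target.id)
    for ligand_name, target_name in session.execute(stmt):
        print(ligand_name, "->", target_name)
```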
Identifying Opportunities for Collective Curation During Archaeological Excavations
Archaeological excavations are carried out by interdisciplinary teams that create, manage, and share data as they unearth and analyse material culture. These team-based settings are ripe for collective curation during these stages of the data lifecycle. However, findings from four excavation sites show that the data such interdisciplinary teams create are not well integrated. Knowing this, we recommend opportunities for collective curation to improve use and reuse of the data within and outside of the team.
FAIR Forever? Accountabilities and Responsibilities in the Preservation of Research Data
Digital preservation is a fast-moving and growing community of practice of ubiquitous relevance, but in which capability is unevenly distributed. Within the open science and research data communities, digital preservation has a close alignment to the FAIR principles and is delivered through a complex specialist infrastructure comprising technology, staff and policy. However, capacity erodes quickly, establishing a need for ongoing examination and review to ensure that skills, technology, and policy remain fit for changing purpose. To address this challenge, the Digital Preservation Coalition (DPC) conducted the FAIR Forever study, commissioned by the European Open Science Cloud (EOSC) Sustainability Working Group and funded by the EOSC Secretariat Project in 2020, to assess the current strengths, weaknesses, opportunities and threats to the preservation of research data across EOSC, and the feasibility of establishing shared approaches, workflows and services that would benefit EOSC stakeholders.
This paper draws from the FAIR Forever study to document and explore its key findings on the identified strengths, weaknesses, opportunities, and threats to the preservation of FAIR data in EOSC, and to the preservation of research data more broadly. It begins with the background of the study and an overview of the methodology employed, which involved a desk-based assessment of the emerging EOSC vision, interviews with representatives of EOSC stakeholders, and focus groups with digital preservation specialists and data managers in research organizations. It summarizes key findings on the need for clarity on digital preservation in the EOSC vision and for elucidation of roles, responsibilities, and accountabilities to mitigate risks of data loss and threats to reputation and sustainability. It then outlines the recommendations provided in the final report presented to the EOSC Sustainability Working Group.
To better ensure that research data can be FAIRer for longer, the recommendations of the study are presented with discussion of how they can be extended and applied to various research data stakeholders in and outside of EOSC, and with suggestions of ways to bring together the research data curation, management, and preservation communities to better ensure FAIRness now and in the long term.
Scaling by Optimising: Modularisation of Data Curation Services in Growing Organisations
After a century of theorising about and applying management practices, we are entering a new stage of management science: digital management. The management of digital data is becoming embedded in traditional functions of management and, at the same time, continues to produce viable solutions and conceptualisations in its own established fields, e.g. research data management. Yet one can observe bilateral synergies and mutual enrichment of traditional and data management practices in all fields. The paper at hand addresses a case in point, in which new and old management practices amalgamate to meet a steadily, at times by leaps and bounds, increasing demand for data curation services in academic institutions. The idea of modularisation, as known from software engineering, is applied to data curation workflows so that economies of scale and scope can be realised. While scaling refers to both management science and data science, optimising is understood in the traditional managerial sense, that is, with respect to the cost function. By means of a situation analysis describing how data curation services were extended from one department to the entire institution, and an analysis of the factors of influence, a method of modularisation is outlined that converges to an optimal state of curation workflows.
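The paper's workflow modules are described in prose; as a minimal sketch of the modularisation idea, the Python fragment below treats curation steps as small interchangeable units composed into reusable pipelines. The step names, the metadata fields they check, and the composition below are illustrative assumptions, not the modules defined in the paper.

```python
# Sketch only: curation steps as composable modules shared across departments.
# Step names and fields are assumptions made for illustration.
from typing import Callable

CurationStep = Callable[[dict], dict]

def validate_metadata(record: dict) -> dict:
    missing = [f for f in ("title", "creator", "date") if f not in record]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return record

def normalise_dates(record: dict) -> dict:
    record["date"] = str(record["date"])[:10]  # crude ISO-8601 date truncation
    return record

def assign_identifier(record: dict) -> dict:
    record.setdefault("identifier", f"local:{abs(hash(record['title'])) % 10**6}")
    return record

def pipeline(*steps: CurationStep) -> CurationStep:
    """Compose steps left to right into a single reusable workflow."""
    def run(record: dict) -> dict:
        for step in steps:
            record = step(record)
        return record
    return run

# Economies of scale and scope come from reusing the same modules institution-wide.
standard_curation = pipeline(validate_metadata, normalise_dates, assign_identifier)
print(standard_curation({"title": "Survey data", "creator": "Lab A",
                         "date": "2021-03-02T10:00"}))
```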
Leveraging Existing Technology: Developing a Trusted Digital Repository for the U.S. Geological Survey
As Federal Government agencies in the United States pivot to increase access to scientific data (Sheehan, 2016), the U.S. Geological Survey (USGS) has made substantial progress (Kriesberg et al., 2017). USGS authors are required to make federally funded data publicly available in an approved data repository (USGS, 2016b). This type of public data product, known as a USGS data release, serves as a method for publishing reviewed and approved data. In this paper, we present major milestones in the approach the USGS took to transition an existing technology platform to a Trusted Digital Repository. We describe both the technical and the non-technical actions that contributed to a successful outcome. We highlight how initial workflows revealed patterns that were later automated, and the ways in which assessments and user feedback influenced design and implementation. The paper concludes with lessons learned, such as the importance of a community of practice, application programming interface (API)-driven technologies, iterative development, and user-centered design. This paper is intended to offer a potential roadmap for organizations pursuing similar goals.
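The paper credits API-driven technologies, but the actual USGS endpoints are not reproduced here. The sketch below targets a hypothetical repository API: the base URL, the /releases route, and the bearer-token scheme are all assumptions made for illustration of what programmatic registration of an approved data release might look like.

```python
# Hypothetical API sketch: the endpoint, route, and auth scheme are assumed
# for illustration and are not the USGS data-release API.
import json
from urllib import request

def register_data_release(api_base: str, token: str, metadata: dict) -> dict:
    """POST approved release metadata to a (hypothetical) repository API."""
    req = request.Request(
        f"{api_base}/releases",  # assumed route, for illustration only
        data=json.dumps(metadata).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example call (against a real service this would return the created record):
# register_data_release(
#     "https://repository.example.gov/api", "TOKEN",
#     {"title": "Streamflow observations, 2020", "approved": True},
# )
```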
Data Curation, Fisheries, and Ecosystem-based Management: the Case Study of the Pecheker Database
The scientific monitoring of the French fishing industry in the Southern Ocean is based on the use of the Pecheker database. Pecheker is dedicated to the digital curation of the data collected in the field by scientific observers, the analysis of which allows the scientists of the Muséum national d'Histoire naturelle to provide guidelines and advice for the regulation of fishing activity, the protection of fish stocks, and the protection of marine ecosystems. The template of Pecheker has been developed to make the database suited to the ecosystem-based management concept. Considering the global context of biodiversity erosion, this modern approach to management aims to take into account the environmental background of fisheries to ensure their sustainable development. Completeness and high quality of the raw data are key elements for an ecosystem-based management database such as Pecheker. Here, we present the development of this database as a case study of fisheries data curation to be shared with readers. Full code to deploy a database based on the Pecheker template is provided in the supplementary materials. Considering the success factors we could identify, we propose a discussion of how the community could build a global fisheries information system based on a network of small databases implementing shared interoperability standards.
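The article's actual deployment code lives in its supplementary materials. As a standalone illustration of the general shape of such a database, the sketch below builds a two-table miniature in SQLite, linking raw capture records to observed fishing operations; the table and column names are assumptions, not the Pecheker template.

```python
# Illustrative miniature only: the schema is assumed for this sketch and is
# not the Pecheker template from the article's supplementary materials.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fishing_operation (
    id INTEGER PRIMARY KEY,
    vessel TEXT NOT NULL,
    date TEXT NOT NULL,            -- ISO-8601
    latitude REAL NOT NULL,
    longitude REAL NOT NULL
);
CREATE TABLE capture (
    id INTEGER PRIMARY KEY,
    operation_id INTEGER NOT NULL REFERENCES fishing_operation(id),
    species TEXT NOT NULL,         -- ideally a shared-standard code (e.g. a
    weight_kg REAL NOT NULL        -- WoRMS AphiaID) for interoperability
);
""")
conn.execute("INSERT INTO fishing_operation VALUES (1, 'FV Austral', '2022-01-15', -49.3, 70.2)")
conn.execute("INSERT INTO capture VALUES (1, 1, 'Dissostichus eleginoides', 45.0)")

# Keeping environmental context (where and when) attached to every capture is
# what enables ecosystem-based analysis downstream.
for row in conn.execute("""
    SELECT o.date, o.latitude, o.longitude, c.species, c.weight_kg
    FROM capture c JOIN fishing_operation o ON c.operation_id = o.id
"""):
    print(row)
```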
Software Must be Recognised as an Important Output of Scholarly Research
Software now lies at the heart of scholarly research. Here we argue that as well as being important from a methodological perspective, software should, in many instances, be recognised as an output of research, equivalent to an academic paper. The article discusses the different roles that software may play in research and highlights the relationship between software and research sustainability and reproducibility. It describes the challenges associated with the processes of citing and reviewing software, which differ from those used for papers. We conclude that whilst software outputs do not necessarily fit comfortably within the current publication model, there is a great deal of positive work underway that is likely to make an impact in addressing this.
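One concrete response to the software-citation challenge the abstract mentions is machine-readable citation metadata shipped with the code. The sketch below emits a minimal record in the style of CodeMeta, a real JSON-LD vocabulary for software metadata; the project name, author, and repository URL are placeholders, and real records typically carry many more fields.

```python
# Sketch: a minimal CodeMeta-style software citation record. The project
# details are placeholders invented for illustration.
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-toolkit",            # placeholder project name
    "version": "1.2.0",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Example"}],
    "license": "https://spdx.org/licenses/MIT",
    "codeRepository": "https://example.org/repo",  # placeholder URL
}

# Writing the record alongside the source tree lets indexers and citation
# tools discover it.
with open("codemeta.json", "w") as f:
    json.dump(codemeta, f, indent=2)
print(json.dumps(codemeta, indent=2))
```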