The Scientific Drilling Database (SDDB) - Data from Deep Earth Monitoring and Sounding
Projects in the International Continental Scientific Drilling Program (ICDP) produce large amounts of data. Since the start of ICDP, data sharing has played an important part in ICDP projects, and the ICDP Operational Support Group, which provides the data-capturing infrastructure for many ICDP projects, has facilitated the dissemination of data within project groups. However, unless published in journal papers or books, the data themselves were in most cases not available outside the respective projects (see Conze et al. 2007, p. 32 this issue). With the online Scientific Drilling Database (SDDB; http://www.scientificdrilling.org), ICDP and the GeoForschungsZentrum Potsdam (GFZ), Germany, created a platform for the public dissemination of drilling data.
An Open Source Web Service For Registering And Managing Environmental Samples
Records of environmental samples, such as minerals, soil, rocks, water, air and plants, are distributed across legacy databases, spreadsheets or other proprietary data systems. Sharing and integration of the sample records across the Web requires globally unique identifiers. These identifiers are essential in order to locate samples unambiguously and to manage their associated metadata and data systematically. The International Geo Sample Number (IGSN) is a persistent, globally unique label for identifying environmental samples. IGSN can be resolved to a digital representation of the sample through the Handle system. IGSN names are registered by end-users through allocating agents, which are the institutions acting on behalf of the IGSN registration agency. As an IGSN allocating agent, our goal is to implement a web service based on existing open source tools to streamline the processes of registering IGSNs and for managing and disseminating sample metadata. In this paper, we present our ongoing work on the design and development of the web service, and its data schema and database model for capturing key aspects of environmental samples. We show how existing controlled vocabularies can be incorporated into the service development to support the metadata registration of different types of samples. The proposed sample registration and curating approach has been trialed in the context of the Capricorn Distal Footprints project on a range of different sample types, varying from water to hard rock samples. The initial results demonstrate the effectiveness of the service while maintaining the flexibility to adapt to various media types, which is critical in the context of a multi-disciplinary project
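The registration workflow described above can be sketched in a few lines. This is an illustrative sketch only: the field names, the controlled vocabulary, and the in-memory registry below are assumptions for demonstration, not the actual IGSN metadata schema or the allocating agent's service.

```python
# Hypothetical sketch of sample registration with IGSN-style identifiers.
# Field names and the vocabulary are illustrative, not the real IGSN schema.
from dataclasses import dataclass, field

# Assumed controlled vocabulary of sample media types
SAMPLE_TYPES = {"water", "soil", "rock", "mineral", "plant", "air"}

@dataclass
class SampleRecord:
    igsn: str                              # globally unique sample label
    name: str                              # human-readable sample name
    sample_type: str                       # must come from the vocabulary
    metadata: dict = field(default_factory=dict)

    def validate(self) -> list:
        """Return a list of problems; an empty list means registrable."""
        problems = []
        if not self.igsn:
            problems.append("missing IGSN")
        if self.sample_type not in SAMPLE_TYPES:
            problems.append(f"unknown sample type: {self.sample_type!r}")
        return problems

registry = {}  # stands in for the allocating agent's database

def register(record: SampleRecord) -> bool:
    """Register the sample if it is valid and the IGSN is not yet taken."""
    if record.validate() or record.igsn in registry:
        return False
    registry[record.igsn] = record
    return True
```

Validating against a controlled vocabulary at registration time is what lets a single service handle media as different as water and hard-rock samples while still keeping the records searchable.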
From Ions to Bits – Managing Data in a National Research Centre
Managing data in active research projects is a challenging task. The innovative nature of research requires a flexible data infrastructure that is able to adapt to ad-hoc changes. How can this be reconciled with the necessity to streamline infrastructure services in order to keep costs at a sustainable level? What must data management services look like to integrate well into the everyday work of a researcher?
In the past the focus of attention has been on large-volume research data. However, most research data are small and complex, already highly enriched with contextual information. Managing this 'long tail' of research data is labour-intensive and requires new strategies and technological solutions to allow sustainable operation.
Eventually, the results of a project are published in the literature and should be accompanied by data publications. The data, now being part of the record of science, has to be citeable and has to be curated for a long period of time. Data publication and long-term preservation call for new services and for cooperation between infrastructure providers (computing centre) and memory institutions (library).
This talk will investigate the challenges and solutions for managing research data, taking research at GFZ as an example.
Assembly and concept of a web-based GIS within the paleolimnological project CONTINENT (Lake Baikal, Russia)
Web-based Geographical Information Systems (GIS) are excellent tools within interdisciplinary and multi-national geoscience projects to exchange and visualize project data. The web-based GIS presented in this paper was designed for the paleolimnological project 'High-resolution CONTINENTal paleoclimate record in Lake Baikal' (CONTINENT) (Lake Baikal, Siberia, Russia) to allow the interactive handling of spatial data. The GIS database combines project data (core positions, sample positions, thematic maps) with auxiliary spatial data sets that were downloaded from freely available sources on the World Wide Web. The reliability of the external data was evaluated, and suitable new spatial datasets were processed according to the scientific questions of the project. GIS analysis of the data was used to assist studies on sediment provenance in Lake Baikal, or to help answer questions such as whether the visualization of present-day vegetation distribution and pollen distribution supports the conclusions derived from palynological analyses. The refined geodata are returned to the scientific community through online data publication portals. Data were made citeable by assigning persistent identifiers (DOI) and were published through the German National Library for Science and Technology (TIB Hannover, Hannover, Germany).
Langzeitarchivierung von Forschungsdaten: eine Bestandsaufnahme (Long-Term Archiving of Research Data: A Survey)
The relevance of research data today and for the future is well documented and discussed, in Germany as well as internationally. Ensuring that research data are accessible, sharable, and re-usable over time is increasingly becoming an essential task for researchers and research infrastructure institutions. Some reasons for this development include the following:
- research data are documented and could therefore be validated
- research data could be the basis for new research questions
- research data could be re-analyzed by using innovative digital methods
- research data could be used by other disciplines
Therefore, it is essential that research data are curated, which means they are kept accessible and interpretable over time.
In Germany, a baseline study analyzing the situation in eleven research disciplines was undertaken in 2012. The results were published in a German-language edition. To address an international audience, this edition has been translated and abridged.
Updating the Data Curation Continuum
The Data Curation Continuum was developed as a way of thinking about data repository infrastructure. Since its original development over a decade ago, a number of things have changed in the data infrastructure domain. This paper revisits the thinking behind the original data curation continuum and updates it to respond to changes in research objects, storage models, and the repository landscape in general.
 
Towards A Web-Enabled Geo-Sample Web: An Open Source Resource Registration and Management System for Connecting Geo-Samples to the Web
Within the earth sciences, the curation and sharing of geo-samples is crucial to supporting reproducible research, in addition to extending the use of the samples in new research and saving costs by avoiding sample loss and duplicated sampling activities. In the Commonwealth Scientific and Industrial Research Organisation (CSIRO), researchers gather various geo-samples as part of their field studies and collaborative projects. The diversity of the samples and their unsystematic management led to ambiguous sample numbers, incomplete sample descriptions, and difficulties in finding the samples and their related data. These problems are also found in universities, research institutes and government agencies, which usually curate and manage diverse samples. To address this problem, we developed an open source registration and management system to identify geo-samples unambiguously and to manage their metadata and data systematically. The system supports the linking of samples and sample collections to the real-world features from which they were collected, as well as to their data and reports on the Web. This paper describes the implementation of the system, including its underlying design considerations and its applications. The system was built upon the International Geo Sample Number persistent identifier system with Semantic Web technologies. It has been implemented and tested with individual users and three sample repositories in the organization.
Versioning data is about more than revisions : A conceptual framework and proposed principles
A dataset, small or big, is often changed to correct errors, apply new algorithms, or add new data (e.g., as part of a time series). In addition, datasets might be bundled into collections, distributed in different encodings or mirrored onto different platforms. All these differences between versions of datasets need to be understood by researchers who want to cite the exact version of the dataset that was used to underpin their research. Failing to do so reduces the reproducibility of research results. Ambiguous identification of datasets also impacts researchers and data centres who are unable to gain recognition and credit for their contributions to the collection, creation, curation and publication of individual datasets. Although the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available. In this work, we analysed 39 use cases and current practices of data versioning across 33 organisations. We noticed that the term 'version' was used in a very general sense, extending beyond the more common understanding of 'version' to refer primarily to revisions and replacements. Using concepts developed in software versioning and the Functional Requirements for Bibliographic Records (FRBR) as a conceptual framework, we developed six foundational principles for versioning of datasets: Revision, Release, Granularity, Manifestation, Provenance and Citation. These six principles provide a high-level framework for guiding the consistent practice of data versioning and can also serve as guidance for data centres or data providers when setting up their own data revision and version protocols and procedures.
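The distinction the abstract draws between a Revision (an error correction) and a Release (a substantive change), each leaving a Provenance trail and a citable version identifier, can be sketched as follows. The class and its numbering scheme are a hypothetical illustration of the principles, not an API defined by the paper.

```python
# Illustrative sketch of the Revision / Release / Provenance / Citation
# principles; the class, fields, and numbering scheme are assumptions.
from dataclasses import dataclass, field

@dataclass
class DatasetVersion:
    name: str
    release: int = 1      # Release: substantive change users must notice
    revision: int = 0     # Revision: error correction within a release
    provenance: list = field(default_factory=list)

    def version_id(self) -> str:
        # Citation principle: the string a paper would cite.
        return f"{self.name} v{self.release}.{self.revision}"

    def revise(self, note: str) -> str:
        """Record an error correction; bumps only the revision number."""
        self.revision += 1
        self.provenance.append(f"rev {self.release}.{self.revision}: {note}")
        return self.version_id()

    def new_release(self, note: str) -> str:
        """Record a substantive change; bumps release, resets revision."""
        self.release += 1
        self.revision = 0
        self.provenance.append(f"release {self.release}.0: {note}")
        return self.version_id()
```

Keeping the provenance log alongside the counters is what lets a reader of a citation reconstruct how the cited version differs from its predecessors.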
Distributed Persistent Identifiers System Design
The need to identify both digital and physical objects is ubiquitous in our society. Past and present persistent identifier (PID) systems, of which there is a great variety in terms of technical and social implementation, have evolved with the advent of the Internet, which has allowed for globally unique and globally resolvable identifiers. PID systems have, by and large, catered for identifier uniqueness, integrity, and persistence, regardless of the identifier's application domain. Trustworthiness of these systems has been measured by the criteria first defined by Bütikofer (2009) and further elaborated by Golodoniuc et al. (2016) and Car et al. (2017). Since many PID systems have been conceived and developed by a single organisation, they have faced challenges for widespread adoption and, most importantly, for surviving changes of technology. We believe that one cause of once-successful PID systems fading away is the centralisation of support infrastructure, both organisational and in computing and data storage systems. In this paper, we propose a PID system design that implements the pillars of a trustworthy system: ensuring identifiers' independence of any particular technology or organisation, implementation of core PID system functions, separation from data delivery, and enabling the system to adapt to future change. We propose decentralisation at all levels (persistent identifier and information object registration, resolution, and data delivery) using Distributed Hash Tables and traditional peer-to-peer networks with information replication and caching mechanisms, thus eliminating the need for a central PID data store. This will increase overall system fault tolerance, thus ensuring its trustworthiness. We also discuss important aspects of the distributed system's governance, such as the notion of the authoritative source and data integrity.
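The core decentralisation idea, hashing each identifier onto a ring of resolver nodes and replicating the mapping to successor nodes so resolution survives the loss of any single node, can be sketched with consistent hashing. The node names and replication factor below are assumptions for illustration, not part of any real PID system.

```python
# Hypothetical sketch of DHT-style PID resolution via consistent hashing.
# Node names and the replication factor are illustrative assumptions.
import hashlib
from bisect import bisect_right

def _h(key: str) -> int:
    """Deterministic position on the hash ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class PidRing:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Nodes sorted by their hash define the ring.
        self.ring = sorted((_h(n), n) for n in nodes)

    def nodes_for(self, pid: str):
        """Responsible node for a PID, plus its replica successors."""
        idx = bisect_right(self.ring, (_h(pid), ""))
        return [self.ring[(idx + i) % len(self.ring)][1]
                for i in range(self.replicas + 1)]
```

Because every participant computes the same hash, any node can answer (or forward) a resolution request without consulting a central PID data store, which is precisely the fault-tolerance property the design argues for.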
Making Research Data Repositories Visible: The re3data.org Registry
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarized under the term Research Data Repositories (RDR). The project re3data.org (Registry of Research Data Repositories) began indexing research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape. As of July 2013, re3data.org lists 400 research data repositories and counting; 288 of these are described in detail using the re3data.org vocabulary. Information icons help researchers to easily identify an adequate repository for the storage and reuse of their data. This article describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, the article outlines the features of re3data.org and shows how this registry helps to identify appropriate repositories for the storage and search of research data.