1,581 research outputs found

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing its data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and it can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
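
    As a concrete illustration of the "off-the-shelf technologies" mentioned above, the sketch below shows HTTP content negotiation, one common resource-oriented Web pattern for retrieving machine-readable metadata about a repository record. The record URL and returned fields are hypothetical; this is a sketch of the general pattern, not the paper's reference implementation.

        # Minimal sketch: ask a repository record for a Linked Data
        # description instead of its default HTML landing page.
        # The record URL is hypothetical.
        import requests

        record_url = "https://example.org/repository/record/1234"  # hypothetical

        # Content negotiation: request JSON-LD via the Accept header.
        response = requests.get(record_url, headers={"Accept": "application/ld+json"})
        response.raise_for_status()

        metadata = response.json()  # a JSON-LD description of the record
        print(metadata.get("@id"), metadata.get("@type"))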

    Exposing Provenance Metadata Using Different RDF Models

    A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase the interoperability, discoverability, reliability, and reproducibility of scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata can be not only verbose but also significantly redundant. An appropriate RDF provenance model should therefore be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources and demonstrated the extent of the redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set, including the N-ary model, the Singleton Property model, and the Nanopublication model. We examined query performance against three commonly used large RDF stores: Virtuoso, Stardog, and Blazegraph. Our experiments demonstrate that query performance depends on both the RDF store and the RDF provenance model.
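
    To make one of the compared models concrete, the sketch below builds a single statement in the Singleton Property style using rdflib: the direct relation is replaced by a unique property instance to which provenance can be attached. All IRIs are illustrative and do not reproduce the paper's dataset or its exact vocabulary.

        # Minimal sketch of the Singleton Property provenance model.
        from rdflib import Graph, Literal, Namespace

        EX = Namespace("http://example.org/")
        g = Graph()

        # Instead of asserting (aspirin, associatedWith, reyeSyndrome)
        # directly, mint a unique property instance for this one assertion...
        g.add((EX.associatedWith_1, EX.singletonPropertyOf, EX.associatedWith))
        g.add((EX.aspirin, EX.associatedWith_1, EX.reyeSyndrome))

        # ...and hang the provenance off that instance.
        g.add((EX.associatedWith_1, EX.derivedFrom, EX.sourceDatabase))
        g.add((EX.associatedWith_1, EX.retrievedOn, Literal("2016-05-01")))

        print(g.serialize(format="turtle"))

    The redundancy discussed above arises because such provenance blocks are repeated for millions of pairwise relations, often carrying identical source information.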

    Share and share alike: Encouraging the reuse of academic resources through the Scottish electronic staff development library

    This paper reports on the findings of a consultancy procedure conducted within the Scottish Higher Education staff development community, focusing on the reuse and sharing of communications and information technology (C&IT) resources for teaching and learning. While this consultancy was conducted primarily to inform the development of the Scottish electronic Staff Development Library (SeSDL), its findings will be of relevance to colleagues working in the fields of staff development and C&IT, and to all those involved in the creation of shared teaching and learning resources. The consultancy identified general staff development demands, specific pedagogical requirements, and concerns relating to the provision, reuse and sharing of staff development resources. The SeSDL Project will attempt to address these demands through the development of a Web-based resource centre, which will facilitate the reuse and sharing of high-quality staff development resources. Library materials are stored in the form of granules, which are branded with IMS-compatible metadata and classified using a controlled educational taxonomy. Staff developers will be able to assemble these granular components to build customized lessons tailored to the needs of their own departments and institutions.
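
    As a rough sketch of the granule model described above (illustrative field names only, not the actual SeSDL schema), each granule carries IMS-style metadata and a term from the controlled taxonomy, and granules are assembled into a customized lesson:

        # Hypothetical granule/lesson structures; all names are assumptions.
        from dataclasses import dataclass, field

        @dataclass
        class Granule:
            title: str
            taxonomy_term: str   # from the controlled educational taxonomy
            metadata: dict       # IMS-compatible descriptive fields

        @dataclass
        class Lesson:
            name: str
            granules: list = field(default_factory=list)

        intro = Granule("Introducing online assessment", "assessment",
                        {"format": "text/html", "language": "en"})
        quiz = Granule("Designing formative quizzes", "assessment",
                       {"format": "text/html", "language": "en"})

        lesson = Lesson("Assessment basics for my department", [intro, quiz])
        print([g.title for g in lesson.granules])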

    Building a Disciplinary, World-Wide Data Infrastructure

    Sharing scientific data, with the objective of making it fully discoverable, accessible, assessable, intelligible, usable, and interoperable, requires work at the disciplinary level to define, in particular, how the data should be formatted and described. Each discipline has its own organization and history as a starting point, and this paper explores how a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities, and linguistics, organize themselves at the international level to tackle this question. In each case, the disciplinary culture with respect to data sharing, the science drivers, the organization, and the lessons learnt are briefly described, as well as the elements of the discipline-specific data infrastructure which are or could be shared with others. Commonalities and differences are assessed. Common key elements for success are identified: data sharing should be science-driven; defining the disciplinary part of interdisciplinary standards is mandatory but challenging; and sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar. For all disciplines, it also appears that social aspects are more challenging than technological ones. Governance is more diverse, and linked to how each discipline is organized. CODATA, the RDA and the WDS can facilitate the establishment of disciplinary interoperability frameworks. Being problem-driven is also a key factor of success for building bridges to enable interdisciplinary research.
    Comment: Proceedings of the session "Building a disciplinary, world-wide data infrastructure" of SciDataCon 2016, held in Denver, CO, USA, 12-14 September 2016, to be published in the ICSU CODATA Data Science Journal in 201

    Towards an Interoperable Ecosystem of Research Cohort and Real-world Data Catalogues Enabling Multi-center Studies

    Objectives: Existing individual-level human data cover large populations on many dimensions, such as lifestyle, demography, laboratory measures, and clinical parameters. Recent years have seen large investments in data catalogues to FAIRify data descriptions and capitalise on this great promise, i.e., to make catalogue contents more Findable, Accessible, Interoperable and Reusable. However, this valuable diversity has also created heterogeneity, which poses challenges to optimally exploiting the catalogues' richness. Methods: In this opinion review, we analyse catalogues for human subject research, ranging from cohort studies to surveillance, administrative and healthcare records. Results: We observe that while these catalogues are heterogeneous, have various scopes, and use different terminologies, the underlying concepts seem potentially harmonizable. We propose a unified framework to enable catalogue data sharing, with catalogues of multi-center cohorts nested as a special case within catalogues of real-world data sources. Moreover, we list recommendations to create an integrated community of metadata catalogues and an open catalogue ecosystem to sustain these efforts and maximise impact. Conclusions: We propose to embrace the autonomy of motivated catalogue teams and to invest in their collaboration via minimal standardisation efforts: clear data licensing, persistent identifiers for linking the same records across catalogues, minimal metadata 'common data elements' using shared ontologies, and symmetric architectures for data sharing (push/pull) with clear provenance tracks to process updates and acknowledge original contributors. Most importantly, we encourage the creation of environments for collaboration and resource sharing between catalogue developers, building on international networks such as OpenAIRE and the Research Data Alliance, as well as domain-specific ESFRIs such as BBMRI and ELIXIR.
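
    As a sketch of the minimal 'common data element' idea recommended above, the record below combines a persistent identifier (so the same cohort can be linked across catalogues), a clear licence, and a provenance field acknowledging the originating catalogue. All field names and values are hypothetical, not an existing standard.

        # Hypothetical minimal catalogue record for exchange between catalogues.
        from dataclasses import dataclass, field

        @dataclass
        class CatalogueRecord:
            pid: str               # persistent identifier, shared across catalogues
            name: str
            source_catalogue: str  # provenance: where the record originated
            license: str           # clear data licensing, as recommended above
            keywords: list = field(default_factory=list)

        record = CatalogueRecord(
            pid="https://doi.org/10.1234/example-cohort",  # placeholder DOI
            name="Example Birth Cohort",
            source_catalogue="catalogue-A",
            license="CC-BY-4.0",
            keywords=["lifestyle", "demography"],
        )
        print(record.pid, record.source_catalogue)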

    Dynamic Data Citation Service-Subset Tool for Operational Data Management

    In earth observation and the climatological sciences, data and the associated data services grow on a daily basis over a large spatial extent, owing to the high coverage rate of satellite sensors and model calculations, but also to continuous meteorological in situ observations. To reuse such data, and especially data fragments and their data services, in a collaborative and reproducible manner by citing the original source, data analysts, e.g., researchers or impact modelers, need a way to identify the exact version, precise time information, parameters, and names of the dataset used. A manual process would make the citation of data fragments, i.e., subsets of an entire dataset, complex and imprecise. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. Citing such evolving content requires the approach of "dynamic data citation". The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus record the data history, i.e., the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of the art of existing citation and data management concepts, and developed a scalable dynamic data citation methodology for evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented these recommendations and has offered an operational dynamic data citation service for climate scenario data since 2017. Aware that this topic depends on bibliographic citation research that is still under discussion, the CCCA Dynamic Data Citation service focuses on climate-domain-specific issues such as data characteristics, formats, software environment, and usage behavior. Beyond disseminating the experience gained, current efforts target the scalability of the implementation, e.g., towards a potential Open Data Cube solution.
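
    The core of the approach, associating a subsetting query with a persistent identifier, can be sketched as follows: store the query parameters, the execution timestamp, and a hash of the resulting subset, then mint an identifier for that record so the exact subset can be re-derived and cited. Function and field names are illustrative, not the CCCA implementation.

        # Sketch of dynamic data citation: persist the query, a timestamp,
        # and a result hash under a (here, locally derived) identifier.
        import hashlib
        import json
        from datetime import datetime, timezone

        def cite_subset(query_params: dict, result_bytes: bytes) -> dict:
            record = {
                "query": query_params,  # e.g. bounding box and time frame
                "executed_at": datetime.now(timezone.utc).isoformat(),
                "result_sha256": hashlib.sha256(result_bytes).hexdigest(),
            }
            # A production service would register a PID (e.g. a DOI); here we
            # simply derive a stable local identifier from the record itself.
            digest = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            record["pid"] = "local:" + digest[:16]
            return record

        citation = cite_subset(
            {"bbox": [9.5, 46.4, 17.2, 49.0],
             "start": "2010-01-01", "end": "2020-12-31"},
            b"...subset file contents...",
        )
        print(citation["pid"])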