Interoperability and FAIRness through a novel combination of Web technologies
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
Exposing Provenance Metadata Using Different RDF Models
A standard model for exposing structured provenance metadata of scientific
assertions on the Semantic Web would increase interoperability,
discoverability, reliability, as well as reproducibility for scientific
discourse and evidence-based knowledge discovery. Several Resource Description
Framework (RDF) models have been proposed to track provenance. However,
provenance metadata may not only be verbose, but also significantly redundant.
Therefore, an appropriate RDF provenance model should be efficient for
publishing, querying, and reasoning over Linked Data. In the present work, we
have collected millions of pairwise relations between chemicals, genes, and
diseases from multiple data sources, and demonstrated the extent of redundancy
of provenance information in the life science domain. We also evaluated the
suitability of several RDF provenance models for this crowdsourced data set,
including the N-ary model, the Singleton Property model, and the
Nanopublication model. We examined query performance against three commonly
used large RDF stores, including Virtuoso, Stardog, and Blazegraph. Our
experiments demonstrate that query performance depends on both the RDF store
and the RDF provenance model.
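The triple-count differences between the provenance models the abstract compares can be sketched in plain Python, with tuples standing in for RDF triples. This is an illustrative simplification (the example statement, namespaces, and node names are assumptions, not data from the study): the n-ary/reification approach reifies the statement as a node and attaches provenance to it, while the singleton property approach mints a unique predicate per statement and hangs provenance off that predicate.

```python
# Plain Python tuples standing in for RDF triples; names are illustrative.

def nary_model(s, p, o, source):
    """Reify the statement as an n-ary node, then attach provenance to it."""
    stmt = f"_:stmt_{s}_{p}_{o}"  # blank node identifying the reified statement
    return [
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
        (stmt, "prov:wasDerivedFrom", source),
    ]

def singleton_property_model(s, p, o, source):
    """Mint a unique predicate per statement; provenance hangs off it."""
    sp = f"{p}#1"  # singleton property derived from the generic predicate
    return [
        (s, sp, o),
        (sp, "rdf:singletonPropertyOf", p),
        (sp, "prov:wasDerivedFrom", source),
    ]

nary = nary_model("ex:aspirin", "ex:treats", "ex:headache", "ex:source1")
singleton = singleton_property_model("ex:aspirin", "ex:treats", "ex:headache", "ex:source1")
print(len(nary), len(singleton))
```

With one provenance attribute per statement, the n-ary form needs four triples where the singleton property form needs three; over millions of pairwise relations, such constant factors, together with store-specific indexing, are what drive the query-performance differences the abstract reports.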
Share and share alike: Encouraging the reuse of academic resources through the Scottish electronic staff development library
This paper reports on the findings of a consultancy procedure conducted within the Scottish Higher Education staff development community, focusing on the reuse and sharing of communications and information technology resources for teaching and learning. While this consultancy was conducted primarily to inform the development of the Scottish electronic Staff Development Library (SeSDL), its findings will be of relevance to colleagues working in the fields of staff development and C&IT and to all those involved in the creation of shared teaching and learning resources. The consultancy identified general staff development demands, specific pedagogical requirements, and concerns relating to the provision, reuse and sharing of staff development resources. The SeSDL Project will attempt to address these demands through the development of a Web-based resource centre, which will facilitate the reuse and sharing of high-quality staff development resources. Library materials are stored in the form of granules which are branded with IMS-compatible metadata and which are classified using a controlled educational taxonomy. Staff developers will be able to assemble these granular components to build customized lessons tailored to meet the needs of their own departments and institutions.
FAIR principles and the IEDB: short-term improvements and a long-term vision of OBO-foundry mediated machine-actionable interoperability.
The Immune Epitope Database (IEDB), at www.iedb.org, has the mission to make published experimental data relating to the recognition of immune epitopes easily available to the scientific public. By presenting curated data in a searchable database, we have liberated it from the tables and figures of journal articles, making it more accessible and usable by immunologists. Recently, the principles of Findability, Accessibility, Interoperability and Reusability have been formulated as goals that data repositories should meet to enhance the usefulness of their data holdings. We here examine how the IEDB complies with these principles and identify broad areas of success, but also areas for improvement. We describe short-term improvements to the IEDB that are being implemented now, as well as a long-term vision of true 'machine-actionable interoperability', which we believe will require community agreement on standardization of knowledge representation that can be built on top of the shared use of ontologies
Building a Disciplinary, World-Wide Data Infrastructure
Sharing scientific data, with the objective of making it fully discoverable,
accessible, assessable, intelligible, usable, and interoperable, requires work
at the disciplinary level to define in particular how the data should be
formatted and described. Each discipline has its own organization and history
as a starting point, and this paper explores the way a range of disciplines,
namely materials science, crystallography, astronomy, earth sciences,
humanities and linguistics get organized at the international level to tackle
this question. In each case, the disciplinary culture with respect to data
sharing, science drivers, organization and lessons learnt are briefly
described, as well as the elements of the specific data infrastructure which
are or could be shared with others. Commonalities and differences are assessed.
Common key elements for success are identified: data sharing should be science
driven; defining the disciplinary part of the interdisciplinary standards is
mandatory but challenging; sharing of applications should accompany data
sharing. Incentives such as journal and funding agency requirements are also
similar. For all, it also appears that social aspects are more challenging than
technological ones. Governance is more diverse, and linked to the discipline
organization. CODATA, the RDA and the WDS can facilitate the establishment of
disciplinary interoperability frameworks. Being problem-driven is also a key
factor of success for building bridges to enable interdisciplinary research.
Comment: Proceedings of the session "Building a disciplinary, world-wide data infrastructure" of SciDataCon 2016, held in Denver, CO, USA, 12-14 September 2016, to be published in ICSU CODATA Data Science Journal in 201
Towards an Interoperable Ecosystem of Research Cohort and Real-world Data Catalogues Enabling Multi-center Studies
Objectives: Existing individual-level human data cover large populations on many dimensions such as lifestyle, demography, laboratory measures, clinical parameters, etc. Recent years have seen large investments in data catalogues to FAIRify data descriptions and capitalise on this great promise, i.e. to make catalogue contents more Findable, Accessible, Interoperable and Reusable. However, their valuable diversity has also created heterogeneity, which poses challenges to optimally exploiting their richness. Methods: In this opinion review, we analyse catalogues for human subject research ranging from cohort studies to surveillance, administrative and healthcare records. Results: We observe that while these catalogues are heterogeneous, have various scopes, and use different terminologies, the underlying concepts seem potentially harmonizable. We propose a unified framework to enable catalogue data sharing, with catalogues of multi-center cohorts nested as a special case within catalogues of real-world data sources. Moreover, we list recommendations to create an integrated community of metadata catalogues and an open catalogue ecosystem to sustain these efforts and maximise impact. Conclusions: We propose to embrace the autonomy of motivated catalogue teams and to invest in their collaboration via minimal standardisation efforts such as clear data licensing, persistent identifiers for linking the same records between catalogues, minimal metadata "common data elements" using shared ontologies, and symmetric architectures for data sharing (push/pull) with clear provenance tracks to process updates and acknowledge original contributors. Most importantly, we encourage the creation of environments for collaboration and resource sharing between catalogue developers, building on international networks such as OpenAIRE and the Research Data Alliance, as well as domain-specific ESFRIs such as BBMRI and ELIXIR.
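The recommendations above (clear licensing, persistent identifiers for cross-catalogue linkage, ontology-coded common data elements, and provenance for push/pull synchronisation) can be made concrete with a minimal sketch of a catalogue record. All field names, vocabularies, and URLs here are hypothetical illustrations, not the framework the authors propose:

```python
# A hypothetical minimal catalogue record; every field name, code, and URL
# below is an illustrative assumption.
record = {
    "pid": "https://doi.org/10.1234/cohort-demo",   # persistent identifier
    "license": "CC-BY-4.0",                         # clear data licence
    "name": "Demo multi-center cohort",
    "variables": [
        # common data elements coded against a shared ontology
        {"name": "systolic_bp", "code": "SNOMEDCT:271649006"},
    ],
    "provenance": {
        "source_catalogue": "https://catalogue-a.example.org",
        "last_updated": "2017-06-01T00:00:00Z",     # supports incremental pull
    },
}

def same_record(a, b):
    """Link records across catalogues by their persistent identifier."""
    return a["pid"] == b["pid"]

print(same_record(record, dict(record)))
```

Keying linkage on the PID rather than on names or local IDs is what lets two independently maintained catalogues recognise that they describe the same cohort.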
Dynamic Data Citation Service-Subset Tool for Operational Data Management
In earth observation and the climate sciences, data and their associated data
services grow daily over a large spatial extent, driven by the high coverage of
satellite sensors and model calculations as well as by continuous meteorological
in situ observations. To reuse such data, and especially data fragments and
their data services, in a collaborative and reproducible manner that cites the
original source, data analysts such as researchers or impact modelers need a
way to identify the exact version, precise time information, parameters, and
names of the dataset used. Done manually, citing a data fragment as a subset of
an entire dataset is complex and imprecise. Data in climate research are in
most cases multidimensional, structured grid data that can
change partially over time. The citation of such evolving content requires the approach of "dynamic
data citation". The applied approach is based on associating queries with persistent identifiers. These
queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the
time frame with a start and end date, which are automatically included in the metadata of the newly
generated subset and thus represent the information about the data history, the data provenance,
which has to be established in data repository ecosystems. The Research Data Alliance Data Citation
Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of
the art from existing citation and data management concepts and developed the scalable dynamic
data citation methodology for evolving data. The Data Centre at the Climate
Change Centre Austria (CCCA) has implemented these recommendations and has
offered an operational dynamic data citation service for climate scenario data
since 2017. Aware that this topic depends heavily on bibliographic citation
research that is still under discussion, the CCCA Dynamic Data Citation service
focuses on climate-domain-specific issues such as the characteristics of the
data, formats, software environment, and usage behavior. Beyond sharing the
experience gained, current efforts address the scalability of the
implementation, e.g., towards the potential of an Open Data Cube solution.
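The query-based citation mechanism the abstract describes can be sketched in a few lines: store the subsetting query together with an execution timestamp and a result fingerprint, mint a persistent identifier for it, and re-execute the stored query on resolution to reproduce and verify the cited subset. The toy dataset, the query store, and the handle-style PID scheme below are illustrative assumptions, not the CCCA implementation:

```python
import hashlib
from datetime import datetime, timezone

# Toy stand-in for a versioned climate dataset: (station, date, temp_c).
DATASET = [
    ("vienna", "2017-01-01", 1.2),
    ("vienna", "2017-01-02", -0.4),
    ("graz",   "2017-01-01", 0.8),
]

QUERY_STORE = {}  # PID -> stored query + provenance metadata

def run_query(station, start, end):
    """Subsetting query: one station over a date range."""
    return [r for r in DATASET if r[0] == station and start <= r[1] <= end]

def cite_subset(station, start, end):
    """Execute the query, fingerprint the result, and mint a PID for the query."""
    result = run_query(station, start, end)
    digest = hashlib.sha256(repr(sorted(result)).encode()).hexdigest()
    pid = f"hdl:20.500/{digest[:8]}"  # illustrative handle-style identifier
    QUERY_STORE[pid] = {
        "query": {"station": station, "start": start, "end": end},
        "executed": datetime.now(timezone.utc).isoformat(),
        "result_hash": digest,
    }
    return pid, result

def resolve(pid):
    """Re-execute the stored query and check the subset is unchanged."""
    rec = QUERY_STORE[pid]
    result = run_query(**rec["query"])
    digest = hashlib.sha256(repr(sorted(result)).encode()).hexdigest()
    return result, digest == rec["result_hash"]

pid, subset = cite_subset("vienna", "2017-01-01", "2017-01-31")
again, unchanged = resolve(pid)
print(pid, len(subset), unchanged)
```

Citing the query rather than a materialised copy is what makes the approach scale to evolving data: the subset need not be stored, and the hash comparison reveals whether later updates to the dataset have changed the cited fragment.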