41 research outputs found
Decentralized provenance-aware publishing with nanopublications
Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.
nanopub-java: A Java Library for Nanopublications
The concept of nanopublications was first proposed about six years ago, but
it lacked openly available implementations. The library presented here is the
first one that has become an official implementation of the nanopublication
community. Its core features are stable, but it also contains unofficial and
experimental extensions: for publishing to a decentralized server network, for
defining sets of nanopublications with indexes, for informal assertions, and
for digitally signing nanopublications. Most of the features of the library can
also be accessed via an online validator interface.
Comment: Proceedings of 5th Workshop on Linked Science 201
Making Digital Artifacts on the Web Verifiable and Reliable
The current Web has no general mechanisms to make digital artifacts --- such
as datasets, code, texts, and images --- verifiable and permanent. For digital
artifacts that are supposed to be immutable, there is moreover no commonly
accepted method to enforce this immutability. These shortcomings have a serious
negative impact on the ability to reproduce the results of processes that rely
on Web resources, which in turn heavily impacts areas such as science where
reproducibility is important. To solve this problem, we propose trusty URIs
containing cryptographic hash values. We show how trusty URIs can be used for
the verification of digital artifacts, in a manner that is independent of the
serialization format in the case of structured data files such as
nanopublications. We demonstrate how the contents of these files become
immutable, including dependencies to external digital artifacts and thereby
extending the range of verifiability to the entire reference tree. Our approach
sticks to the core principles of the Web, namely openness and decentralized
architecture, and is fully compatible with existing standards and protocols.
Evaluation of our reference implementations shows that these design goals are
indeed accomplished by our approach, and that it remains practical even for
very large files.
Comment: Extended version of conference paper: arXiv:1401.577
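The verification step described above can be sketched as follows. This is a minimal illustration assuming a simplified scheme in which the artifact's SHA-256 hex digest is appended after a final `.` in the URI; the actual trusty URI format uses module identifiers and Base64 encoding, so all names here are illustrative only.

```python
import hashlib

def make_verifiable_uri(base_uri: str, content: bytes) -> str:
    """Append the SHA-256 hex digest of the content to the URI (simplified scheme)."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{base_uri}.{digest}"

def verify(uri: str, content: bytes) -> bool:
    """Recompute the hash of the content and compare it with the one embedded in the URI."""
    embedded = uri.rsplit(".", 1)[-1]
    return hashlib.sha256(content).hexdigest() == embedded

artifact = b"example dataset contents"
uri = make_verifiable_uri("https://example.org/artifact", artifact)
assert verify(uri, artifact)             # untampered content passes
assert not verify(uri, artifact + b"x")  # any modification is detected
```

Because the identifier itself carries the hash, anyone holding the URI can check a retrieved copy without trusting the server it came from.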
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data
To make digital resources on the web verifiable, immutable, and permanent, we
propose a technique to include cryptographic hash values in URIs. We call them
trusty URIs and we show how they can be used for approaches like
nanopublications to make not only specific resources but their entire reference
trees verifiable. Digital artifacts can be identified not only on the byte
level but on more abstract levels such as RDF graphs, which means that
resources keep their hash values even when presented in a different format. Our
approach sticks to the core principles of the web, namely openness and
decentralized architecture, is fully compatible with existing standards and
protocols, and can therefore be used right away. Evaluation of our reference
implementations shows that these desired properties are indeed accomplished by
our approach, and that it remains practical even for very large files.
Comment: Small error corrected in the text (table data was correct) on page 13: "All average values are below 0.8s (0.03s for batch mode). Using Java in batch mode even requires only 1ms per file."
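The claim that resources keep their hash values even when presented in a different format rests on hashing a canonicalized form of the RDF graph rather than the raw bytes. A minimal sketch of that idea, assuming a toy canonicalization (one normalized line per triple, sorted) rather than the actual graph-hashing algorithm of the trusty URI specification:

```python
import hashlib

def graph_hash(triples):
    """Hash a set of (subject, predicate, object) triples independently of
    statement order: canonicalize by sorting one normalized line per triple."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    canonical = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# The same graph, serialized with its statements in two different orders...
g1 = [("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:c")]
g2 = [("ex:b", "ex:knows", "ex:c"), ("ex:a", "ex:knows", "ex:b")]

assert graph_hash(g1) == graph_hash(g2)  # ...yields the same identity
```

Real RDF canonicalization must additionally handle blank nodes and literal normalization, which this sketch deliberately omits.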
A Unified Nanopublication Model for Effective and User-Friendly Access to the Elements of Scientific Publishing
Scientific publishing is the means by which we communicate and share
scientific knowledge, but this process currently often lacks transparency and
machine-interpretable representations. Scientific articles are published as long, coarse-grained texts with complicated structures, and they are optimized for human readers rather than for automated means of organization and access. Peer reviewing is the main method of quality assessment, but these peer reviews are nowadays rarely published, and their own complicated structure and their links to the respective articles are not accessible. In order to address these problems
and to better align scientific publishing with the principles of the Web and
Linked Data, we propose here an approach to use nanopublications as a unifying
model to represent in a semantic way the elements of publications, their
assessments, as well as the involved processes, actors, and provenance in
general. To evaluate our approach, we present a dataset of 627 nanopublications
representing an interlinked network of the elements of articles (such as
individual paragraphs) and their reviews (such as individual review comments).
Focusing on the specific scenario of editors performing a meta-review, we
introduce seven competency questions and show how they can be executed as
SPARQL queries. We then present a prototype of a user interface for that
scenario that shows different views on the set of review comments provided for
a given manuscript, and we show in a user study that editors find the interface
useful to answer their competency questions. In summary, we demonstrate that a
unified and semantic publication model based on nanopublications can make
scientific communication more effective and user-friendly.
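A competency question of the kind described above could, for example, take the following shape as a SPARQL query. The vocabulary here (ex:reviews, ex:hasReviewComment, ex:refersTo, ex:hasImpact) is a hypothetical stand-in, not the actual properties of the dataset:

```sparql
# Hypothetical competency question: which review comments refer to
# which paragraphs of a given manuscript, and how important are they?
PREFIX ex: <http://example.org/vocab/>
SELECT ?comment ?paragraph ?impact
WHERE {
  ?review ex:reviews <http://example.org/manuscript1> ;
          ex:hasReviewComment ?comment .
  ?comment ex:refersTo ?paragraph ;
           ex:hasImpact ?impact .
}
ORDER BY ?paragraph
```

Because the review comments are individually addressable nanopublications, such a query can group them per paragraph, which is exactly the kind of view a meta-reviewing editor needs.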
Provenance-Centered Dataset of Drug-Drug Interactions
Over the years several studies have demonstrated the ability to identify
potential drug-drug interactions via data mining from the literature (MEDLINE),
electronic health records, and public databases (DrugBank). While each of these approaches is properly statistically validated, they do not take the overlap between them into consideration as one of their decision-making variables. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources, providing researchers a resource for leveraging the work of others in their own prediction methods. As one of the main obstacles to using external resources is the mapping between the drug names and identifiers they use, we also provide the set of mappings we curated to be able to compare the multiple sources we aggregate in our dataset.
Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201
Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
Making available and archiving scientific results is for the most part still
considered the task of classical publishing companies, despite the fact that
classical forms of publishing centered around printed narrative articles no
longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing
scientific datasets, which have become increasingly important for science. Here
we propose to design scientific data publishing as a Web-based bottom-up
process, without top-down control of central authorities such as publishing
companies. Based on a novel combination of existing concepts and technologies,
we present a server network to decentrally store and archive data in the form
of nanopublications, an RDF-based format to represent scientific data. We show
how this approach allows researchers to publish, retrieve, verify, and
recombine datasets of nanopublications in a reliable and trustworthy manner,
and we argue that this architecture could be used for the Semantic Web in
general. Evaluation of the current small network shows that this system is
efficient and reliable.
Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201
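The publish/retrieve cycle on such a server network can be sketched as follows. This is a toy model assuming content-addressed storage keyed by hash and full replication; the classes and the replication strategy are illustrative assumptions, not the actual protocol of the nanopublication server network.

```python
import hashlib

class Server:
    """A single archive node: stores artifacts keyed by their content hash."""
    def __init__(self):
        self.store = {}

    def publish(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.store[key] = content
        return key

    def retrieve(self, key: str):
        return self.store.get(key)

def publish_everywhere(network, content: bytes) -> str:
    """Replicate an artifact to every node; all nodes derive the same key."""
    return [server.publish(content) for server in network][0]

def retrieve_verified(network, key: str):
    """Ask nodes in turn; accept only content whose hash matches the key."""
    for server in network:
        content = server.retrieve(key)
        if content is not None and hashlib.sha256(content).hexdigest() == key:
            return content
    return None

network = [Server() for _ in range(3)]
key = publish_everywhere(network, b"a nanopublication")
network[0].store.clear()  # one node fails or loses its data...
assert retrieve_verified(network, key) == b"a nanopublication"  # ...others still serve it
```

Because retrieval re-verifies the hash, no individual server needs to be trusted: a misbehaving node can at worst withhold data, not forge it.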
Using nanopublications as a distributed ledger of digital truth
With the increase in the volume of research publications, it is very difficult for researchers to keep abreast of all work in their area. Additionally, the claims in classical publications are not machine-readable, making it challenging to retrieve,
integrate, and link prior work. Several semantic publishing approaches have been
proposed to address these challenges, including Research Object, Executable Paper,
Micropublications, and Nanopublications.
Nanopublications are a granular way of publishing research-based claims, their
associated provenance, and publication information (metadata of the nanopublication) in a machine-readable form. To date, over 10 million nanopublications have
been published, covering a wide range of topics, predominantly in the life sciences.
Nanopublications are immutable, decentralised/distributed, uniformly structured, granular, and authentic. These features allow nanopublications to be used as a Distributed Ledger of Digital Truth. Such a ledger enables detecting conflicting claims and generating the timeline of discussion on a particular topic. However, the inability to identify all nanopublications related to a given topic prevents existing nanopublications from forming a ledger.
In this dissertation, we make the following contributions: (i) We identify quality issues regarding the misuse of authorship properties and link rot, which impact the quality of the digital ledger, and argue that the Nanopub community needs to develop a set of guidelines for publishing nanopublications. (ii) We provide a framework for generating a timeline of discourse over a collection of nanopublications by retrieving and combining nanopublications on a particular topic to provide interoperability between them. (iii) We automatically detect contradictory claims between nanopublications, highlighting the conflicts and providing explanations based on the provenance information in the nanopublications. Through these contributions, we show that nanopublications can form a distributed ledger of digital truth, providing key benefits such as citability, timelines of discourse, and conflict detection to users of the ledger.
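The conflict-detection contribution could be sketched along these lines. This is a toy model in which each claim is a (subject, relation, object) triple plus a polarity flag, so a conflict is the same triple asserted with opposite polarities; the representation is an assumption for illustration, not the dissertation's actual model.

```python
from itertools import combinations

def find_conflicts(claims):
    """Return pairs of claim sources that assert the same triple with opposite
    polarity. Each claim is (source, (subject, relation, object), polarity)."""
    conflicts = []
    for a, b in combinations(claims, 2):
        if a[1] == b[1] and a[2] != b[2]:  # same triple, opposite polarity
            conflicts.append((a[0], b[0]))
    return conflicts

claims = [
    ("np1", ("aspirin", "interacts-with", "warfarin"), True),
    ("np2", ("aspirin", "interacts-with", "warfarin"), False),
    ("np3", ("aspirin", "treats", "headache"), True),
]
assert find_conflicts(claims) == [("np1", "np2")]
```

In the real setting, the source identifiers would be nanopublication URIs, so the provenance graph of each conflicting nanopublication is immediately available to explain the disagreement.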