
    Decentralized provenance-aware publishing with nanopublications

    Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.
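The three-part structure of a nanopublication described above (assertion, provenance, publication info) can be sketched in a few lines. This is a conceptual sketch only: all names are hypothetical, and plain Python tuples stand in for the RDF triples and named graphs that the real format expresses in serializations such as TriG.

```python
# Conceptual sketch of a nanopublication's three named graphs, with plain
# tuples in place of RDF triples. All identifiers below are hypothetical.

nanopub = {
    # Assertion graph: the scientific claim itself, as one machine-readable statement.
    "assertion": [
        ("ex:gene-X", "ex:is-associated-with", "ex:disease-Y"),
    ],
    # Provenance graph: where the assertion came from (study, method, source).
    "provenance": [
        ("ex:assertion", "prov:wasDerivedFrom", "ex:study-42"),
    ],
    # Publication-info graph: metadata about the nanopublication itself.
    "pubinfo": [
        ("ex:this-nanopub", "dct:created", "2015-01-01"),
        ("ex:this-nanopub", "dct:creator", "ex:researcher-1"),
    ],
}

def is_complete(np):
    """A nanopublication must carry all three graphs, each non-empty."""
    return all(np.get(g) for g in ("assertion", "provenance", "pubinfo"))

print(is_complete(nanopub))  # True
```

Keeping provenance and publication metadata in separate graphs from the claim is what lets a consumer verify and attribute each statement independently.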

    Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

    Making available and archiving scientific results is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Evaluation of the current small network shows that this system is efficient and reliable. Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201

    Using nanopublications as a distributed ledger of digital truth

    With the increase in the volume of research publications, it is very difficult for researchers to keep abreast of all work in their area. Additionally, the claims in classical publications are not machine-readable, making it challenging to retrieve, integrate, and link prior work. Several semantic publishing approaches have been proposed to address these challenges, including Research Object, Executable Paper, Micropublications, and Nanopublications. Nanopublications are a granular way of publishing research-based claims, their associated provenance, and publication information (metadata of the nanopublication) in a machine-readable form. To date, over 10 million nanopublications have been published, covering a wide range of topics, predominantly in the life sciences. Nanopublications are immutable, decentralised/distributed, uniformly structured, granular, and authentic. These features allow nanopublications to be used as a Distributed Ledger of Digital Truth. Such a ledger enables detecting conflicting claims and generating the timeline of discussion on a particular topic. However, the inability to identify all nanopublications related to a given topic prevents existing nanopublications from forming a ledger. In this dissertation, we make the following contributions: (i) we identify quality issues regarding the misuse of authorship properties and link rot, which impact the quality of the digital ledger, and argue that the Nanopub community needs to develop a set of guidelines for publishing nanopublications; (ii) we provide a framework for generating a timeline of discourse over a collection of nanopublications by retrieving and combining nanopublications on a particular topic to provide interoperability between them; (iii) we detect contradictory claims between nanopublications automatically, highlighting the conflicts and providing explanations based on the provenance information in the nanopublications.
    Through these contributions, we show that nanopublications can form a distributed ledger of digital truth, providing key benefits such as citability, timelines of discourse, and conflict detection to users of the ledger.
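The conflict detection described in contribution (iii) can be illustrated with a minimal sketch. The claim structure and the table of mutually exclusive predicates below are hypothetical stand-ins, not the dissertation's actual method: two assertions are flagged as conflicting when they relate the same subject and object through predicates declared to oppose each other.

```python
# Hypothetical table of predicate pairs that cannot both hold for the
# same subject/object pair.
OPPOSING = {
    ("increases", "decreases"),
    ("activates", "inhibits"),
}

def conflicts(claim_a, claim_b):
    """Return True if two (subject, predicate, object) claims contradict."""
    sa, pa, oa = claim_a
    sb, pb, ob = claim_b
    return sa == sb and oa == ob and (
        (pa, pb) in OPPOSING or (pb, pa) in OPPOSING
    )

# Two assertions drawn from two different (hypothetical) nanopublications:
np_a = ("drug:X", "increases", "protein:Y")
np_b = ("drug:X", "decreases", "protein:Y")
print(conflicts(np_a, np_b))  # True: the assertions contradict
```

In a real ledger, a detected conflict would be reported together with each nanopublication's provenance graph, so a reader can judge which source to trust.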

    Provenance, propagation and quality of biological annotation

    PhD Thesis. Biological databases have become an integral part of the life sciences, being used to store, organise and share ever-increasing quantities and types of data. Biological databases are typically centred around raw data, with individual entries being assigned to a single piece of biological data, such as a DNA sequence. Although essential, a reader can obtain little information from the raw data alone. Therefore, many databases aim to supplement their entries with annotation, allowing the current knowledge about the underlying data to be conveyed to a reader. Although annotations come in many different forms, most databases provide some form of free-text annotation. Given that annotations can form the foundations of future work, it is important that a user is able to evaluate the quality and correctness of an annotation. However, this is rarely straightforward. The amount of annotation, and the way in which it is curated, varies between databases. For example, the production of an annotation in some databases is entirely automated, without any manual intervention. Further, sections of annotations may be reused, being propagated between entries and, potentially, external databases. This provenance and curation information is not always apparent to a user. The work described within this thesis explores issues relating to biological annotation quality. While the most valuable annotation is often contained within free text, its lack of structure makes it hard to assess. Initially, this work describes a generic approach that allows textual annotations to be quantitatively measured. This approach is based upon the application of Zipf's Law to the words within a textual annotation, resulting in a single value. The relationship between this value and Zipf's principle of least effort provides an indication of the annotation's quality, whilst also allowing annotations to be quantitatively compared.
    Secondly, the thesis focuses on determining annotation provenance and tracking any subsequent propagation. This is achieved through the development of a visualisation framework, which exploits the reuse of sentences within annotations. Utilising this framework, a number of propagation patterns were identified, which on analysis appear to indicate low-quality and erroneous annotation. Together, these approaches increase our understanding of the textual characteristics of biological annotation, and suggest that this understanding can be used to increase the overall quality of these resources.
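The Zipf-based measure described above can be sketched as follows. The thesis's exact fitting procedure is not given here, so this uses one common approach as an assumption: under Zipf's law, word frequency falls off as a power of rank, so fitting a least-squares line to log(frequency) versus log(rank) yields a single exponent characterising the text.

```python
import math
from collections import Counter

def zipf_exponent(text):
    """Estimate the Zipf exponent of a text: the (negated) slope of the
    least-squares fit of log(frequency) against log(rank). One common way
    to reduce a free-text annotation to a single comparable value."""
    # Rank word frequencies from most to least common.
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # frequency ~ rank**(-exponent)

# Toy text whose frequencies (4, 2, 1) fall off roughly Zipf-like:
print(round(zipf_exponent("a a a a b b c"), 2))
```

The fit needs at least two distinct ranks; natural free-text annotations easily satisfy this, and texts obeying the principle of least effort tend to give exponents near 1.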

    Converting neXtProt into Linked Data and nanopublications

    The development of Linked Data provides the opportunity for databases to supply extensive volumes of biological data, information, and knowledge in a machine-interpretable format, making previously isolated data silos interoperable. To increase ease of use, databases often incorporate annotations from several different resources. Linked Data can overcome many formatting and identifier issues that prevent data interoperability, but the extensive cross-incorporation of annotations between databases makes the tracking of provenance in open, decentralized systems especially important. With the diversity of published data, provenance information becomes critical to providing reliable and trustworthy services to scientists. The nanopublication system addresses many of these challenges. We have developed the neXtProt Linked Data by serializing annotations specific to neXtProt in RDF/XML and have started employing the nanopublication model to give appropriate attribution to all data. Specifically, a use case demonstrates the handling of post-translational modification (PTM) data modeled as nanopublications to illustrate how the different levels of provenance and data quality thresholds can be captured in this model.