Search CORE

12 research outputs found

Making Digital Artifacts on the Web Verifiable and Reliable

Author: Dumontier Michel
Kuhn Tobias
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

The current Web has no general mechanisms to make digital artifacts --- such as datasets, code, texts, and images --- verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted method to enforce this immutability. These shortcomings have a serious negative impact on the ability to reproduce the results of processes that rely on Web resources, which in turn heavily impacts areas such as science where reproducibility is important. To solve this problem, we propose trusty URIs containing cryptographic hash values. We show how trusty URIs can be used for the verification of digital artifacts, in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate how the contents of these files become immutable, including dependencies to external digital artifacts and thereby extending the range of verifiability to the entire reference tree. Our approach sticks to the core principles of the Web, namely openness and decentralized architecture, and is fully compatible with existing standards and protocols. Evaluation of our reference implementations shows that these design goals are indeed accomplished by our approach, and that it remains practical even for very large files.Comment: Extended version of conference paper: arXiv:1401.577

arXiv.org e-Print Archive

Maastricht University Research Portal

VU Research Portal

Crossref

nanopub-java: A Java Library for Nanopublications

Author: Kuhn Tobias
Publication venue
Publication date: 20/08/2015
Field of study

The concept of nanopublications was first proposed about six years ago, but it lacked openly available implementations. The library presented here is the first one that has become an official implementation of the nanopublication community. Its core features are stable, but it also contains unofficial and experimental extensions: for publishing to a decentralized server network, for defining sets of nanopublications with indexes, for informal assertions, and for digitally signing nanopublications. Most of the features of the library can also be accessed via an online validator interface.Comment: Proceedings of 5th Workshop on Linked Science 201

arXiv.org e-Print Archive

VU Research Portal

Provenance-Centered Dataset of Drug-Drug Interactions

Author: A Callahan
A Gottlieb
B Mons
DS Wishart
J Lazarou
K Haerian
L Zhang
M Dumontier
NP Tatonetti
P Avillach
P Groth
RL Bushardt
S Vilar
SV Iyer
T Kuhn
Publication venue
Publication date: 01/01/2015
Field of study

Over the years several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision making variables. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources to provide researchers a resource for leveraging the work of others into their prediction methods. As one of the main issues to overcome the usage of external resources is their mappings between drug names and identifiers used, we also provide the set of mappings we curated to be able to compare the multiple sources we aggregate in our dataset.Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

VU Research Portal

FigShare

Decentralized provenance-aware publishing with nanopublications

Author: Chichester Christine
Dumontier Michel
Giannakopoulos George
Krauthammer Michael
Kuhn Tobias
Ngonga Ngomo Axel-Cyrille
Queralt-Rosinach Núria
Verborgh Ruben
Viglianti Raffaele
Publication venue: 'PeerJ'
Publication date: 01/01/2016
Field of study

Publication and archival of scientific results is still commonly considered the responsability of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable

VU Research Portal

Ghent University Academic Bibliography

Directory of Open Access Journals

Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Author: Chichester Christine
Dumontier Michel
Krauthammer Michael
Kuhn Tobias
Publication venue
Publication date: 01/01/2015
Field of study

Making available and archiving scientific results is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Evaluation of the current small network shows that this system is efficient and reliable.Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201

arXiv.org e-Print Archive

CiteSeerX

Maastricht University Research Portal

VU Research Portal

Table 1: Existing datasets in the nanopublication format, five of which were used for the first part of the evaluation.

Author: Banda
Belhajjame
Berners-Lee
Bradley
Buil-Aranda
Carroll
Chichester
Chichester
Clarke
Cohen
Feigenbaum
Filali
Freedman
Fu
Golden
Gray
Groth
Han
Harris
Hartig
Jacobson
Kuhn
Kuhn
Kuhn
Kuhn
Kuhn
Kuhn
Kuhn
Kuhn
Ladwig
Markman
McCusker
Miller
Mons
NP Index RA7SuQ0e66
NP Index RACy0I4f_w
NP Index RAR5dwELYL
NP Index RAVEKRW0m6
NP Index RAXFlG04YM
NP Index RAXy332hxq
NP Index RAY_lQruua
Paskin
Patrinos
Proell
Queralt-Rosinach
Sequeda
Speicher
Verborgh
Wilkinson
Williams
Publication venue: 'PeerJ'
Publication date
Field of study

Crossref

Quantifiable integrity for Linked Data on the web

Author: Braun Christoph H.-J.
Käfer Tobias
Publication venue: IOS Press
Publication date: 16/01/2024
Field of study

We present an approach to publish Linked Data on the Web with quantifiable integrity using Web technologies, and in which rational agents are incentivised to contribute to the integrity of the link network. To this end, we introduce self-verifying resource representations, that include Linked Data Signatures whose signature value is used as a suffix in the resource’s URI. Links among such representations, typically managed as web documents, contribute therefore to preserving the integrity of the resulting document graphs. To quantify how well a document’s integrity can be relied on, we introduce the notion of trust scores and present an interpretation based on hubs and authorities. In addition, we present how specific agent behaviour may be induced by the choice of trust score regarding their optimisation, e.g., in general but also using a heuristic strategy called Additional Reach Strategy (ARS). We discuss our approach in a three-fold evaluation: First, we evaluate the effect of different graph metrics as trust scores on induced agent behaviour and resulting evolution of the document graph. We show that trust scores based on hubs and authorities induce agent behaviour that contributes to integrity preservation in the document graph. Next, we evaluate different heuristics for agents to optimise trust scores when general optimisation strategies are not applicable. We show that ARS outperforms other potential optimisation strategies. Last, we evaluate the whole approach by examining the resilience of integrity preservation in a document graph when resources are deleted. To this end, we propose a simulation system based on the Watts–Strogatz model for simulating a social network. We show that our approach produces a document graph that can recover from such attacks or failures in the document graph

KITopen

Hashes Are Not Suitable to Verify Fixity of the Public Archived Web

Author: Alam Sawood
Aturban Mohamed
Klein Martin
Nelson Michael L.
Sompel Herbert Van de
Weigle Michele C.
Publication venue: ODU Digital Commons
Publication date: 01/01/2023
Field of study

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the hash values generated on the same resource are identical, then the fixity of the resource is verified. We tested this process by conducting a study on 16,627 mementos from 17 public web archives. We replayed and downloaded the mementos 39 times using a headless browser over a period of 442 days and generated a hash for each memento after each download, resulting in 39 hashes per memento. The hash is calculated by including not only the content of the base HTML of a memento but also all embedded resources, such as images and style sheets. We expected to always observe the same hash for a memento regardless of the number of downloads. However, our results indicate that 88.45% of mementos produce more than one unique hash value, and about 16% (or one in six) of those mementos always produce different hash values. We identify and quantify the types of changes that cause the same memento to produce different hashes. These results point to the need for defining an archive-aware hashing function, as conventional hashing functions are not suitable for replayed archived web pages

Old Dominion University

A Framework for Verifying the Fixity of Archived Web Resources

Author: Aturban Mohamed
Publication venue: ODU Digital Commons
Publication date: 01/08/2020
Field of study

The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web resources have not been altered, they should have access to fixity information associated with these resources. However, most web archives do not allow accessing fixity information and, more importantly, even if fixity information is available, it is provided by the same archive delivering the resource, not by an independent archive or service. In this research, we present a framework for establishing and checking the fixity on the playback of archived resources, or mementos. The framework defines an archive-aware hashing function that consists of several guidelines for generating repeatable fixity information on the playback of mementos. These guidelines are results of our 14-month study for identifying and quantifying changes in replayed mementos over time that affect generating repeatable fixity information. Changes on the playback of mementos may be caused by JavaScript, transient errors, inconsistency in the availability of mementos over time, and archive-specific resources. Changes are also caused by transformations in the content of archived resources applied by web archives to appropriately replay these resources in a user’s browser. The study also shows that only 11.55% of mementos always produce the same fixity information after each replay, while about 16.06% of mementos always produce different fixity information after each replay. The remaining 72.39% of mementos produce multiple unique fixity information. We also find that mementos may disappear when web archives move to different domains or archives. In addition to defining multiple guidelines for generating fixity information, the framework introduces two approaches, Atomic and Block, that can be used to disseminate fixity information to web archives. The main difference between the two approaches is that, in the Atomic approach, the fixity information of each archived web page is stored in a separate file before being disseminated to several on-demand web archives, while in the Block approach, we batch together fixity information of multiple archived pages to a single binary-searchable file before being disseminated to archives. The framework defines the structure of URLs used to publish fixity information on the web and retrieve archived fixity information from web archives. Our framework does not require changes in the current web archiving infrastructure, and it is built based on well-known web archiving standards, such as the Memento protocol. The proposed framework will allow users to generate fixity information on any archived page at any time, preserve the fixity information independently from the archive delivering the archived page, and verify the fixity of the archived page at any time in the future

Old Dominion University