12 research outputs found
Broadening the Scope of Nanopublications
In this paper, we present an approach for extending the existing concept of
nanopublications --- tiny entities of scientific results in RDF representation
--- to broaden their application range. The proposed extension uses English
sentences to represent informal and underspecified scientific claims. These
sentences follow a syntactic and semantic scheme that we call AIDA (Atomic,
Independent, Declarative, Absolute), which provides a uniform and succinct
representation of scientific assertions. Such AIDA nanopublications are
compatible with the existing nanopublication concept and enjoy most of its
advantages such as information sharing, interlinking of scientific findings,
and detailed attribution, while being more flexible and applicable to a much
wider range of scientific results. We show that users are able to create AIDA
sentences for given scientific results quickly and at high quality, and that it
is feasible to automatically extract and interlink AIDA nanopublications from
existing unstructured data sources. To demonstrate our approach, a web-based
interface is introduced, which also exemplifies the use of nanopublications for
non-scientific content, including meta-nanopublications that describe other
nanopublications.
Comment: To appear in the Proceedings of the 10th Extended Semantic Web
Conference (ESWC 2013)
Decentralized provenance-aware publishing with nanopublications
Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there currently exist no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.
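The abstract above says that data retrieved from a decentralized server network can be verified, but it does not say how. The following is a minimal sketch of one possible mechanism, assuming that an identifier carries a cryptographic hash of the content it names. The function names, the example base URI, and the use of plain dictionaries as stand-ins for servers are all hypothetical, and the hashing scheme is a simplification rather than the one used by the actual network.

```python
import base64
import hashlib


def make_identifier(base_uri: str, content: str) -> str:
    """Append a URL-safe SHA-256 digest of the content to a base URI.

    Assumption: the hash is taken over the raw serialized text. A real
    system would first normalize the RDF so that equivalent
    serializations yield the same hash.
    """
    digest = hashlib.sha256(content.encode("utf-8")).digest()
    code = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{base_uri}{code}"


def verify(identifier: str, base_uri: str, content: str) -> bool:
    """Return True if the content matches the hash in the identifier."""
    return identifier == make_identifier(base_uri, content)


def retrieve_verified(identifier: str, base_uri: str, servers: list):
    """Ask each server in turn and return the first copy that verifies.

    Each "server" is a hypothetical stand-in: a dict that maps
    identifiers to stored content. Returns None if no copy verifies.
    """
    for server in servers:
        content = server.get(identifier)
        if content is not None and verify(identifier, base_uri, content):
            return content
    return None
```

Because the identifier depends only on the content, a client under this assumption does not need to trust any individual server: a tampered or corrupted copy fails the check, and the client moves on to the next server.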
Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
Making available and archiving scientific results is for the most part still
considered the task of classical publishing companies, despite the fact that
classical forms of publishing centered around printed narrative articles no
longer seem well-suited in the digital age. In particular, there currently
exist no efficient, reliable, and agreed-upon methods for publishing
scientific datasets, which have become increasingly important for science. Here
we propose to design scientific data publishing as a Web-based bottom-up
process, without top-down control of central authorities such as publishing
companies. Based on a novel combination of existing concepts and technologies,
we present a server network to decentrally store and archive data in the form
of nanopublications, an RDF-based format to represent scientific data. We show
how this approach allows researchers to publish, retrieve, verify, and
recombine datasets of nanopublications in a reliable and trustworthy manner,
and we argue that this architecture could be used for the Semantic Web in
general. Evaluation of the current small network shows that this system is
efficient and reliable.
Comment: In Proceedings of the 14th International Semantic Web Conference
(ISWC) 201
Using nanopublications as a distributed ledger of digital truth
With the increase in the volume of research publications, it is very difficult for
researchers to keep abreast of all work in their area. Additionally, the claims in
classical publications are not machine-readable, making it challenging to retrieve,
integrate, and link prior work. Several semantic publishing approaches have been
proposed to address these challenges, including Research Object, Executable Paper,
Micropublications, and Nanopublications.
Nanopublications are a granular way of publishing research-based claims, their
associated provenance, and publication information (metadata of the
nanopublication) in a machine-readable form. To date, over 10 million
nanopublications have been published, covering a wide range of topics,
predominantly in the life sciences. Nanopublications are immutable,
decentralised/distributed, uniformly structured, granular, and authentic. These
features allow nanopublications to be used as a Distributed Ledger of Digital
Truth. Such a ledger enables detecting conflicting claims and generating the
timeline of discussion on a particular topic. However, the inability to identify
all nanopublications related to a given topic prevents existing nanopublications
from forming a ledger.
In this dissertation, we make the following contributions: (i) Identify quality
issues regarding misuse of authorship properties and link rot, which affect the
quality of the digital ledger. We argue that the Nanopub community needs to
develop a set of guidelines for publishing nanopublications. (ii) Provide a
framework for generating a timeline of discourse over a collection of
nanopublications by retrieving and combining nanopublications on a particular
topic to provide interoperability between them. (iii) Detect contradictory claims
between nanopublications, automatically highlighting the conflicts and providing
explanations based on the provenance information in the nanopublications. Through
these contributions, we show that nanopublications can form a distributed ledger
of digital truth, providing key benefits such as citability, timelines of
discourse, and conflict detection to users of the ledger.
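The three-part structure of a nanopublication (claim, provenance, publication information) and the two ledger functions named above (timeline of discourse, conflict detection) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the dissertation's framework: the `Nanopublication` class, the `ex:` identifiers, and the `OPPOSITES` table of contradictory predicates are all hypothetical, and real nanopublications are RDF named graphs rather than Python tuples.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)  # frozen mirrors the immutability of nanopublications
class Nanopublication:
    uri: str
    assertion: tuple[Triple, ...]   # the claim itself
    provenance: tuple[Triple, ...]  # where the claim comes from
    created: datetime               # publication information
    creator: str                    # publication information


# Hypothetical table of predicate pairs treated as contradictory.
OPPOSITES = {frozenset({"ex:increases", "ex:decreases"})}


def timeline(nanopubs, topic):
    """Return nanopublications mentioning the topic, oldest first."""
    related = [
        np for np in nanopubs
        if any(topic in (s, o) for s, _, o in np.assertion)
    ]
    return sorted(related, key=lambda np: np.created)


def conflicts(nanopubs):
    """Return URI pairs whose assertions link the same subject and
    object with predicates listed as opposites."""
    found = []
    for a, b in combinations(nanopubs, 2):
        for s1, p1, o1 in a.assertion:
            for s2, p2, o2 in b.assertion:
                if (s1, o1) == (s2, o2) and frozenset({p1, p2}) in OPPOSITES:
                    found.append((a.uri, b.uri))
    return found
```

A conflict reported this way could then be explained by showing the `provenance` triples of both nanopublications side by side, which is the role the abstract assigns to provenance information.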
Serviços de integração de dados para aplicações biomédicas (Data integration services for biomedical applications)
Doctorate in Informatics (MAP-i)
In the last decades, the field of biomedical science has fostered
unprecedented scientific advances. Research is stimulated by the
constant evolution of information technology, delivering novel and
diverse bioinformatics tools. Nevertheless, the proliferation of new and
disconnected solutions has resulted in massive amounts of resources
spread over heterogeneous and distributed platforms. Distinct
data types and formats are generated and stored in miscellaneous
repositories posing data interoperability challenges and delays in
discoveries. Data sharing and integrated access to these resources
are key features for successful knowledge extraction.
In this context, this thesis makes contributions towards accelerating
the semantic integration, linkage and reuse of biomedical resources.
The first contribution addresses the connection of distributed and
heterogeneous registries. The proposed methodology creates a
holistic view over the different registries, supporting semantic
data representation, integrated access and querying. The second
contribution addresses the integration of heterogeneous information
across scientific research, aiming to enable adequate data-sharing
services. The third contribution presents a modular architecture to
support the extraction and integration of textual information, enabling
the full exploitation of curated data. The last contribution lies
in providing a platform to accelerate the deployment of enhanced
semantic information systems. All the proposed solutions were
deployed and validated in the scope of rare diseases.
Over the last decades, the field of biomedical sciences has delivered
major scientific advances, stimulated by the constant evolution of
information technologies. The creation of diverse tools in the
area of bioinformatics and the lack of integration between new solutions
have resulted in enormous quantities of data distributed across different
platforms. Data of different types and formats are generated
and stored in various repositories, which gives rise to interoperability
problems and delays research. Information sharing
and integrated access to these resources are fundamental characteristics
for the successful extraction of scientific knowledge.
To this end, this thesis provides contributions to accelerate the
semantic integration, linkage and reuse of biomedical data. The
first contribution addresses the interconnection of distributed and
heterogeneous registries. The proposed methodology creates a holistic view over
the different registries, supporting semantic data representation
and integrated access. The second contribution addresses the integration
of diverse data for scientific research, with the aim of
supporting interoperable services for information sharing. The
third contribution presents a modular architecture that supports the
extraction and integration of textual information, enabling the exploitation
of these data. The last contribution consists of a web platform
to accelerate the creation of semantic information systems. All
the proposed solutions were validated in the scope of rare diseases.
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality of traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Knowledge Graph Building Blocks: An easy-to-use Framework for developing FAIREr Knowledge Graphs
Knowledge graphs and ontologies provide promising technical solutions for
implementing the FAIR Principles for Findable, Accessible, Interoperable, and
Reusable data and metadata. However, they also come with their own challenges.
Nine such challenges are discussed and associated with the criterion of
cognitive interoperability and specific FAIREr principles (FAIR + Explorability
raised) that they fail to meet. We introduce an easy-to-use, open source
knowledge graph framework that is based on knowledge graph building blocks
(KGBBs). KGBBs are small information modules for knowledge-processing, each
based on a specific type of semantic unit. By interrelating several KGBBs, one
can specify a KGBB-driven FAIREr knowledge graph. Besides implementing semantic
units, the KGBB Framework clearly distinguishes and decouples an internal
in-memory data model from data storage, data display, and data access/export
models. We argue that this decoupling is essential for solving many problems of
knowledge management systems. We discuss the architecture of the KGBB Framework
as we envision it, comprising (i) an openly accessible KGBB-Repository for
different types of KGBBs; (ii) a KGBB-Engine for managing and operating FAIREr
knowledge graphs (including automatic provenance tracking, editing changelog,
and versioning of semantic units); (iii) a repository for KGBB-Functions; and (iv)
a low-code KGBB-Editor with which domain experts can create new KGBBs and
specify their own FAIREr knowledge graph without having to think about semantic
modelling. We conclude by discussing the nine challenges and how the KGBB
Framework provides solutions for the issues they raise. While most of what we
discuss here is entirely conceptual, we can point to two prototypes that
demonstrate the feasibility in principle of using semantic units and KGBBs to
manage and structure knowledge graphs.