Broadening the Scope of Nanopublications
In this paper, we present an approach for extending the existing concept of
nanopublications --- tiny entities of scientific results in RDF representation
--- to broaden their application range. The proposed extension uses English
sentences to represent informal and underspecified scientific claims. These
sentences follow a syntactic and semantic scheme that we call AIDA (Atomic,
Independent, Declarative, Absolute), which provides a uniform and succinct
representation of scientific assertions. Such AIDA nanopublications are
compatible with the existing nanopublication concept and enjoy most of its
advantages such as information sharing, interlinking of scientific findings,
and detailed attribution, while being more flexible and applicable to a much
wider range of scientific results. We show that users are able to create AIDA
sentences for given scientific results quickly and at high quality, and that it
is feasible to automatically extract and interlink AIDA nanopublications from
existing unstructured data sources. To demonstrate our approach, a web-based
interface is introduced, which also exemplifies the use of nanopublications for
non-scientific content, including meta-nanopublications that describe other
nanopublications.

Comment: To appear in the Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013).
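As a rough illustration of the structure described above, the sketch below (hypothetical field names and URIs, not the authors' implementation) bundles an AIDA sentence with provenance and publication-info graphs, mirroring the head/assertion/provenance/pubinfo anatomy of a nanopublication:

```python
# Minimal sketch of an AIDA nanopublication as four named-graph-like parts.
# All URIs, prefixes, and keys are illustrative assumptions.
def make_aida_nanopub(sentence, author, source):
    """Bundle an AIDA sentence with provenance and publication info,
    mirroring the head/assertion/provenance/pubinfo structure of a
    nanopublication."""
    np_uri = "http://example.org/np1"
    return {
        "head": {"hasAssertion": f"{np_uri}#assertion",
                 "hasProvenance": f"{np_uri}#provenance",
                 "hasPublicationInfo": f"{np_uri}#pubinfo"},
        # The assertion is just an English AIDA sentence, not formal RDF.
        "assertion": {"aida:asSentence": sentence},
        "provenance": {"prov:wasDerivedFrom": source},
        "pubinfo": {"dc:creator": author},
    }

np = make_aida_nanopub("Malaria is transmitted by mosquitoes.",
                       "http://example.org/alice",
                       "http://example.org/paper42")
```

Because the assertion is a plain sentence rather than formal RDF, the same container can carry informal or underspecified claims while keeping machine-readable provenance and attribution.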
Exposing Provenance Metadata Using Different RDF Models
A standard model for exposing structured provenance metadata of scientific
assertions on the Semantic Web would increase interoperability,
discoverability, reliability, as well as reproducibility for scientific
discourse and evidence-based knowledge discovery. Several Resource Description
Framework (RDF) models have been proposed to track provenance. However,
provenance metadata may not only be verbose, but also significantly redundant.
Therefore, an appropriate RDF provenance model should be efficient for
publishing, querying, and reasoning over Linked Data. In the present work, we
have collected millions of pairwise relations between chemicals, genes, and
diseases from multiple data sources, and demonstrated the extent of redundancy
of provenance information in the life science domain. We also evaluated the
suitability of several RDF provenance models for this crowdsourced data set,
including the N-ary model, the Singleton Property model, and the
Nanopublication model. We examined query performance against three commonly
used large RDF stores, including Virtuoso, Stardog, and Blazegraph. Our
experiments demonstrate that query performance depends on both the RDF store
and the RDF provenance model.
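The verbosity differences among these models can be illustrated with a toy example. The URIs, prefixes, and exact triple shapes below are simplified assumptions, not the paper's benchmark data:

```python
# One provenance-annotated statement rendered under three RDF models.
# Triples are 3-tuples; nanopublication quads carry a graph name in a 4th slot.
S, P, O, SRC = "chem:aspirin", "ex:inhibits", "gene:PTGS2", "pubmed:123"

# N-ary model: the relation becomes a node linking subject, object, and source.
nary = [
    ("ex:rel1", "rdf:type", "ex:Inhibition"),
    ("ex:rel1", "ex:agent", S),
    ("ex:rel1", "ex:target", O),
    ("ex:rel1", "prov:wasDerivedFrom", SRC),
]

# Singleton Property model: a unique property instance carries the provenance.
singleton = [
    (S, "ex:inhibits#1", O),
    ("ex:inhibits#1", "sp:singletonPropertyOf", P),
    ("ex:inhibits#1", "prov:wasDerivedFrom", SRC),
]

# Nanopublication model: the assertion lives in a named graph, and
# provenance triples attach to that graph rather than to a reified node.
nanopub = [
    (S, P, O, "np1:assertion"),
    ("np1:assertion", "prov:wasDerivedFrom", SRC, "np1:provenance"),
]
```

When millions of pairwise relations share a handful of sources, the per-statement overhead of each model, and how well a store indexes named graphs versus extra nodes, drives the redundancy and query-performance differences the abstract reports.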
Structured representation of scientific evidence in the biomedical domain using Semantic Web techniques
Background: Accounts of evidence are vital to evaluate and reproduce scientific
findings and integrate data on an informed basis. Currently, such accounts are
often inadequate, unstandardized and inaccessible for computational knowledge
engineering even though computational technologies, among them those of the
semantic web, are ever more employed to represent, disseminate and integrate
biomedical data and knowledge. Results: We present SEE (Semantic EvidencE), an
RDF/OWL based approach for detailed representation of evidence in terms of the
argumentative structure of the supporting background for claims even in
complex settings. We derive design principles and identify minimal components
for the representation of evidence. We specify the Reasoning and Discourse
Ontology (RDO), an OWL representation of the model of scientific claims, their
subjects, their provenance and their argumentative relations underlying the
SEE approach. We demonstrate the application of SEE and illustrate its design
patterns in a case study by providing an expressive account of the evidence
for certain claims regarding the isolation of the enzyme glutamine synthetase.
Conclusions: SEE is suited to provide coherent and computationally accessible
representations of evidence-related information such as the materials,
methods, assumptions, reasoning and information sources used to establish a
scientific finding by adopting a consistently claim-based perspective on
scientific results and their evidence. SEE allows for extensible evidence
representations, in which the level of detail can be adjusted and which can be
extended as needed. It supports representation of arbitrarily many consecutive
layers of interpretation and attribution and different evaluations of the same
data. SEE and its underlying model could be a valuable component in a variety
of use cases that require careful representation or examination of evidence
for data presented on the semantic web or in other formats.
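The idea of consecutive layers of interpretation and attribution can be sketched with nested claim records. The class and field names below are illustrative, not terms from the actual Reasoning and Discourse Ontology:

```python
# Toy sketch of a claim-based evidence chain with stacked interpretation
# layers; names are invented for illustration, not RDO vocabulary.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    agent: str                                        # who asserts the claim
    supported_by: list = field(default_factory=list)  # data or other claims

# Layer 1: an experimental observation.
observation = Claim("Fraction F shows glutamine-forming activity.",
                    agent="lab-notebook")

# Layer 2: an interpretation of that observation.
interpretation = Claim("Fraction F contains glutamine synthetase.",
                       agent="author", supported_by=[observation])

# Layer 3: a further attribution citing the interpretation.
citation = Claim("Glutamine synthetase was isolated in this study.",
                 agent="reviewer", supported_by=[interpretation])

def depth(claim):
    """Number of consecutive interpretation layers below a claim."""
    if not claim.supported_by:
        return 0
    return 1 + max(depth(c) for c in claim.supported_by)
```

Because each claim can itself support a further claim, the level of detail can be extended as needed, and the same observation can back different, even competing, interpretations.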
HyQue: evaluating hypotheses using Semantic Web technologies
Background: Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks.

Results: We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user-submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF.

Conclusions: HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about galactose metabolism in S. cerevisiae. Hypotheses, as well as the supporting or refuting data, are represented in RDF and directly linked to one another, allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at http://semanticscience.org/projects/hyque.
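The event-based scoring idea can be caricatured in a few lines. The facts, matching rule, and scoring below are invented for illustration and are not HyQue's actual rules or data:

```python
# Toy sketch of query-based hypothesis evaluation: a hypothesis is a list
# of expected events, each checked against a fact base that would, in a
# real system, be retrieved from a knowledge store. Facts are invented.
FACTS = {
    ("GAL4", "activates", "GAL1"),
    ("GAL80", "inhibits", "GAL4"),
}

def evaluate(hypothesis_events, facts=FACTS):
    """Return a support score in [0, 1]: the fraction of hypothesised
    events that match a stored fact."""
    hits = sum(1 for ev in hypothesis_events if ev in facts)
    return hits / len(hypothesis_events)

# One of the two hypothesised events is supported by the fact base.
score = evaluate([("GAL4", "activates", "GAL1"),
                  ("GAL4", "activates", "GAL7")])
```

A real system replaces the set-membership test with SPARQL queries plus ontology inference (so that, e.g., a subclass or parthood assertion can satisfy an event), but the shape of the evaluation is the same.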
Using nanopublications as a distributed ledger of digital truth
With the increase in the volume of research publications, it is very difficult for researchers to keep abreast of all work in their area. Additionally, the claims in
classical publications are not machine-readable, making it challenging to retrieve,
integrate, and link prior work. Several semantic publishing approaches have been
proposed to address these challenges, including Research Object, Executable Paper,
Micropublications, and Nanopublications.
Nanopublications are a granular way of publishing research-based claims, their
associated provenance, and publication information (metadata of the nanopublication) in a machine-readable form. To date, over 10 million nanopublications have
been published, covering a wide range of topics, predominantly in the life sciences.
Nanopublications are immutable, decentralised/distributed, uniformly structured,
granular, and authentic. These features allow nanopublications to
be used as a Distributed Ledger of Digital Truth. Such a ledger enables detecting
conflicting claims and generating the timeline of discussion on a particular topic.
However, the inability to identify all nanopublications related to a given topic prevents existing nanopublications from forming a ledger.
In this dissertation, we make the following contributions: (i) We identify quality
issues regarding misuse of authorship properties and link rot which impact the
quality of the digital ledger, and argue that the Nanopub community needs to
develop a set of guidelines for publishing nanopublications. (ii) We provide a framework for generating a timeline of discourse over a collection of nanopublications by
retrieving and combining nanopublications on a particular topic to provide interoperability between them. (iii) We detect contradictory claims between nanopublications
automatically, highlighting the conflicts and providing explanations based on the provenance information in the nanopublications. Through these contributions, we show
that nanopublications can form a distributed ledger of digital truth, providing key
benefits such as citability, timelines of discourse, and conflict detection, to users of
the ledger.
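The conflict-detection contribution can be sketched as follows. The negation table, claim tuples, and matching rule are illustrative assumptions, not the dissertation's algorithm:

```python
# Toy sketch of detecting conflicting claims across nanopublications:
# two claims about the same subject and object conflict when one asserts
# a relation that the other's relation negates. Relations are invented.
NEGATES = {"increases": "decreases", "decreases": "increases"}

def find_conflicts(claims):
    """claims: list of (nanopub_id, subject, relation, object) tuples.
    Returns pairs of nanopub ids whose claims contradict each other."""
    conflicts = []
    for i, (id_a, s_a, r_a, o_a) in enumerate(claims):
        for id_b, s_b, r_b, o_b in claims[i + 1:]:
            if (s_a, o_a) == (s_b, o_b) and NEGATES.get(r_a) == r_b:
                conflicts.append((id_a, id_b))
    return conflicts

claims = [("np1", "drugX", "increases", "proteinY"),
          ("np2", "drugX", "decreases", "proteinY"),
          ("np3", "drugZ", "increases", "proteinY")]
```

In the ledger setting, each conflicting pair can then be explained by pointing at the provenance graphs of the two nanopublications (who asserted what, based on which source, and when).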
FAIR digital twins for data-intensive research
Although all the technical components supporting fully orchestrated Digital Twins (DT) currently exist, what remains missing is a conceptual clarification and analysis of a more generalized concept of a DT that is made FAIR, that is, universally machine-actionable. This methodological overview is a first step toward this clarification. We present a review of previously developed semantic artifacts and how they may be used to compose a higher-order data model referred to here as a FAIR Digital Twin (FDT). We propose an architectural design to compose, store and reuse FDTs supporting data-intensive research, with emphasis on privacy by design and their use in GDPR-compliant open science.
Theoretical and technological building blocks for an innovation accelerator
The scientific system that we use today was devised centuries ago and is
inadequate for our current ICT-based society: the peer review system encourages
conservatism, journal publications are monolithic and slow, data is often not
available to other scientists, and the independent validation of results is
limited. Building on the Innovation Accelerator paper by Helbing and Balietti
(2011) this paper takes the initial global vision and reviews the theoretical
and technological building blocks that can be used for implementing an
innovation (first and foremost, science) accelerator platform driven by
re-imagining the science system. The envisioned platform would rest on four
pillars: (i) Redesign the incentive scheme to reduce behavior such as
conservatism, herding and hyping; (ii) Advance scientific publications by
breaking up the monolithic paper unit and introducing other building blocks
such as data, tools, experiment workflows, resources; (iii) Use machine
readable semantics for publications, debate structures, provenance etc. in
order to include the computer as a partner in the scientific process, and (iv)
Build an online platform for collaboration, including a network of trust and
reputation among the different types of stakeholders in the scientific system:
scientists, educators, funding agencies, policy makers, students and industrial
innovators among others. Any such improvements to the scientific system must
support the entire scientific process (unlike current tools that chop up the
scientific process into disconnected pieces), must facilitate and encourage
collaboration and interdisciplinarity (again unlike current tools), must
facilitate the inclusion of intelligent computing in the scientific process,
must facilitate not only the core scientific process but also accommodate
other stakeholders, such as science policy makers, industrial innovators, and the
general public.
Data integration services for biomedical applications
Doctoral Programme in Informatics (MAP-i). In recent decades, the field of biomedical science has fostered
unprecedented scientific advances. Research is stimulated by the
constant evolution of information technology, delivering novel and
diverse bioinformatics tools. Nevertheless, the proliferation of new and
disconnected solutions has resulted in massive amounts of resources
spread over heterogeneous and distributed platforms. Distinct
data types and formats are generated and stored in miscellaneous
repositories posing data interoperability challenges and delays in
discoveries. Data sharing and integrated access to these resources
are key features for successful knowledge extraction.
In this context, this thesis makes contributions towards accelerating
the semantic integration, linkage and reuse of biomedical resources.
The first contribution addresses the connection of distributed and
heterogeneous registries. The proposed methodology creates a
holistic view over the different registries, supporting semantic
data representation, integrated access and querying. The second
contribution addresses the integration of heterogeneous information
across scientific research, aiming to enable adequate data-sharing
services. The third contribution presents a modular architecture to
support the extraction and integration of textual information, enabling
the full exploitation of curated data. The last contribution lies
in providing a platform to accelerate the deployment of enhanced
semantic information systems. All the proposed solutions were
deployed and validated in the scope of rare diseases.