35 research outputs found

    Broadening the Scope of Nanopublications

    Full text link
    In this paper, we present an approach for extending the existing concept of nanopublications --- tiny entities of scientific results in RDF representation --- to broaden their application range. The proposed extension uses English sentences to represent informal and underspecified scientific claims. These sentences follow a syntactic and semantic scheme that we call AIDA (Atomic, Independent, Declarative, Absolute), which provides a uniform and succinct representation of scientific assertions. Such AIDA nanopublications are compatible with the existing nanopublication concept and enjoy most of its advantages such as information sharing, interlinking of scientific findings, and detailed attribution, while being more flexible and applicable to a much wider range of scientific results. We show that users are able to create AIDA sentences for given scientific results quickly and at high quality, and that it is feasible to automatically extract and interlink AIDA nanopublications from existing unstructured data sources. To demonstrate our approach, a web-based interface is introduced, which also exemplifies the use of nanopublications for non-scientific content, including meta-nanopublications that describe other nanopublications.Comment: To appear in the Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013

    Exposing Provenance Metadata Using Different RDF Models

    Full text link
    A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, as well as reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may not only be verbose, but also significantly redundant. Therefore, an appropriate RDF provenance model should be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources, and demonstrated the extent of redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set, including the N-ary model, the Singleton Property model, and the Nanopublication model. We examined query performance against three commonly used large RDF stores, including Virtuoso, Stardog, and Blazegraph. Our experiments demonstrate that query performance depends on both RDF store as well as the RDF provenance model

    structured representation of scientific evidence in the biomedical domain using Semantic Web techniques

    Get PDF
    Background Accounts of evidence are vital to evaluate and reproduce scientific findings and integrate data on an informed basis. Currently, such accounts are often inadequate, unstandardized and inaccessible for computational knowledge engineering even though computational technologies, among them those of the semantic web, are ever more employed to represent, disseminate and integrate biomedical data and knowledge. Results We present SEE (Semantic EvidencE), an RDF/OWL based approach for detailed representation of evidence in terms of the argumentative structure of the supporting background for claims even in complex settings. We derive design principles and identify minimal components for the representation of evidence. We specify the Reasoning and Discourse Ontology (RDO), an OWL representation of the model of scientific claims, their subjects, their provenance and their argumentative relations underlying the SEE approach. We demonstrate the application of SEE and illustrate its design patterns in a case study by providing an expressive account of the evidence for certain claims regarding the isolation of the enzyme glutamine synthetase. Conclusions SEE is suited to provide coherent and computationally accessible representations of evidence-related information such as the materials, methods, assumptions, reasoning and information sources used to establish a scientific finding by adopting a consistently claim-based perspective on scientific results and their evidence. SEE allows for extensible evidence representations, in which the level of detail can be adjusted and which can be extended as needed. It supports representation of arbitrary many consecutive layers of interpretation and attribution and different evaluations of the same data. SEE and its underlying model could be a valuable component in a variety of use cases that require careful representation or examination of evidence for data presented on the semantic web or in other formats

    HyQue: evaluating hypotheses using Semantic Web technologies

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks.</p> <p>Results</p> <p>We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in <it>Saccharomyces cerevisiae</it> to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF.</p> <p>Conclusions</p> <p>HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about the galactose metabolism in <it>S. cerevisiae</it>. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another allowing scientists to browse from data to hypothesis and <it>vice versa.</it> HyQue hypotheses and data are available at <url>http://semanticscience.org/projects/hyque</url>.</p

    Using nanopublications as a distributed ledger of digital truth

    Get PDF
    With the increase in volume of research publications, it is very difficult for researchers to keep abreast of all work in their area. Additionally, the claims in classical publications are not machine-readable making it challenging to retrieve, integrate, and link prior work. Several semantic publishing approaches have been proposed to address these challenges, including Research Object, Executable Paper, Micropublications, and Nanopublications. Nanopublications are a granular way of publishing research-based claims, their associated provenance, and publication information (metadata of the nanopublication) in a machine-readable form. To date, over 10 million nanopublications have been published, covering a wide range of topics, predominantly in the life sciences. Nanopublications are immutable, decentralised/distributed, uniformly structured, granular level, and authentic. These features of nanopublications allow them to be used as a Distributed Ledger of Digital Truth. Such a ledger enables detecting conflicting claims and generating the timeline of discussion on a particular topic. However, the inability to identify all nanopublications related to a given topic prevent existing nanopublications forming a ledger. In this dissertation, we make the following contributions: (i) Identify quality issues regarding misuse of authorship properties and linkrot which impact on the quality of the digital ledger. We argue that the Nanopub community needs to be developed a set of guidelines for publishing nanopublications. (ii) Provide a framework for generating a timeline of discourse over a collection of nanopublications by retrieving and combining nanopublications on a particular topic to provide interoperability between them. (iii) Detect contradictory claims between nanopublications automatically highlighting the conflicts and provide explanations based on the provenance information in the nanopublications. Through these contributions, we show that nanopublications can form a distributed ledger of digital truth, providing key benefits such as citability, timelines of discourse, and conflict detection, to users of the ledger

    FAIR digital twins for data-intensive research

    Get PDF
    Although all the technical components supporting fully orchestrated Digital Twins (DT) currently exist, what remains missing is a conceptual clarification and analysis of a more generalized concept of a DT that is made FAIR, that is, universally machine actionable. This methodological overview is a first step toward this clarification. We present a review of previously developed semantic artifacts and how they may be used to compose a higher-order data model referred to here as a FAIR Digital Twin (FDT). We propose an architectural design to compose, store and reuse FDTs supporting data intensive research, with emphasis on privacy by design and their use in GDPR compliant open science.Analytical BioScience

    Theoretical and technological building blocks for an innovation accelerator

    Get PDF
    The scientific system that we use today was devised centuries ago and is inadequate for our current ICT-based society: the peer review system encourages conservatism, journal publications are monolithic and slow, data is often not available to other scientists, and the independent validation of results is limited. Building on the Innovation Accelerator paper by Helbing and Balietti (2011) this paper takes the initial global vision and reviews the theoretical and technological building blocks that can be used for implementing an innovation (in first place: science) accelerator platform driven by re-imagining the science system. The envisioned platform would rest on four pillars: (i) Redesign the incentive scheme to reduce behavior such as conservatism, herding and hyping; (ii) Advance scientific publications by breaking up the monolithic paper unit and introducing other building blocks such as data, tools, experiment workflows, resources; (iii) Use machine readable semantics for publications, debate structures, provenance etc. in order to include the computer as a partner in the scientific process, and (iv) Build an online platform for collaboration, including a network of trust and reputation among the different types of stakeholders in the scientific system: scientists, educators, funding agencies, policy makers, students and industrial innovators among others. Any such improvements to the scientific system must support the entire scientific process (unlike current tools that chop up the scientific process into disconnected pieces), must facilitate and encourage collaboration and interdisciplinarity (again unlike current tools), must facilitate the inclusion of intelligent computing in the scientific process, must facilitate not only the core scientific process, but also accommodate other stakeholders such science policy makers, industrial innovators, and the general public

    Serviços de integração de dados para aplicações biomédicas

    Get PDF
    Doutoramento em Informática (MAP-i)In the last decades, the field of biomedical science has fostered unprecedented scientific advances. Research is stimulated by the constant evolution of information technology, delivering novel and diverse bioinformatics tools. Nevertheless, the proliferation of new and disconnected solutions has resulted in massive amounts of resources spread over heterogeneous and distributed platforms. Distinct data types and formats are generated and stored in miscellaneous repositories posing data interoperability challenges and delays in discoveries. Data sharing and integrated access to these resources are key features for successful knowledge extraction. In this context, this thesis makes contributions towards accelerating the semantic integration, linkage and reuse of biomedical resources. The first contribution addresses the connection of distributed and heterogeneous registries. The proposed methodology creates a holistic view over the different registries, supporting semantic data representation, integrated access and querying. The second contribution addresses the integration of heterogeneous information across scientific research, aiming to enable adequate data-sharing services. The third contribution presents a modular architecture to support the extraction and integration of textual information, enabling the full exploitation of curated data. The last contribution lies in providing a platform to accelerate the deployment of enhanced semantic information systems. All the proposed solutions were deployed and validated in the scope of rare diseases.Nas últimas décadas, o campo das ciências biomédicas proporcionou grandes avanços científicos estimulados pela constante evolução das tecnologias de informação. A criação de diversas ferramentas na área da bioinformática e a falta de integração entre novas soluções resultou em enormes quantidades de dados distribuídos por diferentes plataformas. Dados de diferentes tipos e formatos são gerados e armazenados em vários repositórios, o que origina problemas de interoperabilidade e atrasa a investigação. A partilha de informação e o acesso integrado a esses recursos são características fundamentais para a extração bem sucedida do conhecimento científico. Nesta medida, esta tese fornece contribuições para acelerar a integração, ligação e reutilização semântica de dados biomédicos. A primeira contribuição aborda a interconexão de registos distribuídos e heterogéneos. A metodologia proposta cria uma visão holística sobre os diferentes registos, suportando a representação semântica de dados e o acesso integrado. A segunda contribuição aborda a integração de diversos dados para investigações científicas, com o objetivo de suportar serviços interoperáveis para a partilha de informação. O terceiro contributo apresenta uma arquitetura modular que apoia a extração e integração de informações textuais, permitindo a exploração destes dados. A última contribuição consiste numa plataforma web para acelerar a criação de sistemas de informação semânticos. Todas as soluções propostas foram validadas no âmbito das doenças raras
    corecore