1,743 research outputs found

    Provenance-Centered Dataset of Drug-Drug Interactions

    Get PDF
    Over the years several studies have demonstrated the ability to identify potential drug-drug interactions via data mining from the literature (MEDLINE), electronic health records, public databases (Drugbank), etc. While each one of these approaches is properly statistically validated, they do not take into consideration the overlap between them as one of their decision making variables. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources to provide researchers a resource for leveraging the work of others into their prediction methods. As one of the main issues to overcome the usage of external resources is their mappings between drug names and identifiers used, we also provide the set of mappings we curated to be able to compare the multiple sources we aggregate in our dataset.Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201

    Decentralized provenance-aware publishing with nanopublications

    Get PDF
    Publication and archival of scientific results is still commonly considered the responsability of classical publishing companies. Classical forms of publishing, however, which center around printed narrative articles, no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable

    Knowledge-based Biomedical Data Science 2019

    Full text link
    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

    PAV ontology: provenance, authoring and versioning

    Full text link

    HDTQ: Managing RDF Datasets in Compressed Space

    Get PDF
    HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet, RDF datasets often contain additional graph information, such as the origin, version or validity time of a triple. Traditional HDT is not capable of handling this additional parameter(s). This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while still being highly compact and queryable. Two HDTQ-based approaches are introduced: Annotated Triples and Annotated Graphs, and their performance is compared to the leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems

    Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and their Roles in Complex Disease

    Get PDF
    Vast amounts of biomedical associations are easily accessible in public resources, spanning gene-disease associations, tissue-specific gene expression, gene function and pathway annotations, and many other data types. Despite this mass of data, information most relevant to the study of a particular disease remains loosely coupled and difficult to incorporate into ongoing research. Current public databases are difficult to navigate and do not interoperate well due to the plethora of interfaces and varying biomedical concept identifiers used. Because no coherent display of data within a specific problem domain is available, finding the latent relationships associated with a disease of interest is impractical. This research describes a method for extracting the contextual relationships embedded within associations relevant to a disease of interest. After applying the method to a small test data set, a large-scale integrated association network is constructed for application of a network propagation technique that helps uncover more distant latent relationships. Together these methods are adept at uncovering highly relevant relationships without any a priori knowledge of the disease of interest. The combined contextual search and relevance methods power a tool which makes pertinent biomedical associations easier to find, easier to assimilate into ongoing work, and more prominent than currently available databases. Increasing the accessibility of current information is an important component to understanding high-throughput experimental results and surviving the data deluge

    Decentralized Identity and Access Management Framework for Internet of Things Devices

    Get PDF
    The emerging Internet of Things (IoT) domain is about connecting people and devices and systems together via sensors and actuators, to collect meaningful information from the devices surrounding environment and take actions to enhance productivity and efficiency. The proliferation of IoT devices from around few billion devices today to over 25 billion in the next few years spanning over heterogeneous networks defines a new paradigm shift for many industrial and smart connectivity applications. The existing IoT networks faces a number of operational challenges linked to devices management and the capability of devices’ mutual authentication and authorization. While significant progress has been made in adopting existing connectivity and management frameworks, most of these frameworks are designed to work for unconstrained devices connected in centralized networks. On the other hand, IoT devices are constrained devices with tendency to work and operate in decentralized and peer-to-peer arrangement. This tendency towards peer-to-peer service exchange resulted that many of the existing frameworks fails to address the main challenges faced by the need to offer ownership of devices and the generated data to the actual users. Moreover, the diversified list of devices and offered services impose that more granular access control mechanisms are required to limit the exposure of the devices to external threats and provide finer access control policies under control of the device owner without the need for a middleman. This work addresses these challenges by utilizing the concepts of decentralization introduced in Distributed Ledger (DLT) technologies and capability of automating business flows through smart contracts. The proposed work utilizes the concepts of decentralized identifiers (DIDs) for establishing a decentralized devices identity management framework and exploits Blockchain tokenization through both fungible and non-fungible tokens (NFTs) to build a self-controlled and self-contained access control policy based on capability-based access control model (CapBAC). The defined framework provides a layered approach that builds on identity management as the foundation to enable authentication and authorization processes and establish a mechanism for accounting through the adoption of standardized DLT tokenization structure. The proposed framework is demonstrated through implementing a number of use cases that addresses issues related identity management in industries that suffer losses in billions of dollars due to counterfeiting and lack of global and immutable identity records. The framework extension to support applications for building verifiable data paths in the application layer were addressed through two simple examples. The system has been analyzed in the case of issuing authorization tokens where it is expected that DLT consensus mechanisms will introduce major performance hurdles. A proof of concept emulating establishing concurrent connections to a single device presented no timed-out requests at 200 concurrent connections and a rise in the timed-out requests ratio to 5% at 600 connections. The analysis showed also that a considerable overhead in the data link budget of 10.4% is recorded due to the use of self-contained policy token which is a trade-off between building self-contained access tokens with no middleman and link cost

    Enabling Web-scale data integration in biomedicine through Linked Open Data

    Get PDF
    The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems
    • …
    corecore