
    Computational and human-based methods for knowledge discovery over knowledge graphs

    The modern world has evolved alongside an enormous growth in the exploitation of data and information. Every day, increasing volumes of data in various sources and formats are stored, making it challenging to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging because data may come from heterogeneous sources in which important information remains hidden. Thus, new approaches that adapt to the challenges of knowledge discovery in such heterogeneous data environments are required. The Semantic Web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing deductive-database approaches to make explicit the implicit knowledge encoded in a KG. The proposed deductive database, DSDS, derives new statements over ego networks given an abstract target prediction, thereby minimizing data sparsity in KGs. In addition, a sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied in the KG completion task to represent the entities of a KG in a low-dimensional vector space. However, KGE models are known to suffer from data sparsity, and the symbolic system helps to overcome this limitation. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to that prediction. As a proof of concept, we have implemented the neuro-symbolic system on top of a KG for lung cancer to predict polypharmacy treatment effectiveness. The symbolic system deduces pharmacokinetic drug-drug interactions encoded in a set of rules expressed as a Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study on the components of our approach is conducted, considering state-of-the-art KGE methods. The observed results provide evidence of the benefits of this neuro-symbolic integration: the system exhibits improved results for an abstract target prediction because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system is evaluated in Industry 4.0 (I4.0), demonstrating its effectiveness in determining relatedness among standards and analyzing their properties to detect unknown relations in the I4.0KG. The results achieved allow us to conclude that the proposed neuro-symbolic approach for an abstract target prediction improves the prediction capability of KGE models by minimizing data sparsity in KGs.
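    The abstract does not spell out DSDS's rules or the particular KGE model used, so the following Python sketch is only illustrative of the two-step idea: a symbolic rule first materializes implicit drug-drug interaction triples, and a TransE-style scorer (one common KGE formulation, assumed here) then scores candidate links over the densified graph. All entity, relation, and function names are hypothetical.

```python
# Minimal sketch (not the thesis's DSDS/Datalog implementation): a symbolic
# rule densifies a toy KG with inferred triples, then a TransE-style scorer
# rates a candidate link. All names and embeddings are illustrative only.
import numpy as np

# Toy KG: (subject, predicate, object) triples.
kg = {
    ("drugA", "inhibits", "CYP3A4"),
    ("drugB", "metabolizedBy", "CYP3A4"),
}

def deduce_ddi(triples):
    """Forward-chain one assumed pharmacokinetic rule:
    inhibits(x, e) & metabolizedBy(y, e) -> interactsWith(x, y)."""
    inferred = set()
    for (x, p1, e1) in triples:
        for (y, p2, e2) in triples:
            if p1 == "inhibits" and p2 == "metabolizedBy" and e1 == e2 and x != y:
                inferred.add((x, "interactsWith", y))
    return inferred

dense_kg = kg | deduce_ddi(kg)  # symbolic step reduces data sparsity

# Sub-symbolic step: random TransE-style embeddings score a candidate triple.
rng = np.random.default_rng(0)
entities = {e for (s, _, o) in dense_kg for e in (s, o)}
relations = {p for (_, p, _) in dense_kg}
dim = 16
E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def transe_score(s, r, o):
    """Lower is more plausible: ||e_s + e_r - e_o||."""
    return float(np.linalg.norm(E[s] + R[r] - E[o]))

print(deduce_ddi(kg))
print(transe_score("drugA", "interactsWith", "drugB"))
```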

    Distributed Web Service Coordination for Collaboration Applications and Biological Workflows

    In this dissertation work, we have investigated the main research thrust of decentralized coordination of workflows over web services. To address distributed workflow coordination, we first developed “Web Coordination Bonds” as a capable set of dependency modeling primitives that enable each web service to manage its own dependencies. Web bond primitives are as powerful as extended Petri nets and have sufficient modeling and expressive capabilities to capture workflow dependencies. We have designed and prototyped our “Web Service Coordination Management Middleware” (WSCMM) system, which enhances the current web services infrastructure to accommodate web-bond-enabled web services. Finally, based on the core concepts of web coordination bonds and WSCMM, we developed the “BondFlow” system, which allows easy configuration and distributed coordination of workflows. The footprint of the BondFlow runtime is 24KB, and the additional third-party software packages, a SOAP client and an XML parser, account for 115KB.
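    The abstract does not define the bond primitives' interface, so the sketch below is a hypothetical Python illustration of the underlying idea only: each service holds its own dependency bonds and decides locally whether a step may fire, without a central coordinator. The bond kinds and method names are assumptions, not the BondFlow or WSCMM API.

```python
# Illustrative sketch only: bond kinds ("precedes", "excludes") and method
# names are hypothetical. The point shown is local, per-service dependency
# management rather than centralized workflow coordination.
class Bond:
    """A dependency from this service's step to another step."""
    def __init__(self, kind, target):
        self.kind = kind      # assumed bond kind, e.g. "precedes" or "excludes"
        self.target = target  # name of the step the bond points to

class ServiceStep:
    def __init__(self, name):
        self.name = name
        self.bonds = []       # bonds managed by the service itself

    def add_bond(self, kind, target):
        self.bonds.append(Bond(kind, target))

    def can_fire(self, completed):
        """Check locally whether every dependency allows execution."""
        for b in self.bonds:
            if b.kind == "precedes" and b.target not in completed:
                return False
            if b.kind == "excludes" and b.target in completed:
                return False
        return True

# Usage: "ship" may only run after "pay" and never after "cancel".
ship = ServiceStep("ship")
ship.add_bond("precedes", "pay")
ship.add_bond("excludes", "cancel")
print(ship.can_fire({"pay"}))  # True
```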

    Introducing distributed dynamic data-intensive (D3) science: Understanding applications and infrastructure

    A common feature across many science and engineering applications is the amount and diversity of data and computation that must be integrated to yield insights. Data sets are growing larger and becoming more distributed, and their location, availability, and properties are often time-dependent. Collectively, these characteristics give rise to dynamic distributed data-intensive applications. While "static" data applications have received significant attention, the characteristics, requirements, and software systems for the analysis of large volumes of dynamic, distributed data and data-intensive applications have received relatively less attention. This paper surveys several representative dynamic distributed data-intensive application scenarios, provides a common conceptual framework to understand them, and examines the infrastructure used in support of these applications.

    Drug Repurposing

    This book focuses on various aspects and applications of drug repurposing, the understanding of which is important for treating diseases. Due to the high costs and long timelines associated with the new drug discovery process, the inclination toward drug repurposing is increasing for common as well as rare diseases. A major focus of this book is understanding the role of drug repurposing in developing drugs for infectious diseases, including antiviral, antibacterial, and anticancer drugs, as well as immunotherapeutics.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the final publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to afford better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. Seamless interaction between High Performance Computing and Modelling and Simulation is therefore required to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest to these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.
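    As a minimal illustration, not drawn from the book, of coupling Modelling and Simulation with parallel processing, the following Python sketch evaluates a toy simulation model over many parameter sets across local cores; the model and parameter values are invented for the example.

```python
# Minimal data-parallel simulation sketch: a toy logistic-growth model is
# evaluated over many parameter sets in parallel on local cores.
from multiprocessing import Pool

def simulate(params):
    """Toy model: iterate the logistic map for one (r, x0) parameter set."""
    r, x = params
    for _ in range(1000):
        x = r * x * (1.0 - x)
    return x

if __name__ == "__main__":
    parameter_sets = [(3.7, 0.1 + 0.001 * i) for i in range(1000)]
    with Pool() as pool:                      # distribute across local cores
        results = pool.map(simulate, parameter_sets)
    print(len(results), results[:3])
```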

    Intellectual Property Management in Health and Agricultural Innovation: A Handbook of Best Practices, Vol. 1

    Prepared by and for policy-makers, leaders of public sector research establishments, technology transfer professionals, licensing executives, and scientists, this online resource offers up-to-date information and strategies for utilizing the power of both intellectual property and the public domain. Emphasis is placed on advancing innovation in health and agriculture, though many of the principles outlined here are broadly applicable across technology fields. Eschewing ideological debates and general proclamations, the authors always keep their eye on the practical side of IP management. The site is based on a comprehensive Handbook and Executive Guide that provide substantive discussions and analysis of the opportunities awaiting anyone in the field who wants to put intellectual property to work. This multi-volume work contains 153 chapters on a full range of IP topics and over 50 case studies, composed by over 200 authors from North, South, East, and West. If you are a policy-maker, a senior administrator, a technology transfer manager, or a scientist, we invite you to use the companion site guide available at http://www.iphandbook.org/index.html. The site guide distills the key points of each IP topic covered by the Handbook into simple language and places them in the context of evolving best practices specific to your professional role within the overall picture of IP management.

    Nanogenomics and Nanoproteomics Enabling Personalized, Predictive and Preventive Medicine

    Since the discovery of the nucleic acid, molecular biology has made tremendous progress and achieved many results. Despite this, there is still a gap between the classical, traditional medical approach and the molecular world. Inspired by the incredible wealth of data generated by “omics”-driven techniques and high-throughput technologies (HTTs), I have tried to develop a protocol that could reduce the existing barrier between phenomenological medicine and molecular medicine, facilitating a translational shift from the lab to the patient's bedside. I also felt the urgent need to integrate the most important omics sciences, namely genomics and proteomics. Nucleic Acid Programmable Protein Arrays (NAPPA) can do this by utilizing a complex mammalian cell-free expression system to produce proteins in situ. As an alternative to fluorescence-labeled approaches, a new label-free method, emerging from the combined use of three independent and complementary nanobiotechnological approaches, appears capable of analyzing gene and protein function as well as gene-protein, gene-drug, protein-protein, and protein-drug interactions in studies promising for personalized medicine. Quartz Micro Circuit nanogravimetry (QCM), based on frequency and dissipation factor, mass spectrometry (MS), and anodic porous alumina (APA) indeed overcome the limits of correlated fluorescence detection, which is plagued by the background still present after extensive washes. Work is in progress to further optimize this approach with a homogeneous and well-defined bacterial cell-free expression system able to realize the ambitious objective of quantifying the regulatory gene and protein networks in humans. Implications for personalized medicine of the above label-free protein array, using different test genes and proteins, are reported in this PhD thesis.
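    The abstract does not report the crystal parameters used, but QCM nanogravimetry conventionally converts a measured frequency shift into adsorbed mass via the Sauerbrey relation. The sketch below assumes a standard 5 MHz AT-cut quartz crystal and holds only for thin, rigid films with low dissipation; it is an illustration of the measurement principle, not the thesis's own analysis code.

```python
# Hedged sketch: Sauerbrey conversion of a QCM frequency shift to areal mass,
# assuming a standard 5 MHz AT-cut quartz crystal (not stated in the thesis).
SAUERBREY_C = 17.7  # ng / (cm^2 * Hz), mass sensitivity of a 5 MHz crystal

def adsorbed_mass(delta_f_hz):
    """Return areal mass density (ng/cm^2) for a measured frequency shift."""
    return -SAUERBREY_C * delta_f_hz

# A 25 Hz frequency decrease corresponds to roughly 440 ng/cm^2 of adsorbed protein.
print(adsorbed_mass(-25.0))
```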