Learning Logical Rules from Knowledge Graphs
Ph.D. (Integrated) Thesis
Expressing and extracting regularities in multi-relational data, where data points are interrelated
and heterogeneous, requires well-designed knowledge representation. Knowledge Graphs (KGs),
as a graph-based representation of multi-relational data, have seen a rapidly growing presence in
industry and academia, where many real-world applications and academic research are either
enabled or augmented through the incorporation of KGs. However, due to the way KGs are
constructed, they are inherently noisy and incomplete. In this thesis, we focus on developing
logic-based graph reasoning systems that utilize logical rules to infer missing facts for the
completion of KGs. Unlike most rule learners that primarily mine abstract rules that contain
no constants, we are particularly interested in learning instantiated rules that contain constants
due to their ability to represent meaningful patterns and correlations that cannot be expressed
by abstract rules. The inclusion of instantiated rules often leads to exponential growth in the
search space. Therefore, it is necessary to develop optimization strategies that balance
scalability and expressivity. To this end, we propose GPFL, a probabilistic rule learning
system optimized to mine instantiated rules through the implementation of a novel two-stage
rule generation mechanism. Through experiments, we demonstrate that GPFL not only performs
competitively on knowledge graph completion but is also much more efficient than existing
methods at mining instantiated rules. With GPFL, we also identify instantiated rules that overfit
and provide detailed analyses of their impact on system performance. Then, we propose RHF,
a generic framework for constructing rule hierarchies from a given set of rules. We demonstrate
through experiments that RHF, together with the hierarchical pruning techniques it enables,
significantly reduces runtime and rule size by pruning unpromising
rules. Finally, to test the practicality of rule learning systems, we develop Ranta, a novel
drug repurposing system that relies on logical rules as features to make interpretable inferences.
Ranta outperforms existing methods by a large margin in predictive performance and can make
reasonable repurposing suggestions with interpretable evidence.
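The distinction between abstract rules (no constants) and instantiated rules (with constants) can be made concrete with a minimal Python sketch that applies one rule of each kind to a toy KG and scores it. Everything here is an illustrative assumption: the triples, the relation names `treats` and `similar_to`, and both rules are hypothetical, and the score is the standard rule confidence used by many rule miners (the fraction of a rule's predictions already present in the KG). It is not GPFL's two-stage rule generation mechanism, which the abstract does not detail.

```python
from collections import defaultdict

# Hypothetical toy KG of (head, relation, tail) triples.
TRIPLES = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "fever"),
    ("ibuprofen", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("paracetamol", "treats", "fever"),
    ("ibuprofen", "similar_to", "aspirin"),
    ("naproxen", "similar_to", "aspirin"),
}


def by_relation(triples):
    """Index the KG as relation -> set of (head, tail) pairs."""
    index = defaultdict(set)
    for head, rel, tail in triples:
        index[rel].add((head, tail))
    return index


def apply_abstract(index):
    """Abstract rule (no constants): treats(X, Y) <- similar_to(X, Z), treats(Z, Y)."""
    preds = set()
    for x, z in index["similar_to"]:
        for z2, y in index["treats"]:
            if z == z2:
                preds.add((x, "treats", y))
    return preds


def apply_instantiated(index):
    """Instantiated rule (with constants): treats(X, headache) <- treats(X, fever)."""
    return {(x, "treats", "headache") for x, y in index["treats"] if y == "fever"}


def confidence(preds, triples):
    """Standard confidence: share of the rule's predictions already in the KG."""
    return len(preds & triples) / len(preds) if preds else 0.0


if __name__ == "__main__":
    idx = by_relation(TRIPLES)
    for name, preds in [("abstract", apply_abstract(idx)),
                        ("instantiated", apply_instantiated(idx))]:
        # Predictions missing from the KG are candidate facts for completion.
        print(name, round(confidence(preds, TRIPLES), 3), sorted(preds - TRIPLES))
```

On this toy data the abstract rule scores 0.5 and the instantiated rule 2/3. The instantiated rule ties two specific constants (`headache` and `fever`), a correlation that no constant-free rule can state, and its one prediction not yet in the KG, `(paracetamol, treats, headache)`, is a candidate missing fact.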
Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability
The integration of Artificial Intelligence (AI) into the field of drug
discovery has been a growing area of interdisciplinary scientific research.
However, conventional AI models are heavily limited in handling complex
biomedical structures (such as 2D or 3D protein and molecule structures) and
providing interpretations for outputs, which hinders their practical
application. Recently, Graph Machine Learning (GML) has gained considerable
attention for its exceptional ability to model graph-structured biomedical data
and investigate their properties and functional relationships. Despite
extensive efforts, GML methods still suffer from several deficiencies, such as
the limited ability to handle supervision sparsity and provide interpretability
in learning and inference processes, and their ineffectiveness in utilising
relevant domain knowledge. In response, recent studies have proposed
integrating external biomedical knowledge into the GML pipeline to realise more
precise and interpretable drug discovery with limited training instances.
However, a systematic definition for this burgeoning research direction is yet
to be established. This survey presents a comprehensive overview of
long-standing drug discovery principles, provides the foundational concepts and
cutting-edge techniques for graph-structured data and knowledge databases, and
formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug
discovery. Related KaGML works, collected following a carefully designed
search methodology, are thoroughly reviewed and organised into four categories
following a newly defined taxonomy. To facilitate research in this rapidly
emerging field, we also share collected practical resources that are valuable
for intelligent drug discovery and provide an in-depth discussion of
potential avenues for future advancements.
Recommended from our members
Knowledge Graphs: Opportunities and Challenges
With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It is well recognized that knowledge graphs effectively represent complex information; hence, they have rapidly gained the attention of academia and industry in recent years. To develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of the field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.
Graph Representation Learning in Biomedicine
Biomedical networks are universal descriptors of systems of interacting
elements, from protein interactions to disease networks, all the way to
healthcare systems and scientific knowledge. With the remarkable success of
representation learning in providing powerful predictions and insights, we have
witnessed a rapid expansion of representation learning techniques into
modeling, analyzing, and learning with such networks. In this review, we put
forward an observation that long-standing principles of networks in biology and
medicine -- while often unspoken in machine learning research -- can provide
the conceptual grounding for representation learning, explain its current
successes and limitations, and inform future advances. We synthesize a spectrum
of algorithmic approaches that, at their core, leverage graph topology to embed
networks into compact vector spaces, and capture the breadth of ways in which
representation learning is proving useful. Areas of profound impact include
identifying variants underlying complex traits, disentangling behaviors of
single cells and their effects on health, assisting in diagnosis and treatment
of patients, and developing safe and effective medicines.
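As one concrete instance of the topology-based embeddings this review surveys, here is a minimal sketch of Laplacian eigenmaps, a classic spectral method, in Python with NumPy. The toy graph (two triangles joined by a single edge), the function name `laplacian_eigenmap`, and the choice of two dimensions are illustrative assumptions; the review covers a much broader spectrum of methods, and this sketch is not meant to represent all of them.

```python
import numpy as np


def laplacian_eigenmap(edges, num_nodes, dim=2):
    """Embed an undirected graph using eigenvectors of its unnormalized Laplacian."""
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    lap = np.diag(adj.sum(axis=1)) - adj  # L = D - A
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(lap)
    # Skip column 0: the constant eigenvector (eigenvalue 0 for a connected graph).
    return eigvecs[:, 1:dim + 1]


# Hypothetical toy network: triangle {0, 1, 2} and triangle {3, 4, 5}, bridged by 2-3.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

if __name__ == "__main__":
    emb = laplacian_eigenmap(EDGES, num_nodes=6)
    print(np.round(emb, 3))  # one row (a 2-D vector) per node
```

The first coordinate (the Fiedler vector) takes one sign on nodes 0, 1 and 2 and the opposite sign on nodes 3, 4 and 5, so the two densely connected groups land apart in the vector space. Eigenvector signs are arbitrary, so only relative signs matter, and in this symmetric toy graph the second coordinate comes from a repeated eigenvalue and is therefore not unique.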
Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities
The term life sciences refers to the disciplines that study living organisms and life processes, and include chemistry, biology, medicine, and a range of other related disciplines. Research efforts in life sciences are heavily data-driven, as they produce and consume vast amounts of scientific data, much of which is intrinsically relational and graph-structured.
The volume of data and the complexity of scientific concepts and relations referred to therein promote the application of advanced knowledge-driven technologies for managing and interpreting data, with the ultimate aim to advance scientific discovery.
In this survey and position paper, we discuss recent developments and advances in the use of graph-based technologies in life sciences and set out a vision for how these technologies will impact these fields into the future. We focus on three broad topics: the construction and management of Knowledge Graphs (KGs), the use of KGs and associated technologies in the discovery of new knowledge, and the use of KGs in artificial intelligence applications to support explanations (explainable AI). We select a few exemplary use cases for each topic, discuss the challenges and open research questions within these topics, and conclude with a perspective and outlook that summarizes the overarching challenges and their potential solutions as a guide for future research.
Network-driven strategies to integrate and exploit biomedical data
[eng] In the quest to understand complex biological systems, the scientific community has been delving into protein, chemical and disease biology, populating biomedical databases with a wealth of data and knowledge. Biomedicine has now entered a Big Data era, in which computationally driven research can benefit greatly from existing knowledge to better understand and characterize biological and chemical entities. Yet the heterogeneity and complexity of biomedical data call for proper integration and representation of this knowledge, so that it can be exploited effectively and efficiently.
In this thesis, we aim to develop new strategies to leverage current biomedical knowledge, so that meaningful information can be extracted and fused into downstream applications. To this end, we have capitalized on network analysis algorithms to integrate and exploit biomedical data in a wide variety of scenarios, providing a better understanding of pharmaco-omics experiments while helping accelerate the drug discovery process. More specifically, we have (i) devised an approach to identify functional gene sets associated with drug response mechanisms of action, (ii) created a resource of biomedical descriptors able to anticipate cellular drug response and identify new drug repurposing opportunities, (iii) designed a tool to annotate biomedical support for a given set of experimental observations, and (iv) reviewed different chemical and biological descriptors relevant for drug discovery, illustrating how they can be used to provide solutions to current challenges in biomedicine.
[cat] In the search for a better understanding of complex biological systems, the scientific community has been delving into the biology of proteins, drugs and diseases, populating biomedical databases with a large volume of data and knowledge. Currently, the field of biomedicine is in an era of "massive data" (Big Data), in which computer-based research can benefit from this knowledge to better understand and characterize chemical and biological entities. However, the heterogeneity and complexity of biomedical data require that they be integrated and represented in a suitable way, thus allowing this information to be exploited effectively and efficiently.
The objective of this doctoral thesis is to develop new strategies that make it possible to exploit current biomedical knowledge and thereby extract relevant information for future biomedical applications. To this end, we have used network algorithms to integrate and exploit biomedical knowledge in different tasks, providing a better understanding of pharmaco-omics experiments in order to help accelerate the discovery of new drugs. As a result, in this thesis we have (i) designed a strategy to identify functional groups of genes associated with the response of cell lines to drugs, (ii) created a collection of biomedical descriptors capable, among other things, of anticipating how cells respond to drugs or finding new uses for existing drugs, (iii) developed a tool to discover which biological contexts correspond to an experimentally observed biological association and, finally, (iv) explored different chemical and biological descriptors relevant to the drug discovery process, showing how they can be used to find solutions to current challenges in the field of biomedicine.
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform LigNFam enables users to interactively explore
similarity relationships between "drug-like" molecules and the proteins they bind. Second,
two deep-learning-based binding site comparison tools were developed, performing competitively with
the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
Exploiting multimodality and structure in world representations
An essential aim of artificial intelligence research is to design agents that will eventually cooperate with humans within the real world. To this end, embodied learning is emerging as one of the machine learning community's most important efforts. Emerging sub-fields address various aspects of such systems: visual reasoning, language representations, causal mechanisms, and robustness to out-of-distribution inputs, to name only a few.
In particular, multimodal learning and language grounding are vital to achieving a strong understanding of the real world. Humans build internal representations by interacting with their environment, learning complex associations between visual, auditory and linguistic concepts. Since the world abounds with structure, graph-based encodings are also likely to be incorporated into reasoning and decision-making modules. Furthermore, these relational representations are largely symbolic in nature, which gives them advantages over other formats such as raw pixels, and they can encode various types of links (temporal, causal, spatial) that can be essential for understanding and acting in the real world.
This thesis presents three research works that study and develop likely aspects of future intelligent agents. The first contribution centers on vision-and-language learning, introducing a challenging embodied task that shifts the focus of an existing one to the visual reasoning problem. By extending popular visual question answering (VQA) paradigms, I also designed several models that were evaluated on the novel dataset. This produced initial performance estimates for environment understanding, through the lens of a more challenging VQA downstream task. The second work presents two ways of obtaining hierarchical representations of graph-structured data. These methods either scaled to much larger graphs than the ones processed by the best-performing method at the time, or incorporated theoretical properties via the use of topological data analysis algorithms. Both approaches competed with contemporary state-of-the-art graph classification methods, even outside social domains in the second case, where the inductive bias was PageRank-driven. Finally, the third contribution delves further into relational learning, presenting a probabilistic treatment of graph representations in complex settings such as few-shot, multi-task learning and label-scarce data regimes. By adding relational inductive biases to neural processes, the resulting framework can model an entire distribution of functions which generate datasets with structure. This yielded significant performance gains, especially in the aforementioned complex scenarios, with semantically accurate uncertainty estimates that drastically improved over the neural process baseline. This type of framework may eventually contribute to developing lifelong-learning systems, due to its ability to adapt to novel tasks and distributions.
The benchmark, methods and frameworks that I have devised during my doctoral studies suggest important future directions for embodied and graph representation learning research. These areas have increasingly proved their relevance to designing intelligent and collaborative agents, which we may interact with in the near future. By addressing several challenges in this problem space, my contributions therefore take a few steps towards building machine learning systems to be deployed in real-life settings.
- …