
    Biomedical knowledge graph embeddings for personalized medicine: Predicting disease‐gene associations

    Personalized medicine is a concept that has been the subject of increasing interest in medical research and practice in the last few years. However, significant challenges stand in the way of practical implementations, namely in regard to extracting clinically valuable insights from the vast amount of biomedical knowledge generated in the last few years. Here, we describe an approach that uses Knowledge Graph Embedding (KGE) methods on a biomedical Knowledge Graph (KG) as a path to reasoning over the wealth of information stored in publicly accessible databases. We built a Knowledge Graph using data from DisGeNET and GO, containing relationships between genes, diseases and other biological entities. The KG contains 93,657 nodes of 5 types and 1,705,585 relationships of 59 types. We applied KGE methods to this KG, obtaining excellent performance in predicting gene-disease associations (MR 0.13, MRR 0.96, HITS@1 0.93, HITS@3 0.99, and HITS@10 0.99). The optimal hyperparameter set was used to predict all possible novel gene-disease associations. An in-depth analysis of novel gene-disease predictions for disease terms related to Autism Spectrum Disorder (ASD) shows that this approach produces predictions consistent with known candidate genes and biological pathways and yields relevant insights into the biology of this paradigmatic complex disorder. Funding: Fundação para a Ciência e a Tecnologia, Grant/Award Numbers: SAICTPAC/0010/2015, POCI-01-0145-FEDER-016428-PAC, EXPL/CCI-BIO/0126/2021, PTDC/MED-OUT/28937/2017, UIDP/04046/2020, UIDB/04046/2020; Fundo Europeu de Desenvolvimento Regional, Grant/Award Number: 022153.
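
The link-prediction metrics quoted in this abstract (MR, MRR, HITS@k) are usually computed from the rank that a model assigns to the true entity for each test triple. The sketch below uses the textbook definitions only; the function name `ranking_metrics` and the example ranks are illustrative assumptions, and the abstract does not state the exact evaluation protocol (for example, any filtering or normalisation of ranks) behind its reported values.

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute standard link-prediction metrics from 1-based ranks.

    ranks[i] is the position of the true entity among all scored
    candidates for test triple i (1 = ranked first).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank (higher is better)
    }
    for k in ks:
        # HITS@k: fraction of test triples whose true entity is in the top k
        metrics[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Made-up ranks for five hypothetical test triples (not data from the paper).
example_ranks = [1, 1, 2, 1, 5]
result = ranking_metrics(example_ranks)
print(result)
```

MRR and HITS@k lie between 0 and 1, while the plain mean rank is at least 1, so values are only comparable between studies that share the same ranking protocol.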

    Biological mechanisms of aging predict age-related disease co-occurrence in patients

    Genetic, environmental, and pharmacological interventions into the aging process can confer resistance to multiple age-related diseases in laboratory animals, including rhesus monkeys. These findings imply that individual mechanisms of aging might contribute to the co-occurrence of age-related diseases in humans and could be targeted to prevent these conditions simultaneously. To address this question, we text mined 917,645 literature abstracts followed by manual curation and found strong, non-random associations between age-related diseases and aging mechanisms in humans, confirmed by gene set enrichment analysis of GWAS data. Integration of these associations with clinical data from 3.01 million patients showed that age-related diseases associated with each of five aging mechanisms were more likely than chance to be present together in patients. Genetic evidence revealed that innate and adaptive immunity, the intrinsic apoptotic signaling pathway and activity of the ERK1/2 pathway were associated with multiple aging mechanisms and diverse age-related diseases. Mechanisms of aging hence contribute both together and individually to age-related disease co-occurrence in humans and could potentially be targeted accordingly to prevent multimorbidity

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research with life-changing discoveries. In contrast, repeated failures of high-profile drugs to battle Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here much of the emphasis is put on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. State-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature. 
To enable meta-analysis of biologically related transcriptomic data, a highly-curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns — embedded with novel candidates — across large-scale AD transcriptomic data, a new approach to generate gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's, Parkinson's and Epilepsy diseases

    Biological mechanisms of aging predict age-related disease multimorbidities in patients

    Genetic, environmental and pharmacological interventions into the aging process can confer resistance to multiple age-related diseases in laboratory animals, including rhesus monkeys. These findings imply that mechanisms of aging might contribute to patterns of multimorbidity in humans, and hence could be targeted to prevent multiple conditions simultaneously. To address this question, we text mined 917,645 literature abstracts followed by manual curation, and found strong, non-random associations between age-related diseases and aging mechanisms, confirmed by gene set enrichment analysis of GWAS data. Integration of these associations with clinical data from 3.01 million patients showed that age-related diseases associated with each of five aging mechanisms were more likely than chance to be present together in patients. Genetic evidence revealed that innate and adaptive immunity, the intrinsic apoptotic signalling pathway and activity of the ERK1/2 pathway played a significant role across multiple aging mechanisms and multiple, diverse age-related diseases. Mechanisms of aging therefore contribute to multiple age-related diseases and to patterns of human age-related multimorbidity, and could potentially be targeted to prevent more than one age-related condition in the same patient.

    Use of Text Data in Identifying and Prioritizing Potential Drug Repositioning Candidates

    New drug development costs between 500 million and 2 billion dollars and takes 10-15 years, with a success rate of less than 10%. Drug repurposing (defined as discovering new indications for existing drugs) could play a significant role in drug development, especially considering the declining success rates of developing novel drugs. In the period 2007-2009, drug repurposing led to the launching of 30-40% of new drugs. Typically, new indications for existing medications are identified by accident. However, new technologies and a large number of available resources enable the development of systematic approaches to identify and validate drug-repurposing candidates with significantly lower cost. A variety of resources have been utilized to identify novel drug repurposing candidates such as biomedical literature, clinical notes, and genetic data. In this dissertation, we focused on using text data in identifying and prioritizing drug repositioning candidates and conducted five studies. In the first study, we aimed to assess the feasibility of using patient reviews from social media to identify potential candidates for drug repurposing. We retrieved patient reviews of 180 medications from an online forum, WebMD. Using dictionary-based and machine learning approaches, we identified disease names in the reviews. Several publicly available resources were used to exclude comments containing known indications and adverse drug effects. After manually reviewing some of the remaining comments, we implemented a rule-based system to identify beneficial effects. The dictionary-based system and machine learning system identified 2178 and 6171 disease names respectively in 64,616 patient comments. We provided a list of 10 common patterns that patients used to report any beneficial effects or uses of medication. After manually reviewing the comments tagged by our rule-based system, we identified five potential drug repurposing candidates. 
To our knowledge, this was the first study to consider using social media data to identify drug-repurposing candidates. We found that even a rule-based system, with a limited number of rules, could identify beneficial effect mentions in the comments of patients. Our preliminary study shows that social media has the potential to be used in drug repurposing. In the second study, we investigated the significance of extracting information from multiple sentences specifically in the context of drug-disease relation discovery. We used multiple resources such as Semantic Medline, a literature-based resource, and Medline search (for filtering spurious results) and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur in a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Among the 537, nearly 75% (407) of the drug-disease pairs occur in multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis in extracting relations from biomedical literature. In the third and fourth studies, we focused on prioritizing drug repositioning candidates extracted from biomedical literature, which we refer to as Literature-Based Discovery (LBD). In the third study, we used drug-gene and gene-disease semantic predications extracted from Medline abstracts to generate a list of potential drug-disease pairs. We further ranked the generated pairs by assigning scores based on the predicates that qualify drug-gene and gene-disease relationships. 
On comparing the top-ranked drug-disease pairs against the Comparative Toxicogenomics Database, we found that a significant percentage of top-ranked pairs appeared in CTD. Co-occurrence of these high-ranked pairs in Medline abstracts is then used to improve the rankings of the inferred drug-disease relations. Finally, manual evaluation of the top-ten pairs ranked by our approach revealed that nine of them have good potential for biological significance based on expert judgment. In the fourth study, we proposed a method, utilizing information surrounding causal findings, to prioritize discoveries generated by LBD systems. We focused on discovering drug-disease relations, which have the potential to identify drug repositioning candidates or adverse drug reactions. Our LBD system used drug-gene and gene-disease semantic predication in SemMedDB as causal findings and Swanson’s ABC model to generate potential drug-disease relations. Using sentences, as a source of causal findings, our ranking method trained a binary classifier to classify generated drug-disease relations into desired classes. We trained and tested our classifier for three different purposes: a) drug repositioning b) adverse drug-event detection and c) drug-disease relation detection. The classifier obtained 0.78, 0.86, and 0.83 F-measures respectively for these tasks. The number of causal findings of each hypothesis, which were classified as positive by the classifier, is the main metric for ranking hypotheses in the proposed method. To evaluate the ranking method, we counted and compared the number of true relations in the top 100 pairs, ranked by our method and one of the previous methods. Out of 181 true relations in the test dataset, the proposed method ranked 20 of them in the top 100 relations while this number was 13 for the other method. 
In the last study, we used biomedical literature and clinical trials in ranking potential drug repositioning candidates identified by Phenome-Wide Association Studies (PheWAS). Unlike previous approaches, in this study, we did not limit our method to LBD. First, we generated a list of potential drug repositioning candidates using PheWAS. We retrieved 212,851 gene-disease associations from PheWAS catalog and 14,169 gene-drug relationships from DrugBank. Following Swanson’s model, we generated 52,966 potential drug repositioning candidates. Then, we developed an information retrieval system to retrieve any evidence of those candidates co-occurring in the biomedical literature and clinical trials. We identified nearly 14,800 drug-disease pairs with some evidence of support. In addition, we identified more than 38,000 novel candidates for re-purposing, encompassing hundreds of different disease states and over 1,000 individual medications. We anticipate that these results will be highly useful for hypothesis generation in the field of drug repurposing
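
The candidate-generation step that this abstract attributes to Swanson's ABC model can be pictured as a join over a shared intermediate: a drug (A) linked to a gene (B) and that gene linked to a disease (C) together suggest a drug-disease pair (A, C). The sketch below is a minimal illustration under stated assumptions. The function name `abc_candidates`, the placeholder identifiers, and the ranking by number of linking genes are all hypothetical simplifications; the dissertation's own ranking relies on predicate-based scores, a trained classifier, and literature and clinical-trial evidence, none of which is reproduced here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def abc_candidates(
    drug_gene: Iterable[Pair],
    gene_disease: Iterable[Pair],
    known_drug_disease: FrozenSet[Pair] = frozenset(),
) -> Dict[Pair, Set[str]]:
    """Return candidate (drug, disease) pairs mapped to their linking genes."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    support = defaultdict(set)
    for drug, gene in drug_gene:
        for disease in diseases_by_gene.get(gene, ()):
            # Skip pairs that are already known, since they are not new hypotheses.
            if (drug, disease) not in known_drug_disease:
                support[(drug, disease)].add(gene)
    return dict(support)


# Placeholder identifiers, not real drugs, genes, or diseases.
drug_gene = [("drugA", "gene1"), ("drugA", "gene2"), ("drugB", "gene2")]
gene_disease = [("gene1", "diseaseX"), ("gene2", "diseaseX"), ("gene2", "diseaseY")]
known = frozenset({("drugB", "diseaseY")})

candidates = abc_candidates(drug_gene, gene_disease, known)

# Illustrative ranking only: more linking genes first, ties broken alphabetically.
ranked = sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
for (drug, disease), genes in ranked:
    print(drug, disease, sorted(genes))
```

Keeping the set of linking genes per pair, instead of only a count, leaves room to attach evidence to each A-B and B-C link later.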

    Non-Coding RNAs Improve the Predictive Power of Network Medicine

    Network Medicine has improved the mechanistic understanding of disease, offering quantitative insights into disease mechanisms, comorbidities, and novel diagnostic tools and therapeutic treatments. Yet, most network-based approaches rely on a comprehensive map of protein-protein interactions, ignoring interactions mediated by non-coding RNAs (ncRNAs). Here, we systematically combine experimentally confirmed binding interactions mediated by ncRNA with protein-protein interactions, constructing the first comprehensive network of all physical interactions in the human cell. We find that the inclusion of ncRNA expands the number of genes in the interactome by 46% and the number of interactions by 107%, significantly enhancing our ability to identify disease modules. Indeed, we find that 132 diseases lacked a statistically significant disease module in the protein-based interactome but have a statistically significant disease module after inclusion of ncRNA-mediated interactions, making these diseases accessible to the tools of network medicine. We show that the inclusion of ncRNAs helps unveil disease-disease relationships that were not detectable before and expands our ability to predict comorbidity patterns between diseases. Taken together, we find that including non-coding interactions improves both the breadth and the predictive accuracy of network medicine.

    Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease

    With swift advancements in next-generation sequencing technologies alongside the voluminous growth of biological data, a variety of data resources such as databases and web services have been created to facilitate data management, accessibility, and analysis. However, the burden of interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, specifically concerning neurodegeneration. Over the years, massive investments and technological advancements for dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent as well as integrative approaches to mine available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity using standardized ontologies and vocabularies for context-specific domains. In the current work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called Pathway Terminology System, is demonstrated and (ii) a context-specific, high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies through clinical pointers can provide promising solutions for effective treatment of AD. In the current work, we have presented the importance of bridging the gap between clinical and molecular biomarkers to effectively contribute to dementia research. Moreover, we address the need for a formalized framework called NIFT to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.

    Computational Systems Analysis on Polycystic Ovarian Syndrome (PCOS)

    Complex diseases are caused by a combination of genetic and environmental factors. Unraveling the molecular pathways from the genetic factors that affect a phenotype is always difficult, but in the case of complex diseases, this is further complicated since genetic factors in affected individuals might be different. Polycystic ovarian syndrome (PCOS) is an example of a complex disease with limited molecular information. Recently, PCOS molecular omics data have increasingly appeared in many publications. We conduct extensive bioinformatics analyses on the data and closely integrate experimental and computational biology to understand the complex biological systems involved by examining multiple interacting genes and their products. PCOS involves networks of genes, and to understand them, those networks must be mapped. This approach has emerged as a powerful tool for studying complex diseases and has been termed network biology. Network biology encompasses a wide range of network types, including those based on physical interactions between and among cellular components and those based on similarity among patients or diseases. Each of these offers distinct biological clues that may help scientists transform their cellular parts list into insights about complex diseases. This chapter will discuss some computational analysis aspects of the omics studies that have been conducted in PCOS.

    Comorbidities in the diseasome are more apparent than real: What Bayesian filtering reveals about the comorbidities of depression

    Comorbidity patterns have become a major source of information to explore shared mechanisms of pathogenesis between disorders. In hypothesis-free exploration of comorbid conditions, disease-disease networks are usually identified by pairwise methods. However, interpretation of the results is hindered by several confounders. In particular a very large number of pairwise associations can arise indirectly through other comorbidity associations and they increase exponentially with the increasing breadth of the investigated diseases. To investigate and filter this effect, we computed and compared pairwise approaches with a systems-based method, which constructs a sparse Bayesian direct multimorbidity map (BDMM) by systematically eliminating disease-mediated comorbidity relations. Additionally, focusing on depression-related parts of the BDMM, we evaluated correspondence with results from logistic regression, text-mining and molecular-level measures for comorbidities such as genetic overlap and the interactome-based association score. We used a subset of the UK Biobank Resource, a cross-sectional dataset including 247 diseases and 117,392 participants who filled out a detailed questionnaire about mental health. The sparse comorbidity map confirmed that depressed patients frequently suffer from both psychiatric and somatic comorbid disorders. Notably, anxiety and obesity show strong and direct relationships with depression. The BDMM identified further directly co-morbid somatic disorders, e.g. irritable bowel syndrome, fibromyalgia, or migraine. Using the subnetwork of depression and metabolic disorders for functional analysis, the interactome-based system-level score showed the best agreement with the sparse disease network. This indicates that these epidemiologically strong disease-disease relations have improved correspondence with expected molecular-level mechanisms. 
The substantially smaller number of comorbidity relations in the BDMM compared to pairwise methods implies that biologically meaningful comorbid relations may be less frequent than earlier pairwise methods suggested. The computed interactive comprehensive multimorbidity views over the diseasome are available on the web at Co=MorNet: bioinformatics.mit.bme.hu/UKBNetworks