2,266 research outputs found

    MOLIERE: Automatic Biomedical Hypothesis Generation System

    Get PDF
    Hypothesis generation is becoming a crucial time-saving technique that allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, uses information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI). These objects include, but are not limited to, scientific papers, keywords, genes, proteins, diseases and diagnoses. We model hypotheses by applying Latent Dirichlet Allocation to abstracts found near shortest paths discovered within this network, and we demonstrate the effectiveness of MOLIERE by performing hypothesis generation on historical data. Our network, implementation and resulting data are all publicly available to the broad scientific community.
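
    The path-finding step can be illustrated with a minimal sketch. It assumes a small, hand-built weighted graph (the node names, the "abs:" prefix and the edge weights are hypothetical, not MOLIERE's data format), finds a shortest path between two concepts with Dijkstra's algorithm, and collects the abstracts lying on that path or one hop from it. MOLIERE's actual neighbourhood definition and weighting may differ, and the LDA topic-modelling step that follows is omitted here.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical undirected, weighted network: node -> {neighbour: weight}.
# "abs:" marks abstract nodes; other prefixes stand for keywords, genes, etc.
Graph = Dict[str, Dict[str, float]]


def add_edge(graph: Graph, a: str, b: str, weight: float) -> None:
    """Insert an undirected edge."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def shortest_path(graph: Graph, source: str, target: str) -> List[str]:
    """Dijkstra's algorithm; returns [] if target is unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    done: Set[str] = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry left behind by a later improvement
        done.add(node)
        if node == target:
            break
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return []
    # Walk predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def abstracts_near_path(graph: Graph, path: List[str]) -> Set[str]:
    """Abstract nodes on the path or one hop away from it (assumed radius)."""
    near: Set[str] = set()
    for node in path:
        for candidate in (node, *graph[node]):
            if candidate.startswith("abs:"):
                near.add(candidate)
    return near


# Toy network (made up): a direct but costly edge versus a cheaper route
# through two abstracts and a shared keyword.
g: Graph = {}
add_edge(g, "gene:A", "abs:1", 1.0)
add_edge(g, "abs:1", "kw:X", 1.0)
add_edge(g, "kw:X", "abs:2", 1.0)
add_edge(g, "abs:2", "disease:B", 1.0)
add_edge(g, "kw:X", "abs:3", 2.0)
add_edge(g, "gene:A", "disease:B", 10.0)

path = shortest_path(g, "gene:A", "disease:B")
print(path)  # ['gene:A', 'abs:1', 'kw:X', 'abs:2', 'disease:B']
print(sorted(abstracts_near_path(g, path)))  # ['abs:1', 'abs:2', 'abs:3']
```

    The one-hop radius is what pulls in "abs:3", which is not on the path itself; widening or narrowing that radius changes how much context the downstream topic model sees.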

    MOLIERE: Automatic Biomedical Hypothesis Generation System

    Get PDF
    Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI). These objects include but are not limited to scientific papers, keywords, genes, proteins, diseases, and diagnoses. We model hypotheses using Latent Dirichlet Allocation applied on abstracts found near shortest paths discovered within this network, and demonstrate the effectiveness of MOLIERE by performing hypothesis generation on historical data. Our network, implementation, and resulting data are all publicly available for the broad scientific community

    Integrating differential expression, co-expression and gene network analysis for the identification of common genes associated with tumor angiogenesis deregulation

    Get PDF
    Angiogenesis is essential for tumor growth and cancer metastasis. Identifying the molecular pathways involved in this process is the first step in the rational design of new therapeutic strategies to improve cancer treatment. In recent years, RNA-seq data analysis has helped to determine the genetic and molecular factors associated with different types of cancer. In this work we performed an integrative analysis using RNA-seq data from human umbilical vein endothelial cells (HUVEC) and from patients with angiogenesis-dependent diseases, to find genes that are potential candidates for improving the prognosis of tumor angiogenesis deregulation and to understand how this process is orchestrated at the genetic and molecular level. We downloaded four RNA-seq datasets (including cellular models of tumor angiogenesis and ischaemic heart disease) from the Sequence Read Archive. The first step of our integrative analysis determines differentially expressed and co-expressed genes. For this, we used the ExpHunter Suite, an R package that performs differential expression, co-expression and functional analysis of RNA-seq data. We used both the differentially expressed and the co-expressed genes to explore the human gene interaction network and to determine which genes, found across the different datasets, may be key to angiogenesis deregulation. Finally, we performed a drug repositioning analysis to find potential targets related to angiogenesis inhibition... This work was supported by the Spanish Ministry of Science, Innovation and Universities (grant PID2019-105010RB-I00, grant PID2019-108096RB-C21), the Andalusian Government and FEDER (grants UMA18-FEDERJA-102, UMA18-FEDERJA-220, PY20_00257, PY20_00372, RH-0079-2021 and funds from the group PAIDI BIO 267); the Spanish Ministry of Economy and Competitiveness (grant PID2019-108096RB-C21), the Institute of Health Carlos III (project IMPaCT-Data, exp. IMP/00019), co-funded by the European Union, European Regional Development Fund (ERDF, "A way to make Europe"); and the European Union (HORIZON-HLTH-2022-DISEASE-06, Project ID: 101080580) to JAGR. JRP holds a research grant from the Andalusian Government (Fundación Progreso y Salud) [PI-0075-2017]. BM holds an Ayudas para la formación del profesorado universitario grant (FPU18/00755, Ministerio de Universidades). The "CIBER de Enfermedades Raras" is an initiative of the ISCIII (Spain). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. Funding for open access charge: Universidad de Málaga / CBU
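
    A minimal sketch of the cross-dataset step, assuming each dataset has already been reduced to per-gene results of the form (log2 fold change, adjusted p-value), and that the other datasets' gene sets already combine differentially expressed and co-expressed hits. The cutoffs (|log2FC| ≥ 1, adjusted p < 0.05), the helper names and the placeholder gene IDs are illustrative assumptions; this is not the API of the ExpHunter Suite, which is an R package.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed per-gene result format: gene -> (log2 fold change, adjusted p-value).
GeneResults = Dict[str, Tuple[float, float]]


def select_degs(results: GeneResults, lfc_cutoff: float = 1.0,
                padj_cutoff: float = 0.05) -> Set[str]:
    """Keep genes passing both (illustrative) significance cutoffs."""
    return {
        gene
        for gene, (lfc, padj) in results.items()
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff
    }


def genes_shared_across(gene_sets: Dict[str, Set[str]],
                        min_datasets: int) -> List[str]:
    """Genes flagged in at least `min_datasets` datasets, most shared first."""
    counts = Counter(gene for genes in gene_sets.values() for gene in genes)
    shared = [gene for gene, n in counts.items() if n >= min_datasets]
    # Sort by number of datasets (descending), then by name for stable output.
    return sorted(shared, key=lambda gene: (-counts[gene], gene))


# Hypothetical toy input: placeholder gene IDs, not real results.
ds1 = {"G1": (2.1, 0.001), "G2": (-1.5, 0.01), "G3": (0.4, 0.001), "G4": (3.0, 0.2)}
gene_sets = {
    "ds1": select_degs(ds1),  # {"G1", "G2"}
    "ds2": {"G2", "G3"},      # assumed: DE and co-expression hits already merged
    "ds3": {"G2", "G5"},
    "ds4": {"G1", "G2"},
}
print(genes_shared_across(gene_sets, min_datasets=2))  # ['G2', 'G1']
```

    Counting recurrence across datasets is only a crude proxy; the study additionally places these genes in the human gene interaction network before judging which may be key.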

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Get PDF
    Convergence of exponentially advancing technologies is driving medical research toward life-changing discoveries. In contrast, repeated failures of high-profile drugs against Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This pattern of failure has prompted researchers to grapple with their beliefs about Alzheimer's aetiology. The growing realisation that amyloid-β and tau are not 'the' but rather 'one of the' factors therefore calls for a reassessment of existing data from new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of an integrative model by filling gaps in biological knowledge. However, the reliability of the derived hypotheses depends largely on the completeness, quality, consistency and context-specificity of the data. There is thus a need for agile methods and approaches that efficiently interrogate and use existing public data. This thesis presents novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with particular emphasis on quality, reliability and context-specificity. The work showcases the benefit of integrating well-curated, disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. It also introduces the challenges encountered when harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature. To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed that makes explicit the annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects on Alzheimer's disease, Parkinson's disease and epilepsy.

    Unblocking Blockbusters: Using Boolean Text-Mining to Optimise Clinical Trial Design and Timeline for Novel Anticancer Drugs

    Get PDF
    Two problems now threaten the future of anticancer drug development: (i) the information explosion has made research into new target-specific drugs more duplication-prone, and hence less cost-efficient; and (ii) high-throughput genomic technologies have failed to deliver the anticipated early windfall of novel first-in-class drugs. Here it is argued that the resulting crisis of blockbuster drug development may be remedied in part by innovative exploitation of informatic power. Using scenarios relating to oncology, it is shown that rapid data-mining of the scientific literature can refine therapeutic hypotheses and thus reduce empirical reliance on preclinical model development and early-phase clinical trials. Moreover, as personalised medicine evolves, this approach may inform biomarker-guided phase III trial strategies for noncytotoxic (antimetastatic) drugs that prolong patient survival without necessarily inducing tumor shrinkage. Though not replacing conventional gold standards, these findings suggest that this computational research approach could reduce costly 'blue skies' R&D investment and time to market for new biological drugs, thereby helping to reverse unsustainable drug price inflation.
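
    One way to picture Boolean literature mining is a query of AND/OR/NOT clauses evaluated against each abstract's word set. The sketch below assumes a hypothetical nested-tuple query format and a naive tokenizer, and the abstracts are made up; real literature search engines (PubMed's Boolean syntax, for example) also handle phrases, synonyms and field tags, all ignored here.

```python
import re
from typing import List, Set, Tuple, Union

# Hypothetical query format: a bare term, or a tuple ("AND"|"OR"|"NOT", *subqueries).
Query = Union[str, Tuple]


def tokenize(text: str) -> Set[str]:
    """Naive lower-case word set; no stemming, phrases or synonyms."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def matches(query: Query, words: Set[str]) -> bool:
    """Recursively evaluate a Boolean query against one document's words."""
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "AND":
        return all(matches(arg, words) for arg in args)
    if op == "OR":
        return any(matches(arg, words) for arg in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


def search(abstracts: List[str], query: Query) -> List[int]:
    """Indices of abstracts satisfying the query."""
    return [i for i, text in enumerate(abstracts) if matches(query, tokenize(text))]


# Toy, made-up abstracts for illustration only.
docs = [
    "Compound X inhibits metastasis in a mouse model.",
    "Compound X shrinks tumours in a phase II trial.",
    "Metastasis was unaffected in the placebo arm.",
]
query = ("AND", "metastasis", ("OR", "inhibits", "blocks"), ("NOT", "placebo"))
print(search(docs, query))  # [0]
```

    Keeping the query as a tree rather than a string avoids writing a parser and makes operator precedence explicit.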

    Knowledge Management approaches to model pathophysiological mechanisms and discover drug targets in Multiple Sclerosis

    Get PDF
    Multiple Sclerosis (MS) is one of the most prevalent neurodegenerative diseases for which a cure is not yet available. MS is a complex disease for numerous reasons: its etiology is unknown, its diagnosis is not exclusive, its course is unpredictable and therapeutic response varies from patient to patient. There are four established subtypes of MS, which are distinguished by different characteristics. Many environmental and genetic factors are considered to play a role in MS etiology, including viral infection, vitamin D deficiency, epigenetic changes and certain genes. Despite the large body of diverse scientific knowledge, from laboratory findings to clinical trials, no integrated model of the mechanisms underlying the MS disease state is available. Contemporary therapies only reduce the severity of the disease, and there is an unmet need for effective drugs. The present thesis provides a knowledge-based rationale for modelling MS disease mechanisms and identifying potential drug candidates using systems biology approaches. Systems biology is an emerging field that uses computational methods to integrate datasets of various granularities and simulate disease outcomes. It provides a framework for modelling molecular dynamics with their precise interactions and contextual details. The proposed approaches were used to extract knowledge from the literature with state-of-the-art text-mining technologies, integrate it with proprietary data using semantic platforms, and build different models (a molecular interaction map, agent-based models to simulate disease outcome, and a model of MS disease progression over time). For better information representation, a disease ontology was also developed, together with a method for its automatic enrichment. The models provide insight into the disease, and several pathways were explored by combining the therapeutics and the disease-specific prescriptions. The approaches and models developed in this work resulted in the identification of novel drug candidates that are supported by existing experimental and clinical knowledge.

    Explainable artificial intelligence for patient stratification and drug repositioning

    Get PDF
    Enabling precision medicine requires developing robust patient stratification methods as well as drugs tailored to homogeneous subgroups of patients from a heterogeneous population. Developing de novo drugs is expensive and time-consuming, with an ultimately low FDA approval rate. These limitations make developing new drugs for a small portion of a disease population infeasible. Drug repositioning is therefore an essential alternative to developing new drugs for a disease subpopulation. There is a crucial need for data-driven approaches that find druggable homogeneous subgroups within the disease population and reposition drugs for these subgroups. In this study, we developed an explainable AI approach for patient stratification and drug repositioning. Exploratory mining that mimics the trial recruitment process, together with network analysis, was used to discover homogeneous subgroups within a disease population. For each subgroup, a biomedical network analysis was performed to find the drugs most relevant to that subgroup of patients. The set of candidate drugs for each subgroup was ranked by an aggregated score assigned to each drug. The method is a human-in-the-loop framework in which medical experts use data-driven results to generate hypotheses and gain insight into potential therapeutic candidates for patients in a subgroup. To examine the validity of our method, we applied it to individual cancer types and to pan-cancer data, to account for heterogeneity both within and among cancer types. Patients' phenotypic and genotypic data were combined with a heterogeneous knowledge base because this gives a multi-view perspective for finding new indications for drugs outside their original use. Our analysis of the top candidate drugs for the subgroups showed that most of them are FDA-approved cancer drugs, while the others are not cancer-related but have the potential to be repurposed for cancer. We discovered novel cancer-related mechanisms that these drugs can target in different cancer types to reduce cancer treatment costs and improve patient survival. Further wet-lab experiments are required to validate these findings before clinical trials of these repurposed therapies are initiated.
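
    The abstract does not specify how the aggregated drug score is computed. Purely as an illustration, the sketch below assumes each drug has several per-source evidence scores (the feature names "network_proximity" and "target_overlap", the weights and the drugs are all hypothetical), min-max normalises each feature across drugs, and ranks drugs by the weighted mean.

```python
from typing import Dict, List, Tuple


def aggregate_and_rank(scores: Dict[str, Dict[str, float]],
                       weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank drugs by a weighted mean of min-max-normalised evidence scores.

    `scores` maps drug -> {feature: raw score}; every drug must have every
    feature named in `weights`. Features and weighting are hypothetical
    stand-ins for the unspecified aggregation in the study.
    """
    features = list(weights)
    lo = {f: min(s[f] for s in scores.values()) for f in features}
    hi = {f: max(s[f] for s in scores.values()) for f in features}
    total_weight = sum(weights.values())
    ranked = []
    for drug, s in scores.items():
        agg = 0.0
        for f in features:
            span = hi[f] - lo[f]
            # A constant feature carries no ranking information, so it scores 0.
            norm = (s[f] - lo[f]) / span if span else 0.0
            agg += weights[f] * norm
        ranked.append((drug, agg / total_weight))
    # Highest score first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Made-up inputs for illustration.
scores = {
    "drug_a": {"network_proximity": 0.8, "target_overlap": 0},
    "drug_b": {"network_proximity": 0.4, "target_overlap": 4},
    "drug_c": {"network_proximity": 0.0, "target_overlap": 2},
}
weights = {"network_proximity": 0.75, "target_overlap": 0.25}
for drug, score in aggregate_and_rank(scores, weights):
    print(drug, round(score, 3))
```

    Normalising before weighting keeps a feature with a large raw range (here the integer overlap count) from dominating the score; in a human-in-the-loop setting the weights are exactly the knob an expert would inspect and adjust.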

    Literature-Based Enrichment Insights into Redox Control of Vascular Biology

    Get PDF
    In cellular physiology and signaling, reactive oxygen species (ROS) play some of the most critical roles. ROS overproduction leads to cellular oxidative stress. This may cause an irrecoverable imbalance of redox (oxidation-reduction reaction) function that deregulates redox homeostasis, which in turn could lead to several diseases, including neurodegenerative disease, cardiovascular disease and cancers. In this study, we focus on the redox effects related to vascular systems in mammals. To support research in this domain, we developed an online knowledge base, DES-RedoxVasc, which enables exploration of information contained in the biomedical scientific literature. The DES-RedoxVasc system analyzed 233,399 documents, consisting of PubMed abstracts and PubMed Central full-text articles related to different aspects of redox biology in vascular systems. It allows researchers to explore enriched concepts from 28 curated thematic dictionaries, as well as literature-derived potential associations between pairs of such enriched concepts, where the associations themselves are statistically enriched. For example, the system allows exploration of associations between pathways, diseases, mutations, genes/proteins, miRNAs, long ncRNAs, toxins, drugs, biological processes, molecular functions, etc., giving insights into different aspects of redox effects and the control of processes related to the vascular system. Moreover, we present case studies of existing and possibly novel knowledge regarding the redox biology of the vascular system, demonstrating the usefulness of DES-RedoxVasc. DES-RedoxVasc is the first knowledge base compiled using text mining for the exploration of this topic.
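
    DES-RedoxVasc reports concept pairs whose co-occurrence is statistically enriched, but the abstract does not name the test. A common choice for this kind of enrichment is a one-sided hypergeometric test: given N documents, K mentioning concept A, n mentioning concept B and k mentioning both, the p-value is P(X ≥ k). The sketch below assumes that test; it is not necessarily the statistic DES-RedoxVasc uses, and the counts are toy values.

```python
from math import comb


def cooccurrence_p_value(n_docs: int, n_a: int, n_b: int, n_ab: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_ab) (assumed test).

    X is the number of documents mentioning both concepts when the n_b
    documents mentioning concept B are drawn at random from n_docs documents,
    n_a of which mention concept A.
    """
    if not (0 <= n_ab <= min(n_a, n_b) and max(n_a, n_b) <= n_docs):
        raise ValueError("inconsistent counts")
    total = comb(n_docs, n_b)
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(n_a, x) * comb(n_docs - n_a, n_b - x)
        for x in range(n_ab, min(n_a, n_b) + 1)
    )
    # Exact integer arithmetic; a single division at the end.
    return tail / total


# Toy counts: 10 documents, concept A in 4, concept B in 3, both in 3.
print(cooccurrence_p_value(10, 4, 3, 3))  # 4/120, about 0.0333
```

    Exact integer sums stay correct at the scale of a corpus like the 233,399 documents mentioned above but become slow for large concept counts; SciPy's `scipy.stats.hypergeom.sf(n_ab - 1, n_docs, n_a, n_b)` computes the same upper tail. With many concept pairs tested, a multiple-testing correction would also be needed.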