Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.
Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
Integrative Data Analytic Framework to Enhance Cancer Precision Medicine
With the advancement of high-throughput biotechnologies, we increasingly
accumulate biomedical data about diseases, especially cancer. There is a need
for computational models and methods to sift through, integrate, and extract
new knowledge from the diverse available data to improve the mechanistic
understanding of diseases and patient care. To uncover molecular mechanisms and
drug indications for specific cancer types, we develop an integrative framework
able to harness a wide range of diverse molecular and pan-cancer data. We show
that our approach outperforms competing methods and can identify new
associations. Furthermore, through the joint integration of data sources, our
framework can also uncover links between cancer types and molecular entities
for which no prior knowledge is available. Our new framework is flexible and
can be easily reformulated to study any biomedical problem.
Effective Entity Augmentation By Querying External Data Sources
Users often want to augment and enrich entities in their datasets with relevant information from external data sources. As many external sources are accessible only via keyword-search interfaces, a user usually has to manually formulate a keyword query that extracts relevant information for each entity. This approach is challenging as many data sources contain numerous tuples, only a small fraction of which may contain entity-relevant information. Furthermore, different datasets may represent the same information in distinct forms and under different terms (e.g., different data sources may use different names to refer to the same person). In such cases, it is difficult to formulate a query that precisely retrieves information relevant to an entity. Current methods for information enrichment mainly rely on lengthy and resource-intensive manual effort to formulate queries to discover relevant information. However, in increasingly many settings, it is important for users to get initial answers quickly and without substantial investment in resources (such as human attention). We propose a progressive approach to discovering entity-relevant information from external sources with minimal expert intervention. It leverages end users' feedback to progressively learn how to retrieve information relevant to each entity in a dataset from external data sources. Our empirical evaluation shows that our approach learns accurate strategies to deliver relevant information quickly.
GUILDify v2.0: A Tool to Identify Molecular Networks Underlying Human Diseases, Their Comorbidities and Their Druggable Targets
The genetic basis of complex diseases involves alterations in multiple genes. Unraveling the interplay between these genetic factors is key to the discovery of new biomarkers and treatments. In 2014, we introduced GUILDify, a web server that searches for genes associated with diseases, finds novel disease genes by applying various network-based prioritization algorithms, and proposes candidate drugs. Here, we present GUILDify v2.0, a major update and improvement of the original method, where we have included protein interaction data for seven species and 22 human tissues and incorporated the disease-gene associations from DisGeNET. To infer potential disease relationships associated with multi-morbidities, we introduced a novel feature for estimating the genetic and functional overlap of two diseases using the top-ranking genes and the associated enrichment of biological functions and pathways (as defined by GO and Reactome). The analysis of this overlap helps to identify the mechanistic role of genes and protein-protein interactions in comorbidities. Finally, we provided an R package, guildifyR, to facilitate programmatic access to GUILDify v2.0 (http://sbi.upf.edu/guildify2). The authors received support from: ISCIII-FEDER (PI13/00082, CP10/00524, CPII16/00026); IMI-JU
under grant agreements no. 116030 (TransQST) and no. 777365 (eTRANSAFE), resources of which
are composed of financial contribution from the EU-FP7 (FP7/2007- 2013) and EFPIA companies in
kind contribution; the EU H2020 Programme 2014-2020 under grant agreements no. 634143
(MedBioinformatics) and no. 676559 (Elixir-Excelerate); the Spanish Ministry of Economy (MINECO)
[BIO2017-85329-R] [RYC-2015-17519]; "Unidad de Excelencia María de Maeztu", funded by the
Spanish Ministry of Economy [ref: MDM-2014-0370]. The Research Programme on Biomedical
Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII
and is supported by grant PT13/0001/0023, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER
Learning to Denoise Unreliable Interactions for Link Prediction on Biomedical Knowledge Graph
Link prediction in biomedical knowledge graphs (KGs) aims at predicting
unknown interactions between entities, including drug-target interaction (DTI)
and drug-drug interaction (DDI), which is critical for drug discovery and
therapeutics. Previous methods prefer to utilize the rich semantic relations
and topological structure of the KG to predict missing links, yielding
promising outcomes. However, all these works only focus on improving the
predictive performance without considering the inevitable noise and unreliable
interactions existing in the KGs, which limits the development of KG-based
computational methods. To address these limitations, we propose a Denoised Link
Prediction framework, called DenoisedLP. DenoisedLP obtains reliable
interactions based on the local subgraph by denoising noisy links in a
learnable way, providing a universal module for mining underlying task-relevant
relations. To collaborate with the smoothed semantic information, DenoisedLP
introduces the semantic subgraph by blurring conflict relations around the
predicted link. By maximizing the mutual information between the reliable
structure and smoothed semantic relations, DenoisedLP emphasizes the
informative interactions for predicting relation-specific links. Experimental
results on real-world datasets demonstrate that DenoisedLP outperforms
state-of-the-art methods on DTI and DDI prediction tasks, and verify the
effectiveness and robustness of denoising unreliable interactions on the
contaminated KGs.
SuperDRUG2: a one stop resource for approved/marketed drugs
Regular monitoring of drug regulatory agency web sites and similar resources for information on new drug approvals and changes to the legal status of marketed drugs is impractical. It requires navigation through several resources to find complete information about a drug, as none of the publicly accessible drug databases provides all features essential to complement in silico drug discovery. Here, we propose SuperDRUG2 (http://cheminfo.charite.de/superdrug2) as a comprehensive knowledge-base of approved and marketed drugs. We provide the largest collection of drugs (containing 4587 active pharmaceutical ingredients), which includes small molecules, biological products and other drugs. The database is intended to serve as a one-stop resource providing data on: chemical structures, regulatory details, indications, drug targets, side-effects, physicochemical properties, pharmacokinetics and drug-drug interactions. We provide a 3D-superposition feature that facilitates estimation of the fit of a drug in the active site of a target with a known ligand bound to it. Apart from multiple other search options, we introduced pharmacokinetics simulation as a unique feature that allows users to visualise the 'plasma concentration versus time' profile for a given dose of drug, with a few other adjustable parameters, to simulate the kinetics in a healthy individual and in poor or extensive metabolisers.
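To make the 'plasma concentration versus time' idea concrete, here is a minimal sketch using the standard one-compartment model with first-order absorption and elimination (the Bateman equation). This is an assumption made for illustration only: the abstract does not state which pharmacokinetic model SuperDRUG2 uses, and all parameter values and the function names `plasma_concentration`, `time_of_peak` and `profile` are hypothetical.

```python
import math


def plasma_concentration(t_h, dose_mg, f_bio, v_l, ka, ke):
    """Concentration (mg/L) at time t_h (hours) after a single oral dose.

    One-compartment model with first-order absorption (ka, 1/h) and
    first-order elimination (ke, 1/h); f_bio is the absorbed fraction
    and v_l the volume of distribution in litres.
    """
    if math.isclose(ka, ke):
        raise ValueError("this closed form requires ka != ke")
    scale = f_bio * dose_mg * ka / (v_l * (ka - ke))
    return scale * (math.exp(-ke * t_h) - math.exp(-ka * t_h))


def time_of_peak(ka, ke):
    """Time (hours) at which the concentration curve reaches its maximum."""
    return math.log(ka / ke) / (ka - ke)


def profile(dose_mg, f_bio, v_l, ka, ke, hours=24, step_h=1.0):
    """Sampled (time, concentration) pairs from 0 to `hours` inclusive."""
    n_steps = int(hours / step_h)
    return [
        (i * step_h, plasma_concentration(i * step_h, dose_mg, f_bio, v_l, ka, ke))
        for i in range(n_steps + 1)
    ]


# Hypothetical elimination rates: a poor metaboliser clears the drug
# more slowly (smaller ke) than an extensive metaboliser.
PHENOTYPE_KE = {"poor": 0.05, "healthy": 0.10, "extensive": 0.20}

if __name__ == "__main__":
    for label, ke in PHENOTYPE_KE.items():
        t_max = time_of_peak(1.0, ke)
        c_max = plasma_concentration(t_max, 100.0, 0.9, 40.0, 1.0, ke)
        print(f"{label}: peak {c_max:.2f} mg/L at {t_max:.2f} h")
```

In this sketch the metaboliser phenotype is modelled purely as a change in the elimination rate constant, which is one simple way to obtain different curves for the same dose.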
CDEK: Clinical Drug Experience Knowledgebase
The Clinical Drug Experience Knowledgebase (CDEK) is a database and web platform of active pharmaceutical ingredients with evidence of clinical testing as well as the organizations involved in their research and development. CDEK was curated by disambiguating intervention and organization names from ClinicalTrials.gov and cross-referencing these entries with other prominent drug databases. Approximately 43% of active pharmaceutical ingredients in the CDEK database were sourced from ClinicalTrials.gov and cannot be found in any other prominent compound-oriented database. The contents of CDEK are structured around three pillars: active pharmaceutical ingredients (n = 22 292), clinical trials (n = 127 223) and organizations (n = 24 728). The envisioned use of the CDEK is to support the investigation of many aspects of drug development, including discovery, repurposing opportunities, chemo- and bio-informatics, clinical and translational research and regulatory sciences.
DrugVirus.info 2.0 : an integrative data portal for broad-spectrum antivirals (BSA) and BSA-containing drug combinations (BCCs)
Viruses can cross species barriers and cause unpredictable outbreaks in man with substantial economic and public health burdens. Broad-spectrum antivirals (BSAs, compounds inhibiting several human viruses) and BSA-containing drug combinations (BCCs) are deemed immediate therapeutic options that fill the void between virus identification and vaccine development. Here, we present DrugVirus.info 2.0 (https://drugvirus.info), an integrative interactive portal for exploration and analysis of BSAs and BCCs that greatly expands the database and functionality of the DrugVirus.info 1.0 webserver. Through the data portal that now expands the spectrum of BSAs and provides information on BCCs, we developed two modules for (i) interactive analysis of users' own antiviral drug and combination screening data and their comparison with published datasets, and (ii) exploration of the structure-activity relationship between various BSAs. The updated portal provides an essential toolbox for antiviral drug development and repurposing applications aiming to identify existing and novel treatments of emerging and re-emerging viral threats.
KG-Hub: building and exchanging biological knowledge graphs.
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.
RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
AVAILABILITY AND IMPLEMENTATION: https://kghub.org
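The extract-transform-load pattern mentioned in the RESULTS paragraph can be sketched as three small functions that turn a tab-separated source file into node and edge tables carrying Biolink categories. This is an illustration only, not KG-Hub's actual tooling or API: the function names, the input column names, the `EXAMPLE:` identifiers and the nodes/edges column layout are all assumptions made for this sketch.

```python
import csv
import io

# Assumed output layout: one table of nodes and one table of edges.
NODE_COLUMNS = ["id", "category", "name"]
EDGE_COLUMNS = ["subject", "predicate", "object"]

# Hypothetical upstream data; the identifiers are placeholders, not real CURIEs.
RAW = (
    "gene_id\tgene_name\tdisease_id\tdisease_name\n"
    "EXAMPLE:G1\tgene one\tEXAMPLE:D1\tdisease one\n"
    "EXAMPLE:G2\tgene two\tEXAMPLE:D1\tdisease one\n"
)


def extract(raw_text):
    """Extract: parse the tab-separated source into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter="\t"))


def transform(rows):
    """Transform: map each source row to two typed nodes and one edge."""
    nodes = {}  # keyed by id so that repeated entities are stored once
    edges = []
    for row in rows:
        gene_id, disease_id = row["gene_id"], row["disease_id"]
        nodes[gene_id] = {
            "id": gene_id, "category": "biolink:Gene", "name": row["gene_name"],
        }
        nodes[disease_id] = {
            "id": disease_id, "category": "biolink:Disease", "name": row["disease_name"],
        }
        edges.append(
            {"subject": gene_id, "predicate": "biolink:related_to", "object": disease_id}
        )
    return list(nodes.values()), edges


def load(records, columns):
    """Load: serialise records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    node_records, edge_records = transform(extract(RAW))
    print(load(node_records, NODE_COLUMNS))
    print(load(edge_records, EDGE_COLUMNS))
```

Keeping the three stages as separate functions mirrors the modular pattern the abstract describes: a new source would, under these assumptions, only need its own `extract` and `transform`, while `load` stays shared.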