589 research outputs found
Identifying Compound-Target Associations by Combining Bioactivity Profile Similarity Search and Public Databases Mining
Molecular target identification is of central importance to drug discovery. Here, we developed a computational approach, named bioactivity profile similarity search (BASS), for associating targets to small molecules by using the known target annotations of related compounds from public databases. To evaluate BASS, a bioactivity profile database was constructed using 4296 compounds that were commonly tested in the US National Cancer Institute 60 human tumor cell line anticancer drug screen (NCI-60). Each compound was used as a query to search against the entire bioactivity profile database, and reference compounds with similar bioactivity profiles above a threshold of 0.75 were considered as neighbor compounds of the query. Potential targets were subsequently linked to the identified neighbor compounds by using the known targets o
Network-driven strategies to integrate and exploit biomedical data
[eng] In the quest for understanding complex biological systems, the scientific community has been delving into protein, chemical and disease biology, populating biomedical databases with a wealth of data and knowledge. Currently, the field of biomedicine has entered a Big Data era, in which computational-driven research can largely benefit from existing knowledge to better understand and characterize biological and chemical entities. And yet, the heterogeneity and complexity of biomedical data trigger the need for a proper integration and representation of this knowledge, so that it can be effectively and efficiently exploited.
In this thesis, we aim at developing new strategies to leverage the current biomedical knowledge, so that meaningful information can be extracted and fused into downstream applications. To this goal, we have capitalized on network analysis algorithms to integrate and exploit biomedical data in a wide variety of scenarios, providing a better understanding of pharmacoomics experiments while helping accelerate the drug discovery process. More specifically, we have (i) devised an approach to identify functional gene sets associated with drug response mechanisms of action, (ii) created a resource of biomedical descriptors able to anticipate cellular drug response and identify new drug repurposing opportunities, (iii) designed a tool to annotate biomedical support for a given set of experimental observations, and (iv) reviewed different chemical and biological descriptors relevant for drug discovery, illustrating how they can be used to provide solutions to current challenges in biomedicine.[cat] En la cerca d’una millor comprensió dels sistemes biològics complexos, la comunitat científica ha estat aprofundint en la biologia de les proteïnes, fàrmacs i malalties, poblant les bases de dades biomèdiques amb un gran volum de dades i coneixement. En l’actualitat, el camp de la biomedicina es troba en una era de “dades massives” (Big Data), on la investigació duta a terme per ordinadors se’n pot beneficiar per entendre i caracteritzar millor les entitats químiques i biològiques. No obstant, la heterogeneïtat i complexitat de les dades biomèdiques requereix que aquestes s’integrin i es representin d’una manera idònia, permetent així explotar aquesta informació d’una manera efectiva i eficient.
L’objectiu d’aquesta tesis doctoral és desenvolupar noves estratègies que permetin explotar el coneixement biomèdic actual i així extreure informació rellevant per aplicacions biomèdiques futures. Per aquesta finalitat, em fet servir algoritmes de xarxes per tal d’integrar i explotar el coneixement biomèdic en diferents tasques, proporcionant un millor enteniment dels experiments farmacoòmics per tal d’ajudar accelerar el procés de descobriment de nous fàrmacs. Com a resultat, en aquesta tesi hem (i) dissenyat una estratègia per identificar grups funcionals de gens associats a la resposta de línies cel·lulars als fàrmacs, (ii) creat una col·lecció de descriptors biomèdics capaços, entre altres coses, d’anticipar com les cèl·lules responen als fàrmacs o trobar nous usos per fàrmacs existents, (iii) desenvolupat una eina per descobrir quins contextos biològics corresponen a una associació biològica observada experimentalment i, finalment, (iv) hem explorat diferents descriptors químics i biològics rellevants pel procés de descobriment de nous fàrmacs, mostrant com aquests poden ser utilitzats per trobar solucions a reptes actuals dins el camp de la biomedicina
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but using also the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progresses toward a grid-enabled
chemogenomic knowledge space are discussed.Comment: 43 pages, 4 figures, to appear in Malaria Journa
Machine Learning Applications for Drug Repurposing
The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has been of limited success under the conventional reductionist model of one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under the simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small molecular drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected until a very late stage of the drug discovery, where the discovery of drug-induced side effects and potential drug resistance can decrease the value of the drug and even completely invalidate the use of the drug. Thus, a new paradigm in drug discovery is needed.
Structural systems pharmacology is a new paradigm in drug discovery that the drug activities are studied by data-driven large-scale models with considerations of the structures and drugs. Structural systems pharmacology will model, on a genome scale, the energetic and dynamic modifications of protein targets by drug molecules as well as the subsequent collective effects of drug-target interactions on the phenotypic drug responses. To date, however, few experimental and computational methods can determine genome-wide protein-ligand interaction networks and the clinical outcomes mediated by them. As a result, the majority of proteins have not been charted for their small molecular ligands; we have a limited understanding of drug actions. To address the challenge, this dissertation seeks to develop and experimentally validate innovative computational methods to infer genome-wide protein-ligand interactions and multi-scale drug-phenotype associations, including drug-induced side effects. The hypothesis is that the integration of data-driven bioinformatics tools with structure-and-mechanism-based molecular modeling methods will lead to an optimal tool for accurately predicting drug actions and drug associated phenotypic responses, such as side effects.
This dissertation starts by reviewing the current status of computational drug discovery for complex diseases in Chapter 1. In Chapter 2, we present REMAP, a one-class collaborative filtering method to predict off-target interactions from protein-ligand interaction network. In our later work, REMAP was integrated with structural genomics and statistical machine learning methods to design a dual-indication polypharmacological anticancer therapy. In Chapter 3, we extend REMAP, the core method in Chapter 2, into a multi-ranked collaborative filtering algorithm, WINTF, and present relevant mathematical justifications. Chapter 4 is an application of WINTF to repurpose an FDA-approved drug diazoxide as a potential treatment for triple negative breast cancer, a deadly subtype of breast cancer. In Chapter 5, we present a multilayer extension of REMAP, applied to predict drug-induced side effects and the associated biological pathways. In Chapter 6, we close this dissertation by presenting a deep learning application to learn biochemical features from protein sequence representation using a natural language processing method
TDR Targets 6: driving drug discovery for human pathogens through intensive chemogenomic data integration
The volume of biological, chemical and functional data deposited in the public domain is growing rapidly, thanks to next generation sequencing and highly-automated screening technologies. These datasets represent invaluable resources for drug discovery, particularly for less studied neglected disease pathogens. To leverage these datasets, smart and intensive data integration is required to guide computational inferences across diverse organisms. The TDR Targets chemogenomics resource integrates genomic data from human pathogens and model organisms along with information on bioactive compounds and their annotated activities. This report highlights the latest updates on the available data and functionality in TDR Targets 6. Based on chemogenomic network models providing links between inhibitors and targets, the database now incorporates network-driven target prioritizations, and novel visualizations of network subgraphs displaying chemical- and target-similarity neighborhoods along with associated target-compound bioactivity links. Available data can be browsed and queried through a new user interface, that allow users to perform prioritizations of protein targets and chemical inhibitors. As such, TDR Targets now facilitates the investigation of drug repurposing against pathogen targets, which can potentially help in identifying candidate targets for bioactive compounds with previously unknown targets.Fil: Urán Landaburu, Héctor Lionel. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Biotecnológicas; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; ArgentinaFil: Berenstein, Ariel José. Fundación Instituto Leloir; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Videla, Santiago. Fundación Instituto Leloir; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Maru, Parag. National Chemical Laboratory; India. Academy of Scientific and
Innovative Research; IndiaFil: Shanmugam, Dhanasekaran. Academy of Scientific and
Innovative Research; India. National Chemical Laboratory; IndiaFil: Chernomoretz, Ariel. Fundación Instituto Leloir; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; ArgentinaFil: Agüero, Fernan Gonzalo. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Biotecnológicas; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; Argentin
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 table
NETWORK INFERENCE DRIVEN DRUG DISCOVERY
The application of rational drug design principles in the era of network-pharmacology requires the investigation of drug-target and target-target interactions in order to design new drugs. The presented research was aimed at developing novel computational methods that enable the efficient analysis of complex biomedical data and to promote the hypothesis generation in the context of translational research. The three chapters of the Dissertation relate to various segments of drug discovery and development process.
The first chapter introduces the integrated predictive drug discovery platform „SmartGraph”. The novel collaborative-filtering based algorithm „Target Based Recommender (TBR)” was developed in the framework of this project and was validated on a set of 28,270 experimentally determined bioactivity data points involving 1,882 compounds and 869 targets. The TBR is integrated into the SmartGraph platform. The graphical interface of SmartGraph enables data analysis and hypothesis generation even for investigators without substantial bioinformatics knowledge. The platform can be utilized in the context of target identification, drug-target prediction and drug repurposing.
The second chapter of the Dissertation introduces an information theory inspired dynamic network model and the novel “Luminosity Diffusion (LD)” algorithm. The model can be utilized to prioritize protein targets for drug discovery purposes on the basis of available information and the importance of the targets. The importance of targets is accounted for in the information flow simulation process and is derived merely from network topology. The LD algorithm was validated on 8,010 relations of 794 proteins extracted from the Target Central Resource Database developed in the framework of the “Illuminating the Druggable Genome” project.
The last chapter discusses a fundamental problem pertaining to the generation of similarity network of molecules and their clustering. The network generation process relies on the selection of a similarity threshold. The presented work introduces a network topology based systematic solution for selecting this threshold so that the likelihood of a reasonable clustering can be increased. Furthermore, the work proposes a solution for generating so-called “pseudo-reference clustering” for large molecular data sets for performance evaluation purposes. The results of this chapter are applicable in the lead identification and development processes
Systems approaches to drug repositioning
PhD ThesisDrug discovery has overall become less fruitful and more costly, despite vastly increased
biomedical knowledge and evolving approaches to Research and Development (R&D).
One complementary approach to drug discovery is that of drug repositioning which
focusses on identifying novel uses for existing drugs. By focussing on existing drugs
that have already reached the market, drug repositioning has the potential to both
reduce the timeframe and cost of getting a disease treatment to those that need it.
Many marketed examples of repositioned drugs have been found via serendipitous or
rational observations, highlighting the need for more systematic methodologies.
Systems approaches have the potential to enable the development of novel methods to
understand the action of therapeutic compounds, but require an integrative approach
to biological data. Integrated networks can facilitate systems-level analyses by combining
multiple sources of evidence to provide a rich description of drugs, their targets and
their interactions. Classically, such networks can be mined manually where a skilled
person can identify portions of the graph that are indicative of relationships between
drugs and highlight possible repositioning opportunities. However, this approach is
not scalable. Automated procedures are required to mine integrated networks systematically
for these subgraphs and bring them to the attention of the user. The aim
of this project was the development of novel computational methods to identify new
therapeutic uses for existing drugs (with particular focus on active small molecules)
using data integration.
A framework for integrating disparate data relevant to drug repositioning, Drug Repositioning
Network Integration Framework (DReNInF) was developed as part of this
work. This framework includes a high-level ontology, Drug Repositioning Network
Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite
of parsers; and a generic semantic graph integration platform. This framework enables
the production of integrated networks maintaining strict semantics that are important
in, but not exclusive to, drug repositioning. The DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description
Framework (RDF) dataset. A Web-based front end was developed, which includes
a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this
dataset.
To automate the mining of drug repositioning datasets, a formal framework for the
definition of semantic subgraphs was established and a method for Drug Repositioning
Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining
semantically-rich networks for occurrences of a given semantic subgraph. This algorithm
allows instances of complex semantic subgraphs that contain data about putative
drug repositioning opportunities to be identified in a computationally tractable
fashion, scaling close to linearly with network data.
The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated.
9,643,061 putative D-T interactions were identified and ranked, with a strong
correlation between highly scored associations and those supported by literature observed.
The 20 top ranked associations were analysed in more detail with 14 found
to be novel and six found to be supported by the literature. It was also shown that
this approach better prioritises known D-T interactions, than other state-of-the-art
methodologies.
The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also
investigated. As target-based approaches are utilised heavily in the field of drug discovery,
it is necessary to have a systematic method to rank Gene-Disease (G-D) associations.
Although methods already exist to collect, integrate and score these associations,
these scores are often not a reliable re
flection of expert knowledge. Therefore, an
integrated data-driven approach to drug repositioning was developed using a Bayesian
statistics approach and applied to rank 309,885 G-D associations using existing knowledge.
Ranked associations were then integrated with other biological data to produce
a semantically-rich drug discovery network. Using this network it was shown that
diseases of the central nervous system (CNS) provide an area of interest. The network
was then systematically mined for semantic subgraphs that capture novel Dr-D relations.
275,934 Dr-D associations were identified and ranked, with those more likely to
be side-effects filtered. Work presented here includes novel tools and algorithms to enable research within
the field of drug repositioning. DReNIn, for example, includes data that previous
comparable datasets relevant to drug repositioning have neglected, such as clinical
trial data and drug indications. Furthermore, the dataset may be easily extended
using DReNInF to include future data as and when it becomes available, such as G-D
association directionality (i.e. is the mutation a loss-of-function or gain-of-function).
Unlike other algorithms and approaches developed for drug repositioning, DReSMin
can be used to infer any types of associations captured in the target semantic network.
Moreover, the approaches presented here should be more generically applicable to
other fields that require algorithms for the integration and mining of semantically rich
networks.European and Physical Sciences Research Council (EPSRC) and GS
- …