Automatic discovery of drug mode of action and drug repositioning from gene expression data
2009 - 2010
The identification of the molecular pathway that is targeted by a compound,
combined with the dissection of the following reactions in the cellular environment,
i.e. the drug mode of action, is a key challenge in biomedicine.
Elucidation of drug mode of action has been attempted, in the past, with
different approaches. Methods based only on transcriptional responses are
those requiring the least amount of information and can be quickly applied
to new compounds. On the other hand, they have met with limited success
and, at present, a general, robust and efficient gene-expression based
method to study drugs in mammalian systems is still missing.
We developed an efficient analysis framework to investigate the mode of
action of drugs by using gene expression data only. In particular, by using
a large compendium of gene expression profiles following treatments with
more than 1,000 compounds on different human cell lines, we were able
to extract a synthetic consensual transcriptional response for each of the
tested compounds. This was obtained by developing an original rank merging
procedure. Then, we designed a novel similarity measure among the
transcriptional responses to each drug, ending up with a “drug similarity
network”, where each drug is a node and edges represent significant
similarities between drugs.
By means of a novel hierarchical clustering algorithm, we then provided
this network with a modular topology, containing groups of highly interconnected
nodes (i.e. network communities) whose exemplars form second-level
modules (i.e. network rich-clubs), and so on. We showed that these
topological modules are enriched for a given mode of action and that the
hierarchy of the resulting final network reflects the different levels of similarity
among the modes of action of its constituent compounds.
Most importantly, by integrating a novel drug X into this network (which
can be done very quickly) the unknown mode of action can be inferred by
studying the topology of the subnetwork surrounding X. Moreover, novel
potential therapeutic applications can be assigned to safe and approved
drugs, that are already present in the network, by studying their neighborhood
(i.e. drug repositioning), hence in a very cheap, easy and fast way,
without the need of additional experiments.
By using this approach, we were able to correctly classify novel anti-cancer
compounds; to predict and experimentally validate an unexpected similarity
in the mode of action of CDK2 inhibitors and topoisomerase inhibitors;
and to predict that Fasudil, a known and FDA-approved cardiotonic agent,
could be repositioned as a novel enhancer of cellular autophagy.
Due to the extremely safe profile of this drug and its potential ability to
traverse the blood-brain barrier, this could have strong implications in the
treatment of several human neurodegenerative disorders, such as Huntington's
and Parkinson's diseases. [edited by author] IX n.s
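The abstract above describes merging ranked expression profiles into one consensus response per drug and then linking drugs whose responses are similar. The following is only a toy sketch of that general idea, not the authors' procedure: it swaps in a Borda-style mean-rank merge and a top/bottom Jaccard overlap, and the drug names, gene names, `k` and `threshold` are all hypothetical.

```python
from itertools import combinations

def merge_rankings(rankings):
    """Borda-style merge (an assumption): order genes by mean position."""
    genes = rankings[0]
    mean_pos = {g: sum(r.index(g) for r in rankings) / len(rankings) for g in genes}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(genes, key=lambda g: (mean_pos[g], g))

def signature_similarity(rank_a, rank_b, k=2):
    """Average Jaccard overlap of the top-k and bottom-k genes (illustrative)."""
    def jaccard(x, y):
        x, y = set(x), set(y)
        return len(x & y) / len(x | y)
    return 0.5 * (jaccard(rank_a[:k], rank_b[:k]) + jaccard(rank_a[-k:], rank_b[-k:]))

def build_network(consensus, threshold=0.5, k=2):
    """Keep an edge between two drugs when their similarity reaches the threshold."""
    edges = []
    for a, b in combinations(sorted(consensus), 2):
        s = signature_similarity(consensus[a], consensus[b], k)
        if s >= threshold:
            edges.append((a, b, s))
    return edges

# Hypothetical ranked profiles (most up-regulated gene first), one list per experiment.
profiles = {
    "drugA": [["g1", "g2", "g3", "g4", "g5"], ["g2", "g1", "g3", "g5", "g4"]],
    "drugB": [["g1", "g2", "g4", "g3", "g5"]],
    "drugC": [["g5", "g4", "g3", "g2", "g1"]],
}
consensus = {drug: merge_rankings(runs) for drug, runs in profiles.items()}
edges = build_network(consensus)
print(edges)
```

In this toy data only drugA and drugB share enough of their extreme genes to be connected; a real analysis would need a statistically justified similarity threshold.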
Statistical analysis and modelling of proteomic and genetic network data illuminate hidden roles of proteins and their connections
While many stable protein complexes are known, the dynamic interactome is still underexplored. Experimental techniques such as single-tag affinity purification aim to close the gap and identify transient interactions, but need better filtering tools to discriminate between true interactors and noise.
This thesis develops and contrasts two complementary approaches to the analysis of protein-protein interaction (PPI) networks derived from noisy experiments. The majority of the data used for the analysis come from in vitro experiments aggregated from known databases (IntAct, BioGRID, BioPlex), complemented by experiments done by our collaborators from the Ueffing group at the Institute of Ophthalmic Research, Tübingen University (Germany).
Chapter 3 presents the statistical approach to the data analysis. It focuses on the case of a single dataset with target and control data in order to determine the significant interactions for the target. The procedure follows an expected trajectory of preprocessing, quality control, statistical testing, correction and discussion of results. The approach is tailored to the specific dataset, experiment design and related assumptions. This is specifically relevant for the missing value imputation where multiple approaches are discussed and a new method, building upon a previous method, is proposed and validated.
Chapter 4 presents a different approach for the filtering of experimental results for PPIs. It is a statistic, WeSA (weighted socio-affinity), which improves upon previous methods of scoring PPIs from affinity proteomics data. It uses network analysis techniques to analyse the full PPI network without the need for controls. WeSA is tested on protein-protein networks of varying accuracy, including the curated IntAct dataset, the unfiltered records in BioGRID, and the large BioPlex dataset. The model is also tested against the previous method designed for the same goal. While the function itself proves superior, another major advantage is that it can efficiently combine and compare observations across studies and can therefore be used to aggregate and clean results from incoming experiments in the context of all already available data.
In the final part, uses of WeSA beyond wild-type PPI networks are analysed. The framework is proposed as a novel way to effectively compare mechanistic differences between variants of the same protein (e.g. mutant vs wild type). I also explore the use of WeSA to study other biological and non-biological networks such as genome-wide association studies (GWAS) and gene-phenotype associations, with encouraging results.
In conclusion, this work presents and compares a variety of mathematical, statistical and computational approaches adapted, combined and/or developed specifically for the task of obtaining a better overview of protein-protein interaction networks. The performance of the novel methods is validated and WeSA, specifically, is extensively tested and analysed, including beyond the field of PPI networks.
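The statistical pipeline summarised above includes a correction step after testing target against control. The abstract does not say which correction is used, so the sketch below assumes the Benjamini-Hochberg false discovery rate procedure purely for illustration; the prey names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical candidate interactors with raw p-values from a target-vs-control test.
candidates = ["preyA", "preyB", "preyC", "preyD"]
raw_p = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw_p)
significant = [name for name, q in zip(candidates, q_values) if q <= 0.05]
print(significant)
```

Only candidates whose adjusted value stays at or below the chosen level (0.05 here, also an assumption) would be reported as significant interactors.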
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, being +22.3% higher than the median infAP of the participants'
best submissions. Overall, it is ranked in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method showed a positive impact on the system's performance, increasing our
baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, having smaller performance variations
under different training conditions. Finally, the result categorization did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
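The query expansion model mentioned above relies on word embeddings. As a rough illustration of the general technique, and not the challenge system itself, the sketch below adds to each query term its nearest neighbour by cosine similarity; the three-dimensional vectors, the vocabulary, `top_n` and `min_sim` are all made-up assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(terms, embeddings, top_n=1, min_sim=0.8):
    """Append up to top_n sufficiently similar vocabulary words per query term."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # unknown terms are kept but not expanded
        scored = sorted(
            ((cosine(embeddings[term], vec), word)
             for word, vec in embeddings.items() if word not in expanded),
            reverse=True,
        )
        expanded.extend(word for sim, word in scored[:top_n] if sim >= min_sim)
    return expanded

# Toy embeddings; real vectors would come from a model trained on biomedical text.
embeddings = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.8, 0.2, 0.0],
    "mouse": [0.0, 0.1, 0.9],
    "murine": [0.1, 0.0, 0.9],
    "protein": [0.1, 0.9, 0.1],
}
expanded = expand_query(["cancer", "mouse"], embeddings)
print(expanded)
```

With these toy vectors the query gains "tumor" and "murine", while "protein" is left out because it falls below the similarity cut-off.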
Collective genomic segments with differential pleiotropic patterns between cognitive dimensions and psychopathology
Cognitive deficits are known to be related to most forms of psychopathology. Here, we perform local genetic correlation analysis as a means of identifying independent segments of the genome that show biologically interpretable pleiotropic associations between cognitive dimensions and psychopathology. We identify collective segments of the genome, which we call “meta-loci”, showing differential pleiotropic patterns for psychopathology relative to either cognitive task performance (CTP) or performance on a non-cognitive factor (NCF) derived from educational attainment. We observe that neurodevelopmental gene sets expressed during the prenatal-early childhood period predominate in CTP-relevant meta-loci, while post-natal gene sets are more involved in NCF-relevant meta-loci. Further, we demonstrate that neurodevelopmental gene sets are dissociable across CTP meta-loci with respect to their spatial distribution across the brain. Additionally, we find that GABA-ergic, cholinergic, and glutamatergic genes drive pleiotropic relationships within dissociable meta-loci.