
    Automatic discovery of drug mode of action and drug repositioning from gene expression data

    2009 - 2010
    The identification of the molecular pathway targeted by a compound, combined with the dissection of the subsequent reactions in the cellular environment, i.e. the drug mode of action, is a key challenge in biomedicine. Elucidation of drug mode of action has been attempted in the past with different approaches. Methods based only on transcriptional responses require the least amount of information and can be quickly applied to new compounds. On the other hand, they have met with limited success and, at present, a general, robust and efficient gene-expression-based method to study drugs in mammalian systems is still missing. We developed an efficient analysis framework to investigate the mode of action of drugs by using gene expression data only. In particular, by using a large compendium of gene expression profiles following treatments with more than 1,000 compounds on different human cell lines, we were able to extract a synthetic consensus transcriptional response for each of the tested compounds. This was obtained by developing an original rank merging procedure. Then, we designed a novel similarity measure among the transcriptional responses to each drug, ending up with a “drug similarity network”, where each drug is a node and edges represent significant similarities between drugs. By means of a novel hierarchical clustering algorithm, we then provided this network with a modular topology, containing groups of highly interconnected nodes (i.e. network communities) whose exemplars form second-level modules (i.e. network rich-clubs), and so on. We showed that these topological modules are enriched for a given mode of action and that the hierarchy of the resulting final network reflects the different levels of similarity among the modes of action of the constituent compounds.
Most importantly, by integrating a novel drug X into this network (which can be done very quickly), its unknown mode of action can be inferred by studying the topology of the subnetwork surrounding X. Moreover, novel potential therapeutic applications can be assigned to safe and approved drugs that are already present in the network by studying their neighborhood (i.e. drug repositioning), in a cheap, easy and fast way, without the need for additional experiments. By using this approach, we were able to correctly classify novel anti-cancer compounds; to predict and experimentally validate an unexpected similarity in the mode of action of CDK2 inhibitors and topoisomerase inhibitors; and to predict that Fasudil, a known and FDA-approved cardiotonic agent, could be repositioned as a novel enhancer of cellular autophagy. Due to the extremely safe profile of this drug and its potential ability to traverse the blood-brain barrier, this could have strong implications for the treatment of several human neurodegenerative disorders, such as Huntington's and Parkinson's diseases. [edited by author]
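
The pipeline described in this abstract (merge rankings per drug, compare drugs, connect similar ones, then read off the neighbourhood of a new drug X) can be illustrated with a small sketch. The abstract does not specify the rank merging procedure or the similarity measure, so the code below swaps in two simple stand-ins: a Borda-style mean-rank merge and a normalised Spearman footrule distance. All function names, gene and drug labels, and the threshold value are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def merge_rankings(rankings):
    """Borda-style merge (assumed stand-in): order genes by mean position."""
    genes = rankings[0]
    mean_pos = {g: sum(r.index(g) for r in rankings) / len(rankings) for g in genes}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(genes, key=lambda g: (mean_pos[g], g))


def footrule_distance(rank_a, rank_b):
    """Normalised Spearman footrule distance between two rankings (0 = identical)."""
    pos_b = {g: i for i, g in enumerate(rank_b)}
    n = len(rank_a)
    total = sum(abs(i - pos_b[g]) for i, g in enumerate(rank_a))
    return total / ((n * n) // 2)  # floor(n^2 / 2) is the maximum footrule value


def build_network(consensus, threshold):
    """Connect every pair of drugs whose distance is at most the threshold."""
    edges = {}
    for a, b in combinations(sorted(consensus), 2):
        d = footrule_distance(consensus[a], consensus[b])
        if d <= threshold:
            edges[(a, b)] = d
    return edges


def neighbours_of_new_drug(consensus, new_ranking, threshold):
    """Drugs already in the network that are close to a new drug X, nearest first."""
    hits = [(footrule_distance(new_ranking, r), name) for name, r in consensus.items()]
    return [name for d, name in sorted(hits) if d <= threshold]


# Toy data: each drug has one or more ranked gene lists (most up-regulated first).
profiles = {
    "drugA": [["g1", "g2", "g3", "g4"], ["g2", "g1", "g3", "g4"]],
    "drugB": [["g1", "g2", "g4", "g3"]],
    "drugC": [["g4", "g3", "g2", "g1"]],
}
consensus = {name: merge_rankings(lists) for name, lists in profiles.items()}
network = build_network(consensus, threshold=0.5)
close_to_x = neighbours_of_new_drug(consensus, ["g2", "g1", "g3", "g4"], threshold=0.5)
```

In this toy setting only drugA and drugB end up connected, and the new ranking falls in their neighbourhood rather than drugC's. A real analysis would additionally need a significance test for edges and a community detection step, neither of which is sketched here.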

    Statistical analysis and modelling of proteomic and genetic network data illuminate hidden roles of proteins and their connections

    While many stable protein complexes are known, the dynamic interactome is still underexplored. Experimental techniques such as single-tag affinity purification aim to close the gap and identify transient interactions, but need better filtering tools to discriminate between true interactors and noise. This thesis develops and contrasts two complementary approaches to the analysis of protein-protein interaction (PPI) networks derived from noisy experiments. The majority of the data used for the analysis come from in vitro experiments aggregated from known databases (IntAct, BioGRID, BioPlex), and are complemented by experiments done by our collaborators from the Ueffing group at the Institute of Ophthalmic Research, Tübingen University (Germany). Chapter 3 presents the statistical approach to the data analysis. It focuses on the case of a single dataset with target and control data in order to determine the significant interactions for the target. The procedure follows an expected trajectory of preprocessing, quality control, statistical testing, correction and discussion of results. The approach is tailored to the specific dataset, experiment design and related assumptions. This is specifically relevant for the missing value imputation, where multiple approaches are discussed and a new method, building upon a previous method, is proposed and validated. Chapter 4 presents a different approach to filtering experimental results for PPIs. It is a statistic, WeSA (weighted socio-affinity), which improves upon previous methods of scoring PPIs from affinity proteomics data. It uses network analysis techniques to analyse the full PPI network without the need for controls. WeSA is tested on protein-protein networks of varying accuracy, including the curated IntAct dataset, the unfiltered records in BioGRID, and the large BioPlex dataset. The model is also tested against the previous method with the same goal.
While the function itself proves superior, another major advantage is that it can efficiently combine and compare observations across studies and can therefore be used to aggregate and clean results from incoming experiments in the context of all already available data. In the final part, uses of WeSA beyond wild-type PPI networks are analysed. The framework is proposed as a novel way to effectively compare mechanistic differences between variants of the same protein (e.g. mutant vs wild type). I also explore the use of WeSA to study other biological and non-biological networks such as genome-wide association studies (GWAS) and gene-phenotype associations, with encouraging results. In conclusion, this work presents and compares a variety of mathematical, statistical and computational approaches adapted, combined and/or developed specifically for the task of obtaining a better overview of protein-protein interaction networks. The performance of the novel methods is validated and, specifically, WeSA is extensively tested and analysed, including beyond the field of PPI networks.

    Pathogenic Viruses and their Interaction with Human Host Cells


    Improving average ranking precision in user searches for biomedical research datasets

    Availability of research datasets is a keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participants' best submissions. Overall, it is ranked in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorisation did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data-driven query expansion methods could be an alternative to the complexity of biomedical terminologies.
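
Query expansion based on word embeddings, as mentioned in this abstract, can be sketched as adding to each query the vocabulary terms whose vectors lie closest to the query terms. The abstract gives no details of the actual model, so everything below is an assumption made for illustration: the two-dimensional toy vectors, the function names, and the `k` and `min_sim` parameters are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=1, min_sim=0.8):
    """Append up to k nearest vocabulary terms for each original query term."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # unknown terms are kept in the query but not expanded
        candidates = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word not in expanded
            ),
            reverse=True,
        )
        for sim, word in candidates[:k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded


# Toy embeddings (hypothetical values, not learned from any corpus).
toy_embeddings = {
    "cancer": (1.0, 0.0),
    "tumor": (0.9, 0.1),
    "mouse": (0.0, 1.0),
    "murine": (0.1, 0.95),
}
expanded_query = expand_query(["cancer", "mouse"], toy_embeddings)
```

With these toy vectors the query gains "tumor" and "murine". In practice the embeddings would be trained on a large corpus, and the similarity threshold would need tuning to avoid drifting away from the user's intent.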

    Collective genomic segments with differential pleiotropic patterns between cognitive dimensions and psychopathology

    Cognitive deficits are known to be related to most forms of psychopathology. Here, we perform local genetic correlation analysis as a means of identifying independent segments of the genome that show biologically interpretable pleiotropic associations between cognitive dimensions and psychopathology. We identify collective segments of the genome, which we call “meta-loci”, showing differential pleiotropic patterns for psychopathology relative to either cognitive task performance (CTP) or performance on a non-cognitive factor (NCF) derived from educational attainment. We observe that neurodevelopmental gene sets expressed during the prenatal to early childhood period predominate in CTP-relevant meta-loci, while post-natal gene sets are more involved in NCF-relevant meta-loci. Further, we demonstrate that neurodevelopmental gene sets are dissociable across CTP meta-loci with respect to their spatial distribution across the brain. Additionally, we find that GABA-ergic, cholinergic, and glutamatergic genes drive pleiotropic relationships within dissociable meta-loci.