256 research outputs found
Role of network topology based methods in discovering novel gene-phenotype associations
The cell is governed by complex interactions among various types of biomolecules. Coupled with environmental factors, variations in DNA can alter normal gene function and lead to a disease condition. Often, such disease phenotypes involve coordinated dysregulation of multiple genes that implicate inter-connected pathways. Towards a better understanding and characterization of the mechanisms underlying human diseases, here I present GUILD, a network-based disease-gene prioritization framework. GUILD associates genes with diseases using the global topology of the protein-protein interaction network and an initial set of genes known to be implicated in the disease. Furthermore, I investigate the mechanistic relationships between disease-genes and explain the robustness emerging from these relationships. I also introduce GUILDify, an online and user-friendly tool which prioritizes genes for their association to any user-provided phenotype. Finally, I describe current state-of-the-art systems-biology approaches where network modeling has helped extend our view on diseases such as cancer.
The cell is governed by complex interactions among different types of biomolecules. Together with environmental factors, variations in DNA can alter normal gene function and cause disease. Often, these disease phenotypes involve a coordinated dysregulation of multiple genes implicated in interconnected pathways. To better understand and characterize the mechanisms underlying human diseases, in this thesis I present the GUILD program, a platform that prioritizes genes related to a specific disease using network topology. Starting from a known set of genes implicated in a disease, GUILD associates other genes with the disease through the global topology of the protein interaction network. In addition, I analyze the mechanistic relationships between disease-associated genes and explain the robustness that emerges from this analysis. I also present GUILDify, an easy-to-use web server for prioritizing genes and their association with a given phenotype. Finally, I describe the most recent methods in which network modeling has helped extend knowledge about complex diseases such as cancer
RANDOM WALK APPLIED TO HETEROGENOUS DRUG-TARGET NETWORKS FOR PREDICTING BIOLOGICAL OUTCOMES
Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016
Prediction of unknown drug-target interactions from bioassay data is critical both for understanding these interactions and for developing new drugs and repurposing old ones. Conventional methods for predicting such interactions can be divided into 2D-based and 3D-based methods. 3D methods are more CPU-intensive and require more manual interpretation, whereas 2D methods, such as machine learning and similarity search over chemical fingerprints, are fast. One problem with using traditional machine learning to predict drug-target pairs is that it requires labeled information on true and false interactions. A major difficulty for supervised learning methods is therefore the selection of negative samples: unknown drug-target interactions are regarded as false interactions, which may influence the predictive accuracy of the model. Network-based methods have become an effective tool for predicting drug-target interactions while avoiding this negative sampling problem. In this dissertation, I will describe traditional machine learning methods and 3D pharmacophore modeling methods for drug-target prediction and show how these methods work in a drug discovery scenario. I will then introduce a new framework for drug-target prediction, Random Walk with Restart (RWR), based on bipartite networks of drug-target relations. RWR integrates various networks, including drug-drug similarity networks, protein-protein similarity networks and drug-target interaction networks, into a heterogeneous network that is capable of predicting novel drug-target relations. I will describe how the chemical features used for measuring drug-drug similarity do not affect performance in predicting interactions, and further show the performance of RWR using an external dataset from the ChEMBL database.
I will further describe extensions of the RWR approach to multilayered networks consisting of biological data such as diseases, tissue-based gene expression data, protein complexes and metabolic pathways, used to predict associations between human diseases and metabolic pathways, which are crucial in drug discovery. I have also developed a software package, netpredictor, in R (standalone and web versions) for unipartite and bipartite networks, implementing network-based predictive algorithms and network properties for drug-target prediction. This package will be described
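The Random Walk with Restart idea named in the abstract above can be illustrated with a minimal sketch. This is not the thesis implementation or the netpredictor API: the function names, the restart probability of 0.3, and the five-node toy network (two drugs, three targets) are all assumptions chosen for illustration, and the heterogeneous network is simplified to a single unweighted adjacency matrix.

```python
import numpy as np


def build_adjacency(n_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from undirected edges (hypothetical helper)."""
    adjacency = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        adjacency[u, v] = 1.0
        adjacency[v, u] = 1.0
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = adjacency / col_sums      # column-normalised transition matrix W
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # walker restarts uniformly at the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy network (assumed): nodes 0-1 are drugs, nodes 2-4 are targets.
toy_edges = [
    (0, 1),  # drug-drug similarity
    (0, 2),  # known drug-target interaction
    (1, 3),  # known drug-target interaction
    (2, 3),  # target-target similarity
    (3, 4),  # target-target similarity
]
toy_scores = random_walk_with_restart(build_adjacency(5, toy_edges), seeds=[0])
# Candidate targets for drug 0, excluding the already known target 2, best first.
toy_ranking = sorted([3, 4], key=lambda node: toy_scores[node], reverse=True)
```

Seeding the walk at a drug node and ranking target nodes by steady-state probability needs no explicit negative examples, which is the property the abstract highlights.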
Elucidating the mechanistic impact of single nucleotide variants in model organisms
Understanding how genetic variation propagates to differences in phenotypes in individuals is an ongoing challenge in genetics. Genome-wide association studies have allowed for the identification of many trait-associated genomic loci. However, they are limited by their inability to explain the altered cellular mechanism. Genetic variation can drive disease by altering a range of mechanisms, including signalling networks, TF binding, and protein folding. Understanding the impact of variants on such processes has key implications in therapeutics, drug development, and more. This thesis aims to utilise computational predictors to shed light on how cellular mechanisms are altered in the context of genetic variation and better understand how they drive both molecular and organism-level phenotypes.
Many binding events in the cell are mediated by short stretches of sequence motifs. The ability to discover these underlying rules of binding could greatly aid our understanding of variant impact. Kinase–substrate phosphorylation is one of the most prominent post-translational modifications (PTMs) which is mediated by such motifs. We first describe a computational method which utilises interaction and phosphorylation data to predict sequence preferences of kinases. Our method was applied to 57% of human kinases capturing known well-characterised and novel kinase specificities. We experimentally validate four understudied kinases to show that predicted models closely resemble true specificities. We further demonstrate that this method can be applied to different organisms and can be used for other phospho-recognition domains. The described approach allows for an extended repertoire of sequence specificities to be generated, particularly in organisms for which little data is available.
TF-DNA binding is another mechanism driven by sequence motifs, which is key for the tight regulation of gene expression and can be greatly altered by genetic variation. We have comprehensively benchmarked current methods used to predict non-coding variant effects on TF-DNA binding by employing over 20,000 compiled allele-specific ChIP-seq variants across 94 TFs. We show that machine learning-based approaches significantly outperform more rudimentary methods such as the position weight matrix. We further note that models for many TFs with distinct binding specificities were unable to accurately assess the impact of variants. For these TFs, we explore alternative mechanisms underlying TF-binding, such as methylation, co-operative binding, and DNA shape that drive poor performance. Our results demonstrate the complexity of predicting non-coding variant effects and the importance of incorporating alternative mechanisms into models.
Finally, we describe a comprehensive effort to compile and benchmark state-of-the-art sequence and structure-based predictors of mutational consequences and predict the effect of coding and non-coding variants in the reference genomes of human, yeast, and E. coli. Predicted mechanisms include the impact on protein stability, interaction interfaces, and PTMs. These variant effects are provided through mutfunc, a fast and intuitive web tool by which users can interactively explore pre-computed mechanistic variant impact predictions. We validate computed predictions by analysing known pathogenic disease variants and provide mechanistic hypotheses for causal variants of unknown function. We further use our predictions to devise gene-level functionality scores in human and yeast individuals, which we then use to test gene-phenotype associations and uncover novel ones
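The position weight matrix baseline mentioned in the TF-DNA binding benchmark above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual pipeline: the four-column count matrix for a made-up "TGAC" motif, the pseudocount, the uniform background, and the forward-strand-only scan are all hypothetical simplifications.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def window_score(pwm, window):
    """Sum the log-odds of each base in a window the same width as the motif."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_score(pwm, sequence):
    """Best motif match over all forward-strand windows of the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        window_score(pwm, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
    )


def variant_delta(pwm, ref_sequence, position, alt_base):
    """Change in best PWM score when ref_sequence[position] is replaced by alt_base."""
    alt_sequence = ref_sequence[:position] + alt_base + ref_sequence[position + 1:]
    return best_score(pwm, alt_sequence) - best_score(pwm, ref_sequence)


# Hypothetical motif counts for "TGAC" (10 observed binding sites per position).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
# A G>C substitution inside the motif lowers the best match score (negative delta).
toy_delta = variant_delta(toy_pwm, "AATGACAA", position=3, alt_base="C")
```

A PWM treats positions as independent, so a score difference of this kind cannot capture effects such as methylation, co-operative binding, or DNA shape, which the abstract lists as alternative mechanisms.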
Novel Approaches to Studying the Effects of Cis-Regulatory Variants in the Central Nervous System
For decades, studies of the genetic basis of disease have focused on rare coding mutations that disrupt protein function, leading to the identification of hundreds of genes underlying Mendelian diseases. However, many complex diseases are non-Mendelian, and less than 2% of the genome is coding. It is now clear that non-coding variants contribute to disease susceptibility, but the precise underlying mechanisms are generally unknown. Cis-regulatory elements (CREs) are transcription factor (TF)-bound genomic regions that regulate gene expression, and variants within CREs can therefore modify gene expression. The putative locations of CREs in a variety of cell types have been identified through genome-wide assays of TF binding and epigenomic signatures, providing a starting point for probing the effects of cis-regulatory variants. Unlike coding mutations, which can be interpreted based on the genetic code, the functional consequence of any given cis-regulatory variant is difficult to predict even at the molecular level. Therefore, a major bottleneck lies in interpreting the functional significance of these variants.
In the present work, I study the effects of cis-regulatory variants in the central nervous system (CNS), specifically in retina and brain. The retina is composed of well-characterized neuronal cell types and an extensively studied transcriptional network, while the brain is the center of human cognition and a target of devastating neuropsychiatric diseases. First, I take advantage of the genetic diversity between two distantly related mouse strains to describe the relationship between cis-regulatory variants and differences in retinal gene expression. I identify cis- and trans-regulatory effects, as well as parent-of-origin effects. Second, I develop a new technology based on an existing massively parallel reporter assay, CRE-seq, to enable the functional study of long CREs in the CNS in vivo for the first time. I demonstrate the ability of this approach to measure tissue-specific cis-regulatory activity in the brain and to pinpoint DNA bases critical for activity. Finally, I conduct a detailed mechanistic study of a non-coding region containing variants associated with both human cognitive performance and bipolar disorder. This last study illustrates the complexities and challenges of establishing the causal role of non-coding variants in disease
Network-driven strategies to integrate and exploit biomedical data
[eng] In the quest for understanding complex biological systems, the scientific community has been delving into protein, chemical and disease biology, populating biomedical databases with a wealth of data and knowledge. Currently, the field of biomedicine has entered a Big Data era, in which computational-driven research can largely benefit from existing knowledge to better understand and characterize biological and chemical entities. And yet, the heterogeneity and complexity of biomedical data trigger the need for a proper integration and representation of this knowledge, so that it can be effectively and efficiently exploited.
In this thesis, we aim to develop new strategies to leverage current biomedical knowledge, so that meaningful information can be extracted and fused into downstream applications. To this end, we have capitalized on network analysis algorithms to integrate and exploit biomedical data in a wide variety of scenarios, providing a better understanding of pharmacoomics experiments while helping accelerate the drug discovery process. More specifically, we have (i) devised an approach to identify functional gene sets associated with drug response mechanisms of action, (ii) created a resource of biomedical descriptors able to anticipate cellular drug response and identify new drug repurposing opportunities, (iii) designed a tool to annotate biomedical support for a given set of experimental observations, and (iv) reviewed different chemical and biological descriptors relevant for drug discovery, illustrating how they can be used to provide solutions to current challenges in biomedicine.
[cat] In the search for a better understanding of complex biological systems, the scientific community has been delving into the biology of proteins, drugs and diseases, populating biomedical databases with a large volume of data and knowledge. At present, the field of biomedicine is in an era of "massive data" (Big Data), in which computer-driven research can benefit from this knowledge to better understand and characterize chemical and biological entities. However, the heterogeneity and complexity of biomedical data require that the data be integrated and represented in a suitable way, so that this information can be exploited effectively and efficiently.
The aim of this doctoral thesis is to develop new strategies that make it possible to exploit current biomedical knowledge and thereby extract relevant information for future biomedical applications. To this end, we have used network algorithms to integrate and exploit biomedical knowledge in different tasks, providing a better understanding of pharmacoomics experiments in order to help accelerate the process of discovering new drugs. As a result, in this thesis we have (i) designed a strategy to identify functional groups of genes associated with the response of cell lines to drugs, (ii) created a collection of biomedical descriptors capable, among other things, of anticipating how cells respond to drugs or of finding new uses for existing drugs, (iii) developed a tool to discover which biological contexts correspond to an experimentally observed biological association and, finally, (iv) explored different chemical and biological descriptors relevant to the process of discovering new drugs, showing how they can be used to find solutions to current challenges in the field of biomedicine
β-cells cis-regulatory networks and type 1 diabetes
[eng] Type 1 Diabetes (T1D) is a β-cell-targeted autoimmune disease, leading to a reduction in pancreatic β-cell mass that renders patients insulin-dependent for life. In early stages of the disease, cells from the immune system infiltrate pancreatic islets in a process called insulitis. During this stage, a cross-talk is established between cells in the pancreatic islets and the infiltrating immune cells, mediated by the release of cytokines and chemokines. Studying the gene regulatory networks driving β-cell responses during insulitis will allow us to pinpoint key gene pathways leading to β-cell loss-of-function and apoptosis, and also to understand the role β-cells have in their own demise. In the present thesis, we used two different cytokine cocktails, IFN-α and IFN-γ + IL-1β, to model early and late insulitis, respectively. After exposing β-cells and pancreatic islets to such proinflammatory cytokines, we characterized the changes in their chromatin landscape, gene networks and protein profiles. Using both models, we observed dramatic chromatin remodeling in terms of accessibility and/or H3K27ac histone modification enrichment, coupled with up-regulation of the nearby genes and increased abundance of the corresponding protein. Mining gene regulatory networks of β-cells exposed to IFN-α revealed two potential therapeutic interventions which were able to reduce the interferon signature in β-cells: 1) inhibition of bromodomain proteins, which resulted in a down-regulation of IFN-α-induced HLA-I and CXCL10 expression; 2) baricitinib, a JAK1/2 inhibitor, which was able to reduce both IFN-α-induced HLA-I and CXCL10 expression levels and β-cell apoptosis. In β-cells exposed to IFN-γ + IL-1β, we were able to identify a subset of novel regulatory elements uncovered upon the exposure, which we named Induced Regulatory Elements (IREs). Such regions were enriched for T1D-associated risk variants, suggesting that β-cells might carry a portion of T1D genetic risk. Interestingly, we identified two T1D lead variants overlapping IREs, in which the risk allele modulated the IRE enhancer activity, exposing a potential T1D mechanism acting through β-cells. To facilitate access to these genomic data, together with other datasets relevant for the pancreatic islet community, we developed the Islet Regulome Browser (http://www.isletregulome.org/), a free web application that allows exploration and integration of pancreatic islet genomic data
Analysing directed network data
The topology of undirected biological networks, such as protein-protein interaction networks or genetic interaction networks, has been extensively explored in search of new biological knowledge. Graphlets, small connected non-isomorphic induced sub-graphs of an undirected network, have been particularly useful in computational network biology. Bearing in mind that a significant portion of biological networks, such as metabolic networks or transcriptional regulatory networks, are directed by nature, we define all directed graphlets and orbits on up to four nodes and implement a counting algorithm for directed graphlets and graphlet orbits. We generalise all existing graphlet-based measures to the directed case, defining: relative directed graphlet frequency distance, directed graphlet degree distribution similarity, directed graphlet degree vector similarity, and directed graphlet correlation distance. We apply the new topological measures to metabolic networks and show that the topology of directed biological networks is correlated with biological function. Finally, we look for topology-function relationships in metabolic networks that are conserved across different species.
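To make the notion of counting directed graphlets concrete, here is a brute-force sketch restricted to three-node graphlets. It is not the counting algorithm of the thesis, which covers up to four nodes and also tracks orbits: the function names are hypothetical, each isomorphism class is identified by a canonical edge list found by trying every node relabelling, and the approach only scales to very small graphs.

```python
from collections import Counter
from itertools import combinations, permutations


def is_weakly_connected(nodes, edges):
    """True if the nodes form one component when edge direction is ignored."""
    neighbours = {node: set() for node in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = set(), [nodes[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return len(seen) == len(nodes)


def canonical_form(nodes, edges):
    """Smallest sorted edge list over all relabellings; equal for isomorphic subgraphs."""
    best = None
    for ordering in permutations(nodes):
        index = {node: i for i, node in enumerate(ordering)}
        code = tuple(sorted((index[u], index[v]) for u, v in edges))
        if best is None or code < best:
            best = code
    return best


def count_directed_graphlets(edges, size=3):
    """Count induced, weakly connected directed subgraphs of the given size by class."""
    edge_set = {(u, v) for u, v in edges if u != v}   # drop self-loops
    nodes = sorted({node for edge in edge_set for node in edge})
    counts = Counter()
    for subset in combinations(nodes, size):
        induced = [(u, v) for u in subset for v in subset if (u, v) in edge_set]
        if is_weakly_connected(subset, induced):
            counts[canonical_form(subset, induced)] += 1
    return counts


# Toy directed network (assumed): a feed-forward loop 0->1->2, 0->2, plus a tail 2->3.
toy_graphlet_counts = count_directed_graphlets([(0, 1), (1, 2), (0, 2), (2, 3)])
```

In the toy network the node triple (0, 1, 2) is a feed-forward loop, while (0, 2, 3) and (1, 2, 3) are both directed paths and so fall into the same class; the triple (0, 1, 3) is skipped because its induced subgraph is not connected.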
Applications of bioinformatics and machine learning in the analysis of proteomics data
Chapter one gives a general introduction to the basic principles and techniques of MS-based proteomics, quantification strategies, and a generalized shotgun proteomics workflow. I also outline how to analyze proteomics data from a bioinformatics perspective, including normalization, dealing with missing values, differential analysis, functional annotation, and how to reveal the biology from post-translational modification data. Furthermore, I summarize the basics of machine learning algorithms from the perspective of supervised and unsupervised learning, along with their application to the identification of protein complexes. In chapter two, we explore the drug addiction mechanism in melanoma cells that carry a BRAF mutation. We present a proteomics and phosphoproteomics study of BRAFi-addicted melanoma cells (i.e., the 451Lu cell line) in response to BRAFi withdrawal, in which ERK1, ERK2, and JUNB were genetically silenced separately using CRISPR-Cas9. We show that inactivation of ERK2 and, to a lesser extent, JUNB prevents drug addiction in these melanoma cells, while, conversely, knockout of ERK1 fails to reverse this phenotype, showing a response similar to that of control cells. Our data indicate that ERK2 and JUNB share comparable proteome responses dominated by the reactivation of cell division. Importantly, we find that EMT activation in drug-addicted melanoma cells upon drug withdrawal is affected by silencing ERK2 but not ERK1. Moreover, we reveal that PIR acts as an effector of ERK2, and phosphoproteome analysis reveals that silencing of ERK2 but not ERK1 leads to the amplification of GSK3 kinase activity. Our results depict possible mechanisms of drug addiction in melanoma, which may provide a guide for therapeutic strategies in drug-resistant melanoma.
In chapter three, we explore the role of PD-1 in T cell activation by comparing the proteome and phosphoproteome profiles of resting and activated CD8+ T cells in which PD-1 was silenced using CRISPR-Cas9. Our data reveal that the activated T cells reprogrammed their proteome and phosphoproteome, marked by activation of the mTORC1 pathway. Moreover, we find that silencing of PD-1 altered the expression of E3 ubiquitin-protein ligases and increased glucose and lactate transporters. At the phosphoproteomics level, it evokes phosphorylation events in the mTORC1 pathway and activates the epidermal growth factor and its downstream MAPK pathway. Therefore, the data presented in this chapter depict mechanisms of PD-1 in response to TCR stimulation in CD8+ T cells, which may provide a guide in immune homeostasis and immune checkpoint therapy. In chapter four, we construct a comprehensive map of human protein complexes through the integration of protein-protein interactions and protein abundance features. A deep learning framework was built to predict protein-protein interactions (PPIs), followed by a two-stage clustering to identify protein complexes. Our deep learning-based classifier significantly outperformed recently published machine learning prediction models, with an F1-measure of 0.68, and captured in the process 5,010 complexes containing over 9,000 unique proteins. Moreover, this deep learning model enables us to capture poorly characterized interactions and interactions involving co-expressed proteins