4,190 research outputs found

    A novel statistical approach for identification of the master regulator transcription factor

    Test Dataset. This file contains an example test dataset on which our method can be run. The simulated data contain 10 transcription factors, namely TF 1, TF 2, …, TF 10, along with 105 genes regulated by these transcription factors. Among the transcription factors, TF 1 was generated to play the role of the master regulator. (CSV, 1382 kb)

    An Approach for Determining and Measuring Network Hierarchy Applied to Comparing the Phosphorylome and the Regulome

    Many biological networks naturally form a hierarchy with a preponderance of downward information flow. In this study, we define a score to quantify the degree of hierarchy in a network and develop a simulated-annealing algorithm to maximize the hierarchical score globally over a network. We apply our algorithm to determine the hierarchical structure of the phosphorylome in detail and investigate the correlation between its hierarchy and kinase properties. We also compare it to the regulatory network, finding that the phosphorylome is more hierarchical than the regulome.
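
    The idea of scoring hierarchy and maximizing that score by simulated annealing can be illustrated with a minimal sketch. The abstract does not give the paper's score definition or annealing schedule, so everything here is an assumption made for illustration: the score is taken to be (downward edges minus upward edges) divided by all edges, nodes are assigned to a fixed number of integer levels, and the names hierarchy_score and anneal_levels, the geometric cooling schedule, and the toy edge list are hypothetical.

```python
import math
import random


def hierarchy_score(edges, level):
    """Assumed score: (downward edges - upward edges) / total edges, in [-1, 1].

    An edge (u, v) counts as downward when u sits on a strictly higher level than v.
    """
    down = sum(1 for u, v in edges if level[u] > level[v])
    up = sum(1 for u, v in edges if level[u] < level[v])
    return (down - up) / len(edges)


def anneal_levels(edges, n_levels=3, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a level assignment with a high score by simulated annealing."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    level = {n: rng.randrange(n_levels) for n in nodes}
    current = hierarchy_score(edges, level)
    best_level, best_score = dict(level), current
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)
        node = rng.choice(nodes)
        old = level[node]
        level[node] = rng.randrange(n_levels)
        proposed = hierarchy_score(edges, level)
        delta = proposed - current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposed
            if current > best_score:
                best_level, best_score = dict(level), current
        else:
            level[node] = old  # reject the move
    return best_level, best_score


# Toy directed network with one feedback edge (D -> B); purely illustrative.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "B")]
levels, score = anneal_levels(toy_edges)
print(levels, score)
```

    Tracking the best assignment seen so far, rather than returning the final state, keeps the result from degrading if the last accepted moves happened to be downhill.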

    Identifying the human homologs of yeast Rab proteins Ypt10 & Ypt11 and a global-scale louse endosymbiont genome variation

    Amyotrophic lateral sclerosis (ALS) is a late-onset fatal neurodegenerative disease that causes loss of upper and/or lower motor neurons, and currently has no treatment or cure available. Over 90% of cases occur spontaneously with unknown causes, highlighting the complexity of the disease, and only 10% of cases are linked to heritable genetic mutations. Numerous ALS-linked genes are conserved through evolution, and model organisms may therefore provide opportunities to understand disease pathology at a molecular or cellular level, proving instrumental in identifying therapeutic targets. ALS subtype 8 (ALS8) is caused by an autosomal dominant P56S mutation in the VAPB gene that alters the morphology and function of the endoplasmic reticulum (ER), leading to ER stress sensitivity. In a budding yeast (Saccharomyces cerevisiae) model of ALS8 that recapitulates these phenotypes, we identified Rab GTPases and their regulators involved in membrane traffic as a class of genes whose overexpression improved tolerance to ER stress. Yeast possesses 11 Rab genes, and while the majority of these are characterized and have clear homologs in mammals, the functions of both YPT10 and YPT11 remain poorly understood. Notably, YPT10 was isolated as a possible suppressor of ALS8 phenotypes in the yeast model. The goal of this study was to obtain genetic information about Ypt10 and Ypt11 function and phylogeny using bioinformatic approaches. By identifying the human homologs of yeast Rabs, we can potentially study their function and identify targets for ALS treatments. This study narrowed down the potential human homologs for Ypt10 to Homo sapiens Rab20, Rab22a, and Rab31, and for Ypt11 to H. sapiens Rab34 and Rab36.

    Identifying therapeutic targets in glioma using integrated network analysis

    Gliomas are the most common brain tumours in the adult population, with rapid progression and poor prognosis. Survival among patients diagnosed with the most aggressive histopathological subtype of glioma, glioblastoma, is a mere 12.6 months given the current standard of care. While glioblastomas mostly occur in people over 60, lower-grade gliomas afflict individuals in their third and fourth decades of life. Collectively, gliomas are one of the major causes of cancer-related death in individuals under forty in the UK. Over the past twenty years, little has changed in the standard of glioma treatment and the disease has remained incurable. This study focuses on identifying potential therapeutic targets in gliomas using systems-level approaches and large-scale data integration. I used publicly available transcriptomic data to identify gene co-expression networks associated with the progression of IDH1-mutant 1p/19q euploid astrocytomas from grade II to grade III and highlighted hub genes of these networks, which could be targeted to modulate their biological function. I also studied the changes in co-expression patterns between grade II and grade III gliomas and identified a cluster of genes with differential co-expression in different disease states (module M2). By data integration and adaptation of reverse-engineering methods, I elucidated master regulators of module M2. I then sought to counteract the regulatory activity by using a drug-induced gene expression dataset to find compounds that induce gene expression in the opposite direction to the disease signature. I proposed resveratrol as a potentially disease-modifying compound which, when administered to patients with low-grade disease, could potentially delay glioma progression. Finally, I applied an ensemble-learning algorithm to a large-scale loss-of-function viability screen in cancer cell lines with different genetic backgrounds to identify gene dependencies associated with chromosomal copy-number losses common in glioblastomas. I propose five novel target predictions to be validated in future experiments.

    Genome-scale Precision Proteomics Identifies Cancer Signaling Networks and Therapeutic Vulnerabilities

    Mass spectrometry (MS)-based proteomics technology has been emerging as an indispensable tool for biomedical research. But the highly diverse physical and chemical properties of the protein building blocks and the dramatic complexity of the human proteome have largely limited proteomic profiling depth. Moreover, there was a lack of high-throughput quantitative strategies that were both precise and parallel to in-depth proteomic techniques. To solve these grand challenges, a high resolution liquid chromatography (LC) system coupled with an advanced mass spectrometer was developed to allow genome-scale human proteome identification. Using the combination of pre-MS peptide fractionation, MS2-based interference detection and post-MS computational interference correction, we enabled precise proteome quantification with isobaric labeling. We then applied these advanced proteomics tools for cancer proteome analyses on high grade gliomas (HGG) and rhabdomyosarcomas (RMS). Using systems biology approaches, we demonstrated that these newly developed proteomic analysis pipelines are able to (i) define human proteotypes that link oncogenotypes to cancer phenotypes in HGG and (ii) identify therapeutic vulnerabilities in RMS. Development of high resolution liquid chromatography is essential for improving the sensitivity and throughput of mass spectrometry-based proteomics to genome-scale. Here we present systematic optimization of a long gradient LC-MS/MS platform to enhance protein identification from a complex mixture. The platform employed an in-house fabricated, reverse phase long column (100 µm x 150 cm, 5 µm C18 beads) coupled with Q Exactive MS. The column was capable of achieving a peak capacity of approximately 700 in a 720 min gradient of 10-45% acetonitrile. The optimal loading amount was about 6 micrograms of peptides, although the column allowed loading as much as 20 micrograms.
Gas phase fractionation of peptide ions further increased the number of peptides identified by ~10%. Moreover, the combination of basic pH LC pre-fractionation with the long gradient LC-MS/MS platform enabled the identification of 96,127 peptides and 10,544 proteins at 1% protein false discovery rate in a postmortem brain sample of Alzheimer’s disease. As deep RNA sequencing of the same specimen suggested that ~16,000 genes were expressed, current analysis covered more than 60% of the expressed proteome. Isobaric labeling quantification by mass spectrometry has emerged as a powerful technology for multiplexed large-scale protein profiling, but measurement accuracy in complex mixtures is confounded by the interference from co-isolated ions, resulting in ratio compression. Here we report that the ratio compression can be essentially resolved by the combination of pre-MS peptide fractionation, MS2-based interference detection and post-MS computational interference correction. To recapitulate the complexity of biological samples, we pooled tandem mass tag (TMT) labeled E. coli peptides at 1 : 3 : 10 ratios, and added in ~20-fold more rat peptides as background, followed by the analysis of two dimensional liquid chromatography-MS/MS. Systematic investigation indicated that the quantitative interference was impacted by LC fractionation depth, MS isolation window and peptide loading amount. Exhaustive fractionation (320 x 4 h) can nearly eliminate the interference and achieve results comparable to the MS3-based method. Importantly, the interference in MS2 scans can be estimated by the intensity of contaminated y1 product ions, and we thus developed an algorithm to correct reporter ion ratios of tryptic peptides. 
Our data indicated that intermediate fractionation (40 x 2 h) and y1 ion-based correction allowed accurate and deep TMT protein profiling, which represents a straightforward and affordable strategy in isobaric labeling proteomics. High-throughput omics approaches provide an unprecedented opportunity for dissecting molecular mechanisms in cancer biology. Here we present deep profiling of whole proteome, phosphoproteome and transcriptome in two high-grade glioma mouse models driven by mutated receptor tyrosine kinase (RTK) oncogenes, platelet-derived growth factor receptor alpha (PDGFRA) and neurotrophic receptor tyrosine kinase 1 (NTRK1), analyzing 13,860 proteins (11,941 genes) and 30,431 phosphosites by mass spectrometry. Systems biology approaches identified numerous functional modules and master regulators, including 41 kinases and 26 transcription factors. Pathway activity computation and mouse survival curves indicate that the NTRK1 mutation induces a higher activation of AKT targets, drives a positive feedback loop to up-regulate multiple other RTKs, and shows higher oncogenic potency than the PDGFRA mutation. Further integration of the mouse data with human HGG transcriptome data determines shared regulators of invasion and stemness. Thus, multi-omics integrative profiling is a powerful avenue to characterize oncogenic activity. There is growing emphasis on personalizing cancer therapy based on somatic mutations identified in patients' tumors. Among pediatric solid tumors, RAS pathway mutations in rhabdomyosarcoma are the most common potentially actionable lesions. Recent success targeting CDK4/6 and MEK in RAS mutant adult cancers led our collaborator Dr. Dyer’s group to test this approach for rhabdomyosarcoma. They achieved synergistic killing of RAS mutant rhabdomyosarcoma tumor cells by combining MEK and CDK4/6 inhibitors in culture but failed to achieve efficacy in vivo using orthotopic patient derived xenografts (O-PDXs).
To determine how rhabdomyosarcomas evade targeting of CDK4/6 and MEK, we collaborated to perform large-scale deep proteomic, phosphoproteomic, and epigenomic profiling of RMS tumors. Integrative analysis of these omics data showed that RMS tumor cells rapidly compensate for and overcome CDK4/6 and MEK combination therapy through 6 myogenic signal transduction pathways including WNT, HH, BMP, Adenyl Cyclase, P38/MAPK and PI3K. While it is not feasible to target each of these signal transduction pathways simultaneously in RMS, we discovered that they require the HSP90 chaperone to sustain the complex developmental signal transduction milieu. We achieved specific and synergistic killing of RMS cells using sub-therapeutic concentrations of an HSP90 inhibitor (ganetespib) in combination with conventional chemotherapy used for recurrent RMS. These effects were seen in the most aggressive recurrent RMS orthotopic patient derived xenografts irrespective of RAS pathway perturbations, histologic or molecular classification. Thus, multi-omics integrative cancer profiling using our newly developed tools is a powerful way to identify core signal transduction networks and tumor vulnerabilities (master regulators) for novel cancer therapy.

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data of Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal

    Integrative Modeling of Transcriptional Regulation in Response to Autoimmune Desease Therapies

    Rheumatoid arthritis (RA) and multiple sclerosis (MS) are generally classified as autoimmune diseases. Immunomodulatory drugs are used to treat them, such as TNF-alpha blockers (e.g. etanercept) for RA and IFN-beta preparations (e.g. Betaferon and Avonex) for MS. To date, the molecular mechanisms of these therapies remain largely unknown. In addition, their efficacy and tolerability are insufficient in some patients. In this work, the transcriptional response in the blood of patients to each of these three therapies was investigated in order to better understand how these drugs work. Network inference methods were used with the aim of reconstructing the gene regulatory networks (GRNs) of the genes whose expression changed. The starting point for each analysis was a gene expression dataset. From it, genes that were up- or downregulated after the start of therapy were first filtered. The regulatory regions of these genes were then analyzed for transcription factor binding sites (TFBS). Finally, to derive GRN models, a new network inference algorithm (TILAR) was used. TILAR distinguishes between genes and TFs and describes the regulatory effects between them by a system of linear equations. TILAR also allows prior knowledge about gene-TF and TF-gene interactions to be incorporated. As a result, complex network structures were reconstructed that describe the regulatory relationships between the genes differentially expressed over the course of the therapies. For etanercept therapy, a subnetwork was found that contains genes showing lower expression levels in RA patients who respond very well to the drug. The analysis of GRNs can thus contribute to a better understanding of therapy-associated processes and reveal transcriptional differences between patients.

    Genomic Methods for Studying the Post-Translational Regulation of Transcription Factors

    The spatiotemporal coordination of gene expression is a fundamental process in cellular biology. Gene expression is regulated, in large part, by sequence-specific transcription factors that bind to DNA regions in the proximity of each target gene. Transcription factor activity and specificity are, in turn, regulated post-translationally by protein-modifying enzymes. High-throughput methods exist to probe specific steps of this process, such as protein-protein and protein-DNA interactions, but few computational tools exist to integrate this information in a principled, model-oriented manner. In this work, I develop several computational tools for studying the functional implications of transcription factor modification. I establish the first publicly accessible database for known and predicted regulatory circuits that encompass modifying enzymes, transcription factors, and transcriptional targets. I also develop a model-based method for integrating heterogeneous genomic and proteomic data for the inference of modification-dependent transcriptional regulatory networks. The model-based method is thoroughly validated as a reliable and accurate computational genomic tool. Additionally, I propose and demonstrate fundamental improvements to computational proteomic methods for identifying modified protein forms. In summary, this work contributes critical methodological advances to the field of regulatory network inference.

    Transcriptome-based Gene Networks for Systems-level Analysis of Plant Gene Functions

    Present-day genomic technologies are evolving at an unprecedented rate, allowing interrogation of cellular activities with increasing breadth and depth. However, we know very little about how the genome functions and what the identified genes do. The lack of functional annotations of genes greatly limits the post-analytical interpretation of new high-throughput genomic datasets. For plant biologists, the problem is much more severe. Less than 50% of all the identified genes in the model plant Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa, have some aspects of their functions assigned. Therefore, there is an urgent need to develop innovative methods to predict and expand on the currently available functional annotations of plant genes. With open access catching the ‘pulse’ of modern-day molecular research, an integration of the copious amount of transcriptome datasets allows rapid prediction of gene functions in specific biological contexts, which provides added evidence over traditional homology-based functional inference. The main goal of this dissertation was to develop data analysis strategies and tools broadly applicable in systems biology research. Two user-friendly interactive web applications are presented. The Rice Regulatory Network (RRN) captures an abiotic-stress conditioned gene regulatory network designed to facilitate the identification of transcription factor targets during induction of various environmental stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network that encapsulates various aspects of seed formation, including embryogenesis, endosperm development and seed-coat formation. Further, an edge-set enrichment analysis algorithm is proposed that uses network density as a parameter to estimate the gain or loss in correlation of pathways between two conditionally independent coexpression networks.