A novel statistical approach for identification of the master regulator transcription factor
Test Dataset. This file contains an example test dataset to which our method can be applied. The simulated data contain 10 transcription factors (TF 1, TF 2, …, TF 10) along with 105 genes regulated by these transcription factors. Among the transcription factors, TF 1 was generated to play the role of the master regulator. (CSV, 1382 kb)
An Approach for Determining and Measuring Network Hierarchy Applied to Comparing the Phosphorylome and the Regulome
Many biological networks naturally form a hierarchy with a preponderance of downward information flow. In this study, we define a score to quantify the degree of hierarchy in a network and develop a simulated-annealing algorithm to maximize the hierarchical score globally over a network. We apply our algorithm to determine the hierarchical structure of the phosphorylome in detail and investigate the correlation between its hierarchy and kinase properties. We also compare it to the regulatory network, finding that the phosphorylome is more hierarchical than the regulome.
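As a rough illustration of the idea in this abstract, the sketch below assigns each node of a small directed network to a level and uses simulated annealing to search for the assignment with the highest hierarchy score. The abstract does not give the score's definition, so the score used here (fraction of downward edges minus fraction of upward edges) is an assumption, as are the function names, the cooling schedule, the number of levels, and the toy network.

```python
import math
import random


def hierarchy_score(edges, level):
    """Assumed score: (downward edges - upward edges) / all edges, in [-1, 1].

    An edge (u, v) is downward when u sits on a higher level than v.
    Edges within one level count for neither side.
    """
    down = sum(1 for u, v in edges if level[u] > level[v])
    up = sum(1 for u, v in edges if level[u] < level[v])
    return (down - up) / len(edges)


def anneal_levels(nodes, edges, n_levels=3, steps=5000,
                  t_start=1.0, t_end=0.01, seed=0):
    """Search level assignments by simulated annealing; return the best found."""
    rng = random.Random(seed)
    level = {n: rng.randrange(n_levels) for n in nodes}
    current = hierarchy_score(edges, level)
    best, best_level = current, dict(level)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        node = rng.choice(nodes)
        old = level[node]
        level[node] = rng.randrange(n_levels)
        candidate = hierarchy_score(edges, level)
        delta = candidate - current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best:
                best, best_level = current, dict(level)
        else:
            level[node] = old  # reject the move
    return best, best_level


# Hypothetical toy network: edges point from regulator to target.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("E", "C")]
best_score, best_levels = anneal_levels(nodes, edges)
print(best_score, best_levels)
```

Tracking the best assignment separately from the current one means the returned result never gets worse when the search accepts a downhill move late in the run.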
Identifying the human homologs of yeast Rab proteins Ypt10 & Ypt11 and a global-scale louse endosymbiont genome variation
Amyotrophic lateral sclerosis (ALS) is a late-onset fatal neurodegenerative disease that causes loss of upper and/or lower motor neurons, and currently has no treatment or cure available. Over 90% of cases occur spontaneously with unknown causes, highlighting the complexity of the disease, and only 10% of cases are linked to heritable genetic mutations. Numerous ALS-linked genes are conserved through evolution, and model organisms may therefore provide opportunities to understand disease pathology at a molecular or cellular level, proving instrumental in identifying therapeutic targets. ALS subtype 8 (ALS8) is caused by an autosomal dominant P56S mutation in the VAPB gene that alters morphology and function of the endoplasmic reticulum (ER), leading to ER stress sensitivity. In a budding yeast (Saccharomyces cerevisiae) model of ALS8 that recapitulates these phenotypes, we identified Rab GTPases and their regulators involved in membrane traffic as a class of genes whose overexpression improved tolerance to ER stress. Yeast possesses 11 Rab genes, and while the majority of these are characterized and have clear homologs in mammals, the functions of YPT10 and YPT11 remain poorly understood. Notably, YPT10 was isolated as a possible suppressor of ALS8 phenotypes in the yeast model.
The goal of this study was to obtain genetic information about Ypt10 and Ypt11 function and phylogeny using bioinformatic approaches. By identifying the human homologs of yeast Rabs, we can potentially study their function and identify targets for ALS treatments. This study narrowed down the potential human homologs for Ypt10 to Homo sapiens Rab20, Rab22a, and Rab31, and for Ypt11 to H. sapiens Rab34 and Rab36.
Identifying therapeutic targets in glioma using integrated network analysis
Gliomas are the most common brain tumours in the adult population, with rapid progression and poor prognosis. Survival among patients diagnosed with the most aggressive histopathological subtype of glioma, glioblastoma, is a mere 12.6 months given the current standard of care. While glioblastomas mostly occur in people over 60, lower-grade gliomas affect individuals in their third and fourth decades of life. Collectively, gliomas are one of the major causes of cancer-related death in individuals under forty in the UK. Over the past twenty years, little has changed in the standard of glioma treatment and the disease has remained incurable. This study focuses on identifying potential therapeutic targets in gliomas using systems-level approaches and large-scale data integration. I used publicly available transcriptomic data to identify gene co-expression networks associated with the progression of IDH1-mutant 1p/19q euploid astrocytomas from grade II to grade III and highlighted hub genes of these networks, which could be targeted to modulate their biological function. I also studied the changes in co-expression patterns between grade II and grade III gliomas and identified a cluster of genes with differential co-expression in different disease states (module M2). By data integration and adaptation of reverse-engineering methods, I elucidated master regulators of module M2. I then sought to counteract their regulatory activity by using a drug-induced gene expression dataset to find compounds inducing gene expression in the opposite direction of the disease signature.
I proposed resveratrol as a potentially disease-modifying compound which, when administered to patients with low-grade disease, could potentially delay glioma progression. Finally, I applied an ensemble-learning algorithm to a large-scale loss-of-function viability screen in cancer cell lines with different genetic backgrounds to identify gene dependencies associated with chromosomal copy-number losses common in glioblastomas. I propose five novel target predictions to be validated in future experiments.
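The search for compounds that push gene expression in the opposite direction of a disease signature can be sketched as a simple sign-based reversal score. This is an illustrative stand-in only: the abstract does not say which scoring method was used, and all gene names, compound names, and values below are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def reversal_score(disease_signature, drug_profile):
    """Mean of -sign(disease) * sign(drug) over shared genes.

    +1 means the drug moves every shared gene against the disease direction,
    -1 means it moves every shared gene in the same direction as the disease.
    """
    shared = [g for g in disease_signature if g in drug_profile]
    if not shared:
        return 0.0
    total = sum(-sign(disease_signature[g]) * sign(drug_profile[g]) for g in shared)
    return total / len(shared)


def rank_compounds(disease_signature, drug_profiles):
    """Sort compounds from strongest to weakest signature reversal."""
    scores = {name: reversal_score(disease_signature, profile)
              for name, profile in drug_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log fold-changes: positive = up-regulated.
disease = {"GENE_A": 1.8, "GENE_B": -0.9, "GENE_C": 2.4}
drugs = {
    "compound_X": {"GENE_A": -1.1, "GENE_B": 0.7, "GENE_C": -0.3},  # opposes all
    "compound_Y": {"GENE_A": 0.5, "GENE_B": -0.2, "GENE_C": 1.0},   # mimics all
}
ranking = rank_compounds(disease, drugs)
print(ranking)
```

A sign-only score ignores effect sizes; a rank-based or weighted score would be a natural refinement, but is beyond what the abstract specifies.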
Genome-scale Precision Proteomics Identifies Cancer Signaling Networks and Therapeutic Vulnerabilities
Mass spectrometry (MS)-based proteomics technology has been emerging as an indispensable tool for biomedical research. However, the highly diverse physical and chemical properties of the protein building blocks and the dramatic human proteome complexity largely limited proteomic profiling depth. Moreover, there was a lack of high-throughput quantitative strategies that were both precise and parallel to in-depth proteomic techniques. To solve these grand challenges, a high-resolution liquid chromatography (LC) system coupled with an advanced mass spectrometer was developed to allow genome-scale human proteome identification. Using the combination of pre-MS peptide fractionation, MS2-based interference detection and post-MS computational interference correction, we enabled precise proteome quantification with isobaric labeling. We then applied these advanced proteomics tools for cancer proteome analyses on high grade gliomas (HGG) and rhabdomyosarcomas (RMS). Using systems biology approaches, we demonstrated that these newly developed proteomic analysis pipelines are able to (i) define human proteotypes that link oncogenotypes to cancer phenotypes in HGG and to (ii) identify therapeutic vulnerabilities in RMS. Development of high-resolution liquid chromatography is essential for improving the sensitivity and throughput of mass spectrometry-based proteomics to genome-scale. Here we present systematic optimization of a long gradient LC-MS/MS platform to enhance protein identification from a complex mixture. The platform employed an in-house fabricated, reverse phase long column (100 µm x 150 cm, 5 µm C18 beads) coupled with Q Exactive MS. The column was capable of achieving a peak capacity of approximately 700 in a 720 min gradient of 10-45% acetonitrile. The optimal loading amount was about 6 micrograms of peptides, although the column allowed loading as much as 20 micrograms. 
Gas phase fractionation of peptide ions further increased the number of peptides identified by ~10%. Moreover, the combination of basic pH LC pre-fractionation with the long gradient LC-MS/MS platform enabled the identification of 96,127 peptides and 10,544 proteins at 1% protein false discovery rate in a postmortem brain sample of Alzheimer’s disease. As deep RNA sequencing of the same specimen suggested that ~16,000 genes were expressed, current analysis covered more than 60% of the expressed proteome. Isobaric labeling quantification by mass spectrometry has emerged as a powerful technology for multiplexed large-scale protein profiling, but measurement accuracy in complex mixtures is confounded by the interference from co-isolated ions, resulting in ratio compression. Here we report that the ratio compression can be essentially resolved by the combination of pre-MS peptide fractionation, MS2-based interference detection and post-MS computational interference correction. To recapitulate the complexity of biological samples, we pooled tandem mass tag (TMT) labeled E. coli peptides at 1 : 3 : 10 ratios, and added in ~20-fold more rat peptides as background, followed by the analysis of two dimensional liquid chromatography-MS/MS. Systematic investigation indicated that the quantitative interference was impacted by LC fractionation depth, MS isolation window and peptide loading amount. Exhaustive fractionation (320 x 4 h) can nearly eliminate the interference and achieve results comparable to the MS3-based method. Importantly, the interference in MS2 scans can be estimated by the intensity of contaminated y1 product ions, and we thus developed an algorithm to correct reporter ion ratios of tryptic peptides. 
Our data indicated that intermediate fractionation (40 x 2 h) and y1 ion-based correction allowed accurate and deep TMT protein profiling, which represents a straightforward and affordable strategy in isobaric labeling proteomics. High-throughput omics approaches provide an unprecedented opportunity for dissecting molecular mechanisms in cancer biology. Here we present deep profiling of whole proteome, phosphoproteome and transcriptome in two high-grade glioma mouse models driven by mutated receptor tyrosine kinase (RTK) oncogenes, platelet-derived growth factor receptor alpha (PDGFRA) and neurotrophic receptor tyrosine kinase 1 (NTRK1), analyzing 13,860 proteins (11,941 genes) and 30,431 phosphosites by mass spectrometry. Systems biology approaches identified numerous functional modules and master regulators, including 41 kinases and 26 transcription factors. Pathway activity computation and mouse survival curves indicate the NTRK1 mutation induces a higher activation of AKT targets, drives a positive feedback loop to up-regulate multiple other RTKs, and shows higher oncogenic potency than the PDGFRA mutation. Further integration of the mouse data with human HGG transcriptome data determines shared regulators of invasion and stemness. Thus, multi-omics integrative profiling is a powerful avenue to characterize oncogenic activity. There is growing emphasis on personalizing cancer therapy based on somatic mutations identified in patients’ tumors. Among pediatric solid tumors, RAS pathway mutations in rhabdomyosarcoma are the most common potentially actionable lesions. Recent success targeting CDK4/6 and MEK in RAS mutant adult cancers led our collaborator Dr. Dyer’s group to test this approach for rhabdomyosarcoma. They achieved synergistic killing of RAS mutant rhabdomyosarcoma tumor cells by combining MEK and CDK4/6 inhibitors in culture but failed to achieve efficacy in vivo using orthotopic patient derived xenografts (O-PDXs). 
To determine how rhabdomyosarcomas evade targeting of CDK4/6 and MEK, we collaborated to perform large-scale deep proteomic, phosphoproteomic, and epigenomic profiling of RMS tumors. Integrative analysis of these omics data revealed that RMS tumor cells rapidly compensate for and overcome CDK4/6 and MEK combination therapy through 6 myogenic signal transduction pathways including WNT, HH, BMP, Adenyl Cyclase, P38/MAPK and PI3K. While it is not feasible to target each of these signal transduction pathways simultaneously in RMS, we discovered that they require the HSP90 chaperone to sustain the complex developmental signal transduction milieu. We achieved specific and synergistic killing of RMS cells using sub-therapeutic concentrations of an HSP90 inhibitor (ganetespib) in combination with conventional chemotherapy used for recurrent RMS. These effects were seen in the most aggressive recurrent RMS orthotopic patient derived xenografts irrespective of RAS pathway perturbations, histologic or molecular classification. Thus, multi-omics integrative cancer profiling using our newly developed tools is a powerful way to identify core signal transduction networks and tumor vulnerabilities (master regulators) for novel cancer therapy.
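The ratio compression described in this abstract can be illustrated with a small worked example based on the 1 : 3 : 10 mixture it mentions. The sketch assumes that co-isolated background ions add the same intensity to every reporter channel and that the interference fraction is already known; in the abstract that fraction is estimated from contaminating y1 product ions, a step not reproduced here. Function names and the 30% interference level are hypothetical.

```python
def observed_reporters(true_intensities, interference_fraction):
    """Add equal per-channel interference to the true reporter intensities.

    interference_fraction is the share of the total observed reporter signal
    that comes from co-isolated ions (assumed equal across channels).
    """
    total = sum(true_intensities)
    # Solve I / (total + I) = interference_fraction for the interference I.
    interference_total = total * interference_fraction / (1.0 - interference_fraction)
    per_channel = interference_total / len(true_intensities)
    return [x + per_channel for x in true_intensities]


def correct_reporters(observed, interference_fraction):
    """Subtract the estimated equal per-channel interference."""
    per_channel = sum(observed) * interference_fraction / len(observed)
    return [x - per_channel for x in observed]


def ratios(intensities):
    """Express each channel relative to the first channel."""
    return [x / intensities[0] for x in intensities]


true_signal = [1.0, 3.0, 10.0]            # the 1 : 3 : 10 mixture
observed = observed_reporters(true_signal, 0.3)
corrected = correct_reporters(observed, 0.3)

print(ratios(observed))    # compressed toward 1 : 1 : 1
print(ratios(corrected))   # close to the true 1 : 3 : 10
```

With 30% interference the observed 10-fold ratio shrinks to roughly 4-fold, which shows why even moderate co-isolation distorts quantification and why subtracting an estimate of it restores the expected ratios.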
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal
Integrative Modeling of Transcriptional Regulation in Response to Autoimmune Disease Therapies
Rheumatoid arthritis (RA) and multiple sclerosis (MS) are generally classified as autoimmune diseases. Immunomodulatory drugs are used to treat them, such as TNF-alpha blockers (e.g. etanercept) for RA and IFN-beta preparations (e.g. Betaferon and Avonex) for MS. To date, the molecular mechanisms of these therapies remain largely unknown. In addition, their efficacy and tolerability are insufficient in some patients.
In this work, the transcriptional response in the blood of patients to each of these three therapies was investigated in order to better understand the mode of action of these drugs. Network inference methods were used with the aim of reconstructing the gene regulatory networks (GRNs) of the genes whose expression changed. The starting point of each analysis was a gene expression dataset. First, genes that are up- or downregulated after the start of therapy were filtered from it. Next, the regulatory regions of these genes were analyzed for transcription factor binding sites (TFBS). Finally, a new network inference algorithm (TILAR) was used to derive GRN models. TILAR distinguishes between genes and TFs and describes the regulatory effects between them by a system of linear equations. TILAR also allows prior knowledge about gene-TF and TF-gene interactions to be incorporated.
As a result, complex network structures were reconstructed that describe the regulatory relationships between the genes that are differentially expressed over the course of the therapies. For etanercept therapy, a subnetwork was found that contains genes showing lower expression levels in RA patients who respond very well to the drug. The analysis of GRNs can thus contribute to a better understanding of therapy-associated processes and reveal transcriptional differences between patients.
Genomic Methods for Studying the Post-Translational Regulation of Transcription Factors
The spatiotemporal coordination of gene expression is a fundamental process in cellular biology. Gene expression is regulated, in large part, by sequence-specific transcription factors that bind to DNA regions in the proximity of each target gene. Transcription factor activity and specificity are, in turn, regulated post-translationally by protein-modifying enzymes. High-throughput methods exist to probe specific steps of this process, such as protein-protein and protein-DNA interactions, but few computational tools exist to integrate this information in a principled, model-oriented manner. In this work, I develop several computational tools for studying the functional implications of transcription factor modification. I establish the first publicly accessible database for known and predicted regulatory circuits that encompass modifying enzymes, transcription factors, and transcriptional targets. I also develop a model-based method for integrating heterogeneous genomic and proteomic data for the inference of modification-dependent transcriptional regulatory networks. The model-based method is thoroughly validated as a reliable and accurate computational genomic tool. Additionally, I propose and demonstrate fundamental improvements to computational proteomic methods for identifying modified protein forms. In summary, this work contributes critical methodological advances to the field of regulatory network inference
Transcriptome-based Gene Networks for Systems-level Analysis of Plant Gene Functions
Present day genomic technologies are evolving at an unprecedented rate, allowing interrogation of
cellular activities with increasing breadth and depth. However, we know very little about how the
genome functions and what the identified genes do. The lack of functional annotations of genes
greatly limits the post-analytical interpretation of new high throughput genomic datasets. For plant
biologists, the problem is much more severe. Less than 50% of all the identified genes in the model plant
Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa have some
aspects of their functions assigned. Therefore, there is an urgent need to develop innovative
methods to predict and expand on the currently available functional annotations of plant genes.
With open-access catching the ‘pulse’ of modern day molecular research, an integration of the
copious amount of transcriptome datasets allows rapid prediction of gene functions in specific
biological contexts, which provides added evidence over traditional homology-based functional
inference. The main goal of this dissertation was to develop data analysis strategies and tools
broadly applicable in systems biology research.
Two user friendly interactive web applications are presented: The Rice Regulatory
Network (RRN) captures an abiotic-stress conditioned gene regulatory network designed to
facilitate the identification of transcription factor targets during induction of various environmental
stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network
that encapsulates various aspects of seed formation, including embryogenesis, endosperm
development and seed-coat formation. Further, an edge-set enrichment analysis algorithm is
proposed that uses network density as a parameter to estimate the gain or loss in correlation of
pathways between two conditionally independent coexpression networks.
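The use of network density to estimate gain or loss of pathway correlation between two coexpression networks can be sketched as follows. The abstract does not give the algorithm's details, so this shows only the density comparison itself, without any significance testing; the function names, gene identifiers, and the two toy networks are hypothetical.

```python
from itertools import combinations


def pathway_density(network_edges, pathway_genes):
    """Fraction of possible gene pairs within the pathway that are connected."""
    genes = sorted(set(pathway_genes))
    possible = len(genes) * (len(genes) - 1) // 2
    if possible == 0:
        return 0.0
    # Coexpression edges are undirected, so store them as unordered pairs.
    edge_set = {frozenset(edge) for edge in network_edges}
    present = sum(1 for pair in combinations(genes, 2)
                  if frozenset(pair) in edge_set)
    return present / possible


def density_change(network_a, network_b, pathway_genes):
    """Positive values mean the pathway gained coexpression edges in network_b."""
    return (pathway_density(network_b, pathway_genes)
            - pathway_density(network_a, pathway_genes))


# Hypothetical coexpression networks under two conditions.
control_edges = [("g1", "g2"), ("g4", "g5")]
stress_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g9")]
pathway = ["g1", "g2", "g3"]

change = density_change(control_edges, stress_edges, pathway)
print(change)
```

Edges that touch genes outside the pathway, such as ("g3", "g9"), do not count toward the pathway's density, so the measure reflects only how tightly the pathway's own members are coexpressed.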
Discovering Master Regulators of Single-Cell Transcriptional States in the Tumor Immune Microenvironment to Reveal Immuno-Therapeutic Targets and Synergistic Treatments
The development of checkpoint immunotherapy has been a paradigm shift in the treatment of cancer, leading to dramatic improvement in treatment outcomes across a broad range of tumor types. Nevertheless, our current understanding of the tumor immune microenvironment and mediators of resistance to therapy is limited. The recent development of high-throughput single-cell RNA-Sequencing (scRNA-Seq) technology has opened up an unprecedented window into the transcriptional states of distinct tumor-infiltrating immune and stromal cells. However, even this technology has its limitations, with very high levels of data dropout caused by low total mRNA molecule counts and low capture efficiency. This thesis explores the application of a transcriptional regulatory protein activity inference approach to single-cell data in order to resolve gene dropout and more deeply characterize upstream drivers of cell state within the micro-environment of several distinct tumor types.
To this end, algorithms for inference of protein activity, drug sensitivity, and cell-cell interaction have been adapted to scRNA-Seq data, along with an approach for querying enrichment of single-cell-derived population marker gene sets patient-by-patient in larger bulk-RNA-Seq cohorts. By applying these tools systematically, we have identified distinct cellular sub-populations associated with clinical outcome in different tumor types, including a novel population of C1Q+/TREM2+/APOE+ macrophages associated with post-surgical tumor recurrence in clear cell renal carcinoma, a sub-population of fibroblasts associated with improved response to immunotherapy in head and neck squamous cell carcinoma, tumor cell subpopulations with distinct inferred drug sensitivities in cholangiocarcinoma and prostate cancer, as well as tumor-specific regulatory T-cells (Tregs), active as a mechanism of immunotherapy resistance across a range of tumor types. In ongoing clinical trials from both primary and metastatic prostate cancer as well as clear cell renal carcinoma, we are able to assess which of these populations are enriched in non-responders to checkpoint immunotherapy. The proteomic master regulators of each of these single-cell types have direct utility as potential biomarkers for treatment response, but they may also be therapeutically modulated as novel targets for combination immunotherapy, potentially improving treatment response rates and treatment outcomes in future clinical trials.
Finally, this thesis also presents a discovery-to-validation platform to accelerate micro-environment-directed drug repurposing in the context of immunotherapy resistance and rapid CRISPRko validation of novel therapeutic targets. This platform has been developed specifically to validate newly identified master regulators of tumor-specific immunosuppressive regulatory T-cells (Tregs), resulting in discovery of low-dose gemcitabine as a tumor-specific Treg-modulating drug synergistic with anti-PD1 checkpoint immunotherapy and TRPS1 as a proteomic master regulator with clinically significant effect on tumor Treg infiltration and tumor growth rate. However, the platform itself may be readily extended in future work to prioritize agents against immunosuppressive macrophage and fibroblast populations for clinical development and trials. As we have discovered, different cancers have different populations of cells driving therapy response and resistance. Taken together, the analytical and validation tools presented in this thesis represent an opportunity to tailor future immuno-therapies at the single-cell level to particular tumor types and to individual patients.