12 research outputs found

    ChIP-Array 2: integrating multiple omics data to construct gene regulatory networks

    An Integrated Model of Multiple-Condition ChIP-Seq Data Reveals Predeterminants of Cdx2 Binding

    Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization framework that shares information across multiple experiments for binding event discovery. We demonstrate that our framework enables the simultaneous modeling of sparse condition-specific binding changes, sequence dependence, and replicate-specific noise sources. MultiGPS encourages consistency in reported binding event locations across multiple-condition ChIP-seq datasets and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS's multi-experiment modeling approach thus provides a reliable platform for detecting differential binding enrichment across experimental conditions. We demonstrate the advantages of MultiGPS with an analysis of Cdx2 binding in three distinct developmental contexts. By accurately characterizing condition-specific Cdx2 binding, MultiGPS enables novel insight into the mechanistic basis of Cdx2 site selectivity. Specifically, the condition-specific Cdx2 sites characterized by MultiGPS are highly associated with pre-existing genomic context, suggesting that such sites are pre-determined by cell-specific regulatory architecture. However, MultiGPS-defined condition-independent sites are not predicted by pre-existing regulatory signals, suggesting that Cdx2 can bind to a subset of locations regardless of genomic environment. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
    Funding: National Science Foundation (U.S.) (Graduate Research Fellowship under Grant 0645960); National Institutes of Health (U.S.) (grant P01 NS055923); Pennsylvania State University, Center for Eukaryotic Gene Regulation.

    Computational methods for studying epigenomic regulation

    In the nucleus, DNA is tightly wrapped around proteins in a structure called chromatin, which protects it from degradation. Chromatin is composed of nucleosomes, each a complex of eight histones around which the DNA is wrapped. Enzymes can modify amino acids located on the N-terminal tails of the histones. These modifications allow the chromatin to open and close in targeted regions, providing control over gene expression. At present, chromatin immunoprecipitation (ChIP) and the assay for transposase-accessible chromatin (ATAC), each combined with high-throughput sequencing (ChIP-seq and ATAC-seq), are the major high-throughput methods for studying histone modifications and genome-wide chromatin openness, respectively. Typically, ChIP-seq targets one histone modification at a time by enriching the histone-bound regions of the genome using immunoprecipitation, while ATAC-seq uses a transposase enzyme to cut the open chromatin into fragments of DNA. The DNA fragments obtained from both techniques can be sequenced and aligned against a reference genome. Once the location of the fragments is determined, the genome is scanned for significant enrichment in a process called peak calling. Differential analysis is then used to compare local enrichment-level variations between different biological conditions. Combining ChIP-seq and ATAC-seq data with other information, such as RNA-seq-derived transcriptomics data, can further help to build a comprehensive picture of the complex underlying biology. This work therefore focuses on the development of computational tools to help with the analysis of epigenomics research data. In this thesis, a robust workflow for the differential analysis of ChIP-seq and ATAC-seq data is developed and evaluated against existing tools using one synthetic dataset, two biological ChIP-seq datasets and two biological ATAC-seq datasets. RNA-seq data is then correlated with the detected peaks. An efficient replicate-driven visualisation tool is also proposed to visualise coverage of DNA fragments on the genome; a comparison with two existing tools highlights its efficiency. Lastly, two studies are presented showcasing the usefulness of the differential analysis approaches in extracting knowledge in a real-life biological setting.
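The peak-calling step described in the abstract above (scanning the genome for significant enrichment) can be illustrated with a minimal sketch. It is not the workflow developed in the thesis: the fixed-width windows, the genome-wide mean as a Poisson background rate, and the names `poisson_sf` and `call_peaks` are all assumptions made for illustration.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), via the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(counts, alpha=1e-3):
    """Return indices of windows enriched over a uniform background.

    counts: read counts per fixed-width genomic window (hypothetical input).
    The background rate is the mean count over all windows, which is an
    assumption; real peak callers usually estimate it from a control
    sample or from local context.
    """
    lam = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if poisson_sf(c, lam) < alpha]


window_counts = [2, 3, 1, 2, 40, 2, 3, 1, 2, 2]
print(call_peaks(window_counts))  # only the window with 40 reads is reported
```

A real analysis would also correct the p-values for multiple testing across windows, which this sketch omits.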

    Integrative methods for epigenetic profiling in cancer and development

    DNA mutation, epigenetic alteration, and gene expression are three major molecular components that distinguish cancer cells from normal cells. It is widely accepted that epigenetic modifications can greatly affect the expression of their target genes. Measuring these effects is nevertheless difficult, because epigenetic marks occur in complex combinations and multiple non-coding regulatory elements interact with one another. Even so, epigenetic modifications are estimated to have a greater effect than DNA mutations on tumorigenesis. In addition, epigenetic alterations are the initiating factor in some chromosome abnormalities and aberrant gene expression, making the study of epigenetic alterations central to understanding the underlying mechanisms in cancer and cell development. The aim of this thesis is to conduct qualitative and quantitative analyses of differential epigenetic modifications. To this end, a variety of existing approaches were applied in the ChIP-Seq analyses of six histone marks on glioblastoma data from four distinct subtypes. The results depict a comprehensive landscape of active and poised regulatory elements specific to glioblastoma subtypes, which describes the different aspects of tumor progression. However, the descriptive models of multiple histone marks (ChromHMM and peak calls) were also shown to be prone to various biases and artifacts. Moreover, some models neglect the quantitative information in ChIP-Seq data, making them inadequate for addressing the magnitude of changes between epigenetic modification and gene expression levels. Therefore, in the second part of my work, I designed an integrative, network-based approach, in which I integrated two levels of epigenetic information: the signal intensities of each epigenetic mark, and the relationships between promoters and distal regulatory elements known as enhancers. Applied to a variety of test cases, this approach predicts a number of candidate genes with significant epigenetic alterations, and comprehensive benchmarking validated these findings in cancer and cell development. In summary, as increasing amounts of epigenetic data become available, the computational approaches employed in this study will be highly relevant to both comparative and integrative analysis of the epigenetic landscape. The discovery of novel epigenetic targets in cancers not only reveals the fundamental mechanisms of tumorigenesis and development, but also serves as an emerging resource for molecular diagnosis and treatment.

    Quantitative analysis of ChIP-seq signals and transcriptomes

    Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) is commonly used to analyze the in vivo interactions between proteins and DNA across the genome. Analysis of ChIP-seq data has largely focused on detecting peaks that represent DNA regions enriched by chromatin immunoprecipitation, i.e., the DNA loci bound by the immunoprecipitated proteins. To properly interpret ChIP-seq data, capturing its quantitative features is imperative. In this dissertation, we develop a statistically robust pipeline, named ChIP-seq Signal Quantifier (CSSQ), that provides normalized ChIP-seq data, enabling the detection and quantification of differential binding (DB) across the genome and allowing quantitative comparisons among multiple ChIP-seq datasets on predefined regions. Using both experimental datasets and computational simulations, we demonstrate the superior performance of CSSQ against existing tools, as evidenced by its high sensitivity and specificity and its low false discovery rate. CSSQ is applicable to ChIP-seq datasets with varied signal-to-noise ratios and significantly improves the accuracy of comparisons between ChIP-seq datasets from different experiments, making it a powerful pipeline for extracting quantitative information from ChIP-seq datasets when deciphering epigenomes. RNA-seq has become the leading choice for transcriptome analysis. Using RNA-seq and bioinformatics analysis, we characterize gene expression profiles and key cellular processes during stem cell differentiation and in cell responses to nanoparticle exposure. Collectively, these studies show that transcriptome analysis is a powerful tool for characterizing and understanding cellular mechanisms. Ph.D.
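To make the idea of comparing normalized signal on predefined regions concrete, here is a minimal sketch. It is not the CSSQ procedure, whose details the abstract does not give: counts-per-million scaling, the pseudocount, and the function names `cpm` and `log2_fold_change` are assumptions chosen for illustration.

```python
import math


def cpm(counts, library_size):
    """Scale raw region counts to counts per million mapped reads."""
    return [c * 1e6 / library_size for c in counts]


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-region log2 ratio of normalized signal.

    The pseudocount (an assumed value) avoids division by zero in
    regions with no reads.
    """
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treated, control)
    ]


# Hypothetical counts for three predefined regions in two samples with
# different sequencing depths.
treated = cpm([30, 5, 12], library_size=2_000_000)
control = cpm([10, 5, 3], library_size=1_000_000)
print(log2_fold_change(treated, control))
```

Scaling by library size before taking ratios keeps a deeper-sequenced sample from looking uniformly more enriched. A statistical test across replicates would still be needed to call a region differentially bound.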

    Validation of target genes of the Rac1/PAK1-BCL6/STAT5 pathway involved in tumor progression

    Master's dissertation in Biochemistry, presented to the Faculty of Sciences of the University of Lisbon, 2016. Defended in January 2017. Work carried out at INSA, in the oncobiology and signaling pathways group of the UID of the DGH, under the formal supervision of the researchers who collaborate in that team.
    Colorectal cancer is one of the most prevalent types of cancer worldwide and also one of the deadliest malignancies, with a prognosis that worsens as the disease advances. The GTPase Rac1 is overexpressed in several types of carcinoma, including colorectal cancer, and the dysregulation of its cellular signaling has been tightly associated with malignant transformation. In particular, the Rac1/PAK1 signaling axis is altered in about 60% of solid tumors. This alteration is associated with more aggressive and invasive tumors and with a more unfavorable clinical prognosis, which often results from the development of resistance to chemotherapy. In addition, Rac1/PAK1 is responsible for the activation of several signaling pathways that lead to the regulation of gene expression, an aspect that has become increasingly prominent in the study of tumor progression.
    The host laboratory described, in colorectal carcinoma cells, a new signaling pathway in which activation of the Rac1/PAK1 pathway promotes a transcriptional switch between the BCL6 repressor and the STAT5 activator, leading to increased gene expression. Thus, in order to identify all sites in the genome at which gene expression could be modulated by this pathway, the research group used an innovative ChIP-seq data analysis approach that exploits the selectivity of the BCL6/STAT5 transcriptional switch. In the present study, we aimed to experimentally validate this new approach to ChIP-seq data analysis by searching, among the multiple hits identified, for a set of genes whose modulation by this pathway could elucidate some of the pro-oncogenic consequences of the Rac1/PAK1 over-activation observed in aggressive tumors with poor prognosis. Our results identified 2402 genes responding to the BCL6/STAT5 switch upon stimulation of the Rac1/PAK1 pathway. From these, a set of 15 genes was selected for experimental validation, since their corresponding ChIP-seq peaks were representative of the identified peaks overall with respect to parameters such as size, amplitude, and location relative to the respective gene, among others. As expected from previous studies, the variation in expression levels of the selected genes in response to manipulation of Rac1/PAK1 pathway activity was small, not exceeding three times the baseline values. Interestingly, some of these genes exhibited a decrease in expression upon activation of the pathway and the opposite behavior upon its inhibition. A more detailed analysis of the sequences delimited by the corresponding peaks identified two distinct sub-motifs, one for the genes that responded positively and one for the genes that responded negatively to activation of the Rac1/PAK1 pathway, within the general DNA-binding consensus motif of the BCL6 and STAT5 factors.
    In parallel, a functional clustering analysis was performed, and an enrichment in genes involved in DNA damage response and repair was observed within the list of 2402 identified genes. Peaks corresponding to these genes contained the sub-motif associated with a positive response to Rac1/PAK1 pathway stimulation, and a functional analysis of the expression levels of 4 of these genes demonstrated that all increased in response to Rac1/PAK1 pathway activation, while inhibition of the pathway led to a decrease in their expression. Evaluation of the physiological impact of the activation of these genes in colorectal carcinoma cells (DLD1) by the comet assay, upon activation of the Rac1/PAK1 pathway, showed that this pathway provides partial protection against damage induced by treatment with alkylating agents such as EMS, accelerating the process of DNA damage repair. Of note, inhibition of this pathway with the selective Rac1 inhibitor EHT1846 significantly blocks repair of genomic damage and even increases its severity. This work demonstrated that the new strategy for ChIP-seq data analysis allows the identification of small transcriptional variations, such as those derived from the transcriptional response to the BCL6/STAT5 switch modulated by the Rac1/PAK1 pathway. It also revealed a role of the Rac1/PAK1-BCL6/STAT5 pathway in the DNA damage response and repair, suggesting that its pharmacological inhibition may have therapeutic application in cancer, namely in potentiating the effects of chemotherapeutic agents with genotoxic effects.

    Statistical Methods for the Analysis of Epigenomic Data

    Epigenomics, the study of the human genome and its interactions with proteins and other cellular elements, has become of significant interest in the past decade. Several landmark studies have shown that these interactions regulate essential cellular processes (gene transcription, gene silencing, etc.) and are associated with multiple complex disorders such as cancer and cardiovascular disease. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is one of several techniques used to (1) detect protein-DNA interaction sites, (2) classify differential epigenomic activity across conditions, and (3) characterize subpopulations of single cells in heterogeneous samples. In this dissertation, we present statistical methods to tackle problems (1) to (3) in contexts where protein-DNA interaction sites extend across broad genomic domains. First, we present a statistical model that integrates data from multiple epigenomic assays and detects protein-DNA interaction sites in consensus across multiple replicates. We introduce a class of zero-inflated mixed-effects hidden Markov models (HMMs) to account for the excess of observed zeros, the latent sample-specific differences, and the local dependency of sequencing read counts. By integrating multiple samples into a statistical model tailored for broad epigenomic marks, our model shows high sensitivity and specificity in both simulated and real datasets. Second, we present an efficient framework for the detection and classification of regions exhibiting differential epigenomic activity in multi-sample multi-condition designs. The presented model uses a finite mixture model embedded into an HMM to classify patterns of broad and short differential epigenomic activity across conditions. We use a fast rejection-controlled EM algorithm that makes our implementation among the fastest algorithms available, while improving performance on data from broad epigenomic marks. Lastly, we analyze data from single-cell ChIP-seq assays and present a statistical model that allows the simultaneous clustering and characterization of single-cell subpopulations. The presented framework is robust to the sparsity often observed in single-cell epigenomic data and accounts for the local dependency of counts. We introduce a scheme for initializing the EM algorithm and for identifying the number of single-cell subpopulations in the data, a common task in current single-cell epigenomic algorithms. Doctor of Philosophy
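The "excess of observed zeros" that the abstract above mentions is commonly handled with a zero-inflated count distribution. The sketch below shows a zero-inflated Poisson log-probability as one simple instance. The dissertation's actual emission distribution, its mixed effects, and the HMM itself are not reproduced here, and the name `zip_log_pmf` and the parameter values are assumptions.

```python
import math


def zip_log_pmf(k, lam, pi):
    """Log-probability of count k under a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam). Requires lam > 0 and 0 <= pi < 1.
    """
    poisson_log = -lam + k * math.log(lam) - math.lgamma(k + 1)
    if k == 0:
        # A zero can be structural or an ordinary Poisson zero.
        return math.log(pi + (1.0 - pi) * math.exp(poisson_log))
    return math.log(1.0 - pi) + poisson_log


# Zero inflation raises the probability of observing no reads in a window.
print(math.exp(zip_log_pmf(0, lam=2.0, pi=0.0)))  # plain Poisson zero
print(math.exp(zip_log_pmf(0, lam=2.0, pi=0.3)))  # inflated zero
```

In an HMM, a function of this kind would serve as the per-state emission log-probability, with separate `lam` and `pi` values for background and enriched states.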