6 research outputs found

    From genes to transcripts : integrative modeling and analysis of regulatory networks

    Although all the cells in an organism possess the same genome, regulatory mechanisms lead to highly specific cell types. Elucidating these regulatory mechanisms is a great challenge in systems biology research. Although it is known that a large fraction of our genome is comprised of regulatory elements, the precise mechanisms by which different combinations of regulatory elements control gene expression and cell identity are poorly understood. This thesis describes algorithms and approaches for modeling and analysis of different modes of gene regulation. We present POSTIT, a novel algorithm for modeling and inferring transcript isoform regulation from transcriptomics and epigenomics data. POSTIT uses multi-task learning with a structured-sparsity-inducing regularizer to share regulatory information between the isoforms of a gene, which is shown to lead to accurate isoform expression prediction and inference of regulators. Furthermore, it can use isoform expression levels and annotation as informative priors for gene expression prediction. Hence, it constitutes a novel, accurate approach applicable to gene-centric or transcript-isoform-centric analysis using expression data. In an application to microRNA (miRNA) target prioritization, we demonstrate that it outperforms classical gene-centric methods. Moreover, it pinpoints important transcription factors and miRNAs that regulate differentially expressed isoforms in any biological system. Competing endogenous RNA (ceRNA) interactions mediated by miRNAs have been postulated as an important cellular regulatory network, in which cross-talk between different transcripts involves competition for joint regulators. We developed a novel statistical method, called SPONGE, for large-scale inference of ceRNA networks. In this framework, we designed an efficient empirical p-value computation approach, based on sampling from derived null models, which addresses important confounding factors such as sample size, number of involved regulators and strength of correlation. In an application to a large pan-cancer dataset with 31 cancers, we discovered protein-coding and non-coding RNAs that are generic ceRNAs in cancer. Finally, we present an integrative analysis of miRNA and protein-based posttranscriptional regulation. Using expression and RNA binding data, we postulate a competitive regulation in which the RNA-binding protein IMP2 competes with miRNAs binding the same RNAs. This function of IMP2 is relevant to its contribution to disease in the context of adult cellular metabolism. In summary, this thesis presents a number of novel approaches for the inference and integrative analysis of regulatory networks that we believe will find wide applicability in the biological sciences.
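
The empirical p-value idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the SPONGE implementation: the null model here (Pearson correlations of independent Gaussian vectors) and the names `pearson`, `sample_null` and `empirical_p_value` are assumptions chosen only to show why the null distribution has to account for sample size.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_null(n_samples, n_draws, rng):
    """Hypothetical null model: correlations of independent Gaussian vectors.

    Each draw simulates two unrelated expression profiles measured in
    n_samples samples and records their correlation.
    """
    draws = []
    for _ in range(n_draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        draws.append(pearson(x, y))
    return draws


def empirical_p_value(observed, null_draws):
    """Share of null draws at least as large as the observed statistic.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    exceed = sum(1 for s in null_draws if s >= observed)
    return (exceed + 1) / (len(null_draws) + 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
p_small = empirical_p_value(0.5, sample_null(10, 2000, rng))
p_large = empirical_p_value(0.5, sample_null(100, 2000, rng))
```

Under these assumptions the same observed value of 0.5 is far less surprising with 10 samples than with 100, which is one reason a single fixed cutoff on the statistic would be confounded by sample size.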

    Combining Transcription Factor Binding Affinities with Open-Chromatin Data for Accurate Gene Expression Prediction

    The binding and contribution of transcription factors (TFs) to cell-specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, histone marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find that low-affinity binding sites improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators, as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.
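
The contrast this abstract draws between quantitative affinities and presence/absence calls can be sketched as follows. This is a simplified illustration, not TEPIC's affinity model: the 3-bp matrix `PWM`, the uniform background, the forward-strand-only scan and all function names are assumptions, and sequences are assumed to contain only A, C, G and T.

```python
import math

# Hypothetical 3-bp position weight matrix: base probabilities per position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window, pwm=PWM):
    """Log-odds score of one window against the uniform background."""
    return sum(math.log(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def windows(seq, k):
    """All length-k windows of seq on the forward strand."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def region_affinity(seq, pwm=PWM):
    """Sum of likelihood ratios over all windows, so weak sites still count."""
    return sum(math.exp(window_score(w, pwm)) for w in windows(seq, len(pwm)))


def has_hit(seq, threshold, pwm=PWM):
    """Presence/absence call: does any window reach the score threshold?"""
    return any(window_score(w, pwm) >= threshold for w in windows(seq, len(pwm)))


def scaled_affinity(seq, signal, pwm=PWM):
    """Affinity weighted by an open-chromatin signal intensity for the region."""
    return signal * region_affinity(seq, pwm)
```

With a threshold of 2.0, `has_hit("ACT", 2.0)` reports no site, while `region_affinity("ACT")` still returns a positive value. A downstream expression model would therefore see the weak site only in the affinity-based representation.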