
    Single cell analyses of ES cells reveal alternative pluripotent cell states and molecular mechanisms that control self-renewal

    Analyses of gene expression in single mouse embryonic stem cells (mESCs) cultured in serum and LIF revealed the presence of two distinct cell subpopulations with individual gene expression signatures. Comparisons with published data revealed that cells in the first subpopulation are phenotypically similar to cells isolated from the inner cell mass (ICM). In contrast, cells in the second subpopulation appear to be more mature. Pluripotency Gene Regulatory Network (PGRN) reconstruction based on single-cell data and published data suggested antagonistic roles for Oct4 and Nanog in the maintenance of pluripotency states. Integrated analyses of published genomic binding (ChIP) data strongly supported this observation. Certain target genes alternatively regulated by OCT4 and NANOG, such as Sall4 and Zscan10, feed back into the top hierarchical regulator Oct4. Analyses of such incoherent feedforward loops with feedback (iFFL-FB) suggest a dynamic model for the maintenance of mESC pluripotency and self-renewal

    A case-control approach to assess variability in distribution of distance between transcription factor binding site and transcription start site

    Using an in silico approach with ENCODE ChIP-seq data for various transcription factors and different cell types, we systematically compared the distance between the transcription factor binding site (TFBS) and the transcription start site (TSS). Our aim was to determine if the same transcription factor binds at a different position relative to the TSS in a normal and an abnormal cell type. We compare the distribution of distances of binding sites from the TSS; for brevity we call this “distance” where there is no possibility of confusion. We used a case-control methodology where the distance between the TFBS and the TSS in the normal, non-cancerous or untreated cell type is the control, and the distance between the TFBS and the TSS in the cancerous or treated cell type is the case. We use the distance in the control as the standard and compare the distance in the case against it. If the distance in the control is greater than the distance in the case, we infer that the transcription factor in the case binds closer to the TSS than in the control. If the distance in the control is smaller than the distance in the case, we infer that the TF in the case binds further away from the TSS than in the control. Our method is a screening method whereby we compare ChIP-seq data to determine if there is a difference in the distribution of distance between the TFBS and the TSS for normal and abnormal cell types. We used the R package ChIP-Enrich to compare the distribution of distance between each ChIP-seq peak and the nearest TSS. ChIP-Enrich produces a histogram with the number of ChIP-seq peaks at a certain distance from the TSS. 
The results indicate that for some transcription factors, such as GM12878-cMyc and K562-cMyc, there is a difference in the distribution of distance between the TFBS and the nearest TSS. cMyc has more binding sites within 1 kb of the TSS in GM12878 than in K562. GM12878-CTCF and K562-CTCF show only slight differences in their distribution of distance from the TSS, which means CTCF binds at almost the same distance from the TSS in both GM12878 and K562. A549-gr treated with dexamethasone is interesting because, with increasing dose of dexamethasone, the distribution of distance from the TSS changes as well.
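
The case-control comparison described in this abstract can be illustrated with a small Python sketch. This is not the ChIP-Enrich implementation (which is an R package); the function names, bin edges, and coordinates below are hypothetical and chosen only to show how a peak-to-nearest-TSS distance histogram might be built for a control and a case sample on a single chromosome.

```python
from bisect import bisect_left
from collections import Counter


def nearest_tss_distance(peak_pos, sorted_tss):
    """Absolute distance from a peak midpoint to the closest TSS.

    Assumes sorted_tss is a non-empty, ascending list of TSS positions
    on the same chromosome as the peak.
    """
    i = bisect_left(sorted_tss, peak_pos)
    # Only the TSS just before and just after the peak can be nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(peak_pos - t) for t in candidates)


def distance_histogram(peaks, tss, bin_edges=(1_000, 5_000, 10_000)):
    """Count peaks per distance bin; the last bin is open-ended."""
    sorted_tss = sorted(tss)
    counts = Counter()
    for p in peaks:
        d = nearest_tss_distance(p, sorted_tss)
        label = next((f"<{e}" for e in bin_edges if d < e),
                     f">={bin_edges[-1]}")
        counts[label] += 1
    return dict(counts)


# Made-up coordinates for illustration only.
tss = [10_000, 50_000, 120_000]
control_peaks = [10_200, 48_000, 90_000]
case_peaks = [10_050, 50_400, 119_500]

control_hist = distance_histogram(control_peaks, tss)
case_hist = distance_histogram(case_peaks, tss)
print("control:", control_hist)
print("case:   ", case_hist)
```

In this toy data every case peak falls within 1 kb of a TSS while the control peaks are spread across bins, which corresponds to the "case binds closer to the TSS" outcome in the abstract.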

    Transcription factor binding distribution and properties in prokaryotes

    The canonical model of transcriptional regulation in prokaryotes restricted binding site locations to promoter regions and suggested that the binding sequences serve as the main determinants of binding. In this dissertation, I challenge these assumptions. As a member of the TB Systems Biology Consortium, I analyzed and validated ChIP-Seq and microarray experiments for over 100 transcription factors (TFs). In order to study the transcriptional functions of predicted binding sites, I integrated binding and expression data and assigned potential regulatory roles to 20% of the binding sites. Stronger binding sites were more often associated with regulation than weaker sites, suggesting a correlation between binding strength and regulatory impact. Seventy-six percent of the sites fell into annotated coding regions and a significant proportion was assigned to regulatory functions. To study the importance of binding sequences, I compared experimental sites with computational motif predictions. Although a conservative binding motif was found for most TFs, only a fraction of the observed motifs appeared bound in the experiment. Some low-affinity binding sites appeared occupied by the corresponding TF while many high-affinity binding sites were not. Interestingly, I found exactly the same nucleotide sequences (up to 15 residues long) bound in one area of the genome but not bound in another area, pointing to DNA accessibility as an important factor for in vivo binding. To investigate the evolutionary conservation of binding-site occupancy, sequence, and transcriptional impact, I analyzed ChIP-Seq and expression experiments for five conserved TFs for two-to-four Mycobacterial relatives. The regulon composition showed significantly less conservation than expected from the overall gene conservation level across Mycobacteria. 
Despite expectations, sequence conservation did not serve as a good indicator of whether or not a computationally predicted motif was bound experimentally; and in some cases, a fully conserved motif was bound in one relative but not in the other. Conservation of genic binding sites was higher than expected from the random model, adding to the evidence that at least some genic sites are functional. Understanding the evolutionary story of binding sites allowed me to explain unusual site configurations, some of which indicated a role for DNA looping

    Systems and Synthetic Biology Approaches to Engineer Fungi for Fine Chemical Production

    Since the advent of systems and synthetic biology, many studies have sought to harness microbes as cell factories through genetic and metabolic engineering approaches. Yeast and filamentous fungi have been successfully harnessed to produce fine and high value-added chemical products. In this review, we present some of the most promising advances from recent years in the use of fungi for this purpose, focusing on the manipulation of fungal strains using systems and synthetic biology tools to improve metabolic flow and the flow of secondary metabolites by pathway redesign. We also review the roles of bioinformatics analysis and predictions in synthetic circuits, highlighting in silico systemic approaches to improve the efficiency of synthetic modules

    Improving the prediction of transcription factor binding sites to aid the interpretation of non-coding single nucleotide variants

    Single nucleotide variants (SNVs) that occur in transcription factor binding sites (TFBSs) can disrupt the binding of transcription factors and alter gene expression, which can cause inherited diseases and act as driver SNVs in cancer. The identification of SNVs in TFBSs has historically been challenging given the limited number of experimentally characterised TFBSs. The recent ENCODE project has resulted in the availability of ChIP-Seq data that provide genome-wide sets of regions bound by transcription factors. These data have the potential to improve the identification of SNVs in TFBSs. However, as the ChIP-Seq data identify a broader range of DNA in which a transcription factor binds, computational prediction is required to identify the precise TFBS. Prediction of TFBSs involves scanning a DNA sequence with a Position Weight Matrix (PWM) using a pattern matching tool. This thesis focusses on the prediction of TFBSs by: (a) evaluating a set of locally-installable pattern-matching tools and identifying the best performing tool (FIMO), (b) using the ENCODE ChIP-Seq data to evaluate a set of de novo motif discovery tools that are used to derive PWMs and that can handle large volumes of data, (c) identifying the best performing tool (rGADEM), (d) using rGADEM to generate a set of PWMs from the ENCODE ChIP-Seq data and (e) finally checking that the selection of the best pattern matching tool is not unduly influenced by the choice of PWMs. These analyses were exploited to obtain a set of predicted TFBSs from the ENCODE ChIP-Seq data. The predicted TFBSs were utilised to analyse somatic cancer driver and passenger SNVs that occur in TFBSs. Clear signals in conservation, and therefore in Shannon entropy values, were identified and subsequently exploited to identify a threshold that can be used to prioritize somatic cancer driver SNVs for experimental validation.
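
The idea of scanning a DNA sequence with a PWM can be sketched as follows. This is a generic log-odds scan written for illustration, not the algorithm used by FIMO or rGADEM; the count matrix, pseudocount, uniform background, and threshold are all assumed values, and only the forward strand of an A/C/G/T-only sequence is scored.

```python
import math

BASES = "ACGT"


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    Assumes a uniform background frequency for every base.
    """
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each window scoring >= threshold.

    Forward strand only; the sequence must contain only A, C, G, T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 3-column count matrix for a motif resembling "ACG".
counts = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
]
pwm = counts_to_pwm(counts)
hits = scan("TTACGTTACG", pwm, threshold=3.0)
for offset, score in hits:
    print(offset, round(score, 3))
```

With these assumed values only exact "ACG" windows clear the threshold. A SNV inside such a window changes one term of the sum, which is how a variant can lower a predicted site's score.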

    Quantitative modeling and statistical analysis of protein-DNA binding sites


    Transcription factor binding specificity and occupancy : elucidation, modelling and evaluation

    The major contributions of this thesis are addressing the need for an objective quality evaluation of a transcription factor binding model, demonstrating the value of the tools developed to this end, and elucidating how in vitro and in vivo information can be utilized to improve TF binding specificity models. Accurate elucidation of TF binding specificity remains an ongoing challenge in gene regulatory research. Several in vitro and in vivo experimental techniques have been developed, followed by a proliferation of algorithms and, ultimately, of binding models. This increase led to a choice problem for the end users: which tools to use, and which is the most accurate model for a given TF? Therefore, the first section of this thesis investigates the motif assessment problem: how scoring functions, choice and processing of benchmark data, and statistics used in evaluation affect motif ranking. This analysis revealed that TF motif quality assessment requires a systematic comparative analysis, and that the scoring functions used have a TF-specific effect on motif ranking. These results informed the design of a Motif Assessment and Ranking Suite (MARS), supported by PBM and ChIP-seq benchmark data and an extensive collection of PWM motifs. MARS implements consistency, enrichment, and scoring and classification-based motif evaluation algorithms. Transcription factor binding is also influenced and determined by contextual factors: chromatin accessibility, competition or cooperation with other TFs, cell line or condition specificity, binding locality (e.g. proximity to transcription start sites) and the shape of the binding site (DNA-shape). In vitro techniques do not capture such context; therefore, this thesis also combines PBM and DNase-seq data using a comparative k-mer enrichment approach that compares open chromatin with genome-wide prevalence, achieving a modest performance improvement when benchmarked on ChIP-seq data. 
Finally, since statistical and probabilistic methods cannot capture all the information that determines binding, a machine learning approach (XGBoost) was implemented to investigate how the features contribute to TF specificity and occupancy. This combinatorial approach improves the predictive ability of TF specificity models, with the most predictive feature being chromatin accessibility, while the DNA-shape and conservation information both significantly improve on the baseline model of k-mer and DNase data. The results and the tools introduced in this thesis are useful for systematic comparative analysis (via MARS) and a combinatorial approach to modelling TF binding specificity, including appropriate feature engineering practices for machine learning modelling.
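
A comparative k-mer enrichment of the kind mentioned above (open chromatin versus genome-wide prevalence) might look roughly like the sketch below. The thesis does not specify its scoring formula in this abstract, so the smoothed log2 frequency ratio, the function names, and the toy sequences are assumptions made purely for illustration.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(open_seqs, genome_seqs, k, pseudocount=1.0):
    """log2 ratio of smoothed k-mer frequency: open chromatin vs genome.

    Positive values mean the k-mer is relatively more common in the
    open-chromatin set. The pseudocount smoothing is an assumed choice.
    """
    fg = kmer_counts(open_seqs, k)
    bg = kmer_counts(genome_seqs, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        kmer: math.log2(((fg[kmer] + pseudocount) / fg_total)
                        / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in vocab
    }


# Toy sequences standing in for DNase-accessible regions and the genome.
open_seqs = ["ACGACG", "TACGT"]
genome_seqs = ["ACGTTTTT", "TTTTACGA", "GGGGCCCC"]

scores = kmer_enrichment(open_seqs, genome_seqs, k=3)
for kmer in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(kmer, round(scores[kmer], 3))
```

Scores of this kind could serve as one input feature alongside accessibility, DNA-shape, and conservation in a gradient-boosted model, although how the thesis actually encodes its features is not stated here.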