Insight into transcription factor gene duplication from Caenorhabditis elegans Promoterome-driven expression patterns
BACKGROUND: The C. elegans Promoterome is a powerful resource for revealing the regulatory mechanisms by which transcription is controlled pan-genomically. Transcription factors will form the core of any systems biology model of genome control and therefore the promoter activity of Promoterome inserts for C. elegans transcription factor genes was examined, in vivo, with a reporter gene approach. RESULTS: Transgenic C. elegans strains were generated for 366 transcription factor promoter/gfp reporter gene fusions. GFP distributions were determined, and then summarized with reference to developmental stage and cell type. Reliability of these data was demonstrated by comparison to previously described gene product distributions. A detailed consideration of the results for one C. elegans transcription factor gene family, the Six family, comprising ceh-32, ceh-33, ceh-34 and unc-39 illustrates the value of these analyses. The high proportion of Promoterome reporter fusions that drove GFP expression, compared to previous studies, led to the hypothesis that transcription factor genes might be involved in local gene duplication events less frequently than other genes. Comparison of transcription factor genes of C. elegans and Caenorhabditis briggsae was therefore carried out and revealed very few examples of functional gene duplication since the divergence of these species for most, but not all, transcription factor gene families. CONCLUSION: Examining reporter expression patterns for hundreds of promoters informs, and thereby improves, interpretation of this data type. Genes encoding transcription factors involved in intrinsic developmental control processes appear acutely sensitive to changes in gene dosage through local gene duplication, on an evolutionary time scale
Statistical Assessment of the Global Regulatory Role of Histone Acetylation in Saccharomyces cerevisiae
BACKGROUND: Histone acetylation plays important but incompletely understood roles in gene regulation. A comprehensive understanding of the regulatory role of histone acetylation is difficult because many different histone acetylation patterns exist and their effects are confounded by other factors, such as the transcription factor binding sequence motif information and nucleosome occupancy.
RESULTS: We analyzed recent genomewide histone acetylation data using a few complementary statistical models and tested the validity of a cumulative model in approximating the global regulatory effect of histone acetylation. Confounding effects due to transcription factor binding sequence information were estimated by using two independent motif-based algorithms followed by a variable selection method. We found that the sequence information has a significant role in regulating transcription, and we also found a clear additional histone acetylation effect. Our model fits well with observed genome-wide data. Strikingly, including more complicated combinatorial effects does not improve the model's performance. Through a statistical analysis of conditional independence, we found that H4 acetylation may not have significant direct impact on global gene expression.
CONCLUSION: Decoding the combinatorial complexity of histone modification requires not only new data but also new methods to analyze the data. Our statistical analysis confirms that histone acetylation has a significant effect on gene transcription rates in addition to that attributable to upstream sequence motifs. Our analysis also suggests that a cumulative effect model for global histone acetylation is justified, although a more complex histone code may be important at specific gene loci. We also found that the regulatory roles among different histone acetylation sites have important differences.
Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework
BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from the experimentally generated sequence based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome. RESULTS: Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. CONCLUSION: With a mechanistic, model based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.
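The dependence of tag formation on cleavage efficiency and site count can be illustrated with a minimal sketch. It is not the authors' model. It assumes, beyond what the abstract states, that every AE site is cut independently with the same probability and that the tag is taken from the 3'-most cut site; the function names are hypothetical.

```python
def tag_site_probabilities(n_sites, cleavage_efficiency):
    """Probability that each AE site supplies the tag.

    Assumption (not from the abstract): sites are cut independently with
    the same probability, and the tag comes from the 3'-most cut site.
    Index 0 is the 3'-most site.
    """
    p = cleavage_efficiency
    # Site k supplies the tag only if it is cut and the k sites 3' of it are not.
    return [p * (1 - p) ** k for k in range(n_sites)]


def observable_tag_probability(site_is_informative, cleavage_efficiency):
    """Probability that a transcript yields a tag from an informative site.

    `site_is_informative` lists the sites from the 3' end; True marks a
    site whose tag can be assigned to the gene.
    """
    probs = tag_site_probabilities(len(site_is_informative), cleavage_efficiency)
    return sum(q for q, ok in zip(probs, site_is_informative) if ok)
```

Under these assumptions, two genes with the same mRNA frequency can contribute different numbers of observable tags simply because their informative sites sit at different positions, which is the kind of bias the abstract describes.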
Genome-wide analysis of the cis-regulatory modules of divergent gene pairs in yeast
In budding yeast, approximately a quarter of adjacent genes are divergently transcribed (divergent gene pairs). Whether genes in a divergent pair share the same regulatory system is still unknown. By examining transcription factor (TF) knockout experiments, we found that most TF knockouts altered the expression of only one gene in a divergent pair. This prompted us to conduct a comprehensive analysis in silico to estimate how many divergent pairs are regulated by common sets of TFs (cis-regulatory modules, CRMs) using TF binding sites and expression data. Analyses of ten expression datasets show that only a limited number of divergent gene pairs share CRMs in any single dataset. However, around half of divergent pairs do share a regulatory system in at least one dataset. Our analysis suggests that genes in a divergent pair tend to be co-regulated in at least one condition; however, in most conditions, they may not be co-regulated
Dissecting the spatial structure of overlapping transcription in budding yeast
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-102). This thesis presents a computational and algorithmic method for the analysis of high-resolution transcription data in the budding yeast Saccharomyces cerevisiae. We begin by describing a computational system for storing and retrieving spatially-mapped genomic data. This system forms the infrastructure for a novel algorithmic approach to detect and recover instances of same-strand overlapping transcripts in high resolution expression experiments. We then apply these algorithms to a set of transcription experiments in budding yeast, Saccharomyces cerevisiae, in order to identify potential sites of same-strand overlapping transcripts that may be involved in novel forms of transcriptional regulation. By Timothy Danford. Ph.D.
DISTILLER: a data integration framework to reveal condition dependency of complex regulons in Escherichia coli
DISTILLER, a data integration framework for the inference of transcriptional module networks, is presented and used to investigate the condition dependency and modularity in Escherichia coli networks
Fuzzy association rules for biological data analysis: a case study on yeast
BACKGROUND: The mapping of diverse genomes in recent years has generated huge amounts of biological data, which are currently dispersed across many databases. Integrating the information available in these databases is required to unveil possible associations among already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modeling imprecise data, while association rules are well suited to integrating heterogeneous data. RESULTS: In this work we propose a novel fuzzy methodology based on a fuzzy association rule mining method for biological knowledge extraction. We apply this methodology to a yeast genome dataset containing heterogeneous information regarding structural and functional genome features. A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones. CONCLUSION: An integrative approach such as the one carried out in this work can unveil significant knowledge which is currently hidden and dispersed across the existing biological databases. It is shown that fuzzy association rules can model this knowledge in an intuitive way by using linguistic labels and a few easily understood parameters
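How a fuzzy association rule is scored can be sketched as follows. The abstract does not give the paper's definitions, so this sketch assumes a common textbook form: the support of an itemset is the mean, over records, of the minimum membership degree across its labels, and confidence is the ratio of joint support to antecedent support. The labels and membership values are made-up toy values, not data from the study.

```python
def fuzzy_support(records, labels):
    """Mean over records of the minimum membership degree across `labels`.

    Assumption: the minimum is used as the fuzzy AND (min t-norm).
    """
    return sum(min(r[label] for label in labels) for r in records) / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    joint = fuzzy_support(records, list(antecedent) + list(consequent))
    base = fuzzy_support(records, antecedent)
    return joint / base if base > 0 else 0.0


# Toy membership degrees for three hypothetical genes (illustrative only).
toy_records = [
    {"high_gc": 0.8, "high_expression": 0.6},
    {"high_gc": 0.5, "high_expression": 0.5},
    {"high_gc": 0.2, "high_expression": 0.9},
]

rule_confidence = fuzzy_confidence(toy_records, ["high_gc"], ["high_expression"])
```

A crisp rule would first force each gene into or out of each label with a threshold, whereas the fuzzy version lets a gene count partially toward a label such as "high_gc".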
Computational modeling of gene expression from regulatory sequences
Regulation of gene expression is an important early step in controlling every biological process that underlies the function of living organisms. Even though gene expression may be regulated in several stages, the modulation occurs mostly at the primary stage known as “transcription”. Teasing out the details of transcriptional regulation is therefore a core focus of biological research. Transcriptional regulation of gene expression is dictated by regulatory DNA sequences, often called cis-regulatory modules (CRM; also known as “enhancer”), that contain specific binding sites for regulatory proteins (transcription factors, TF). The assembly of TFs bound on a CRM drives the desired expression level of the gene associated with the CRM. As the abundance of TFs varies across different cell types, the expression level of the gene, also termed the “readout” of the CRM, varies accordingly and results in the aforementioned control over biological processes. The rules, collectively known as the “cis-regulatory logic”, to predict gene expression level given information about CRMs and TFs, however, are unclear. Decades of experimental studies have hypothesized mechanisms about parts of this regulatory process (e.g., about the influence of TF-TF interactions), but a comprehensive study of cis-regulatory logic is feasible only through computational models. The subject of this thesis is to develop mechanistic models of gene expression from regulatory sequences and use the models to understand such details of the system that are difficult to assess experimentally.
The first part of this thesis develops a model that integrates the regulatory effect of signaling pathways with that of sequence-bound TFs to understand the expression pattern of a gene from its CRM. Given the various types of molecular interactions that the model needs to capture, it is both complex in structure and rich in the number of parameters. Similarly complex models commonly used in other disciplines, from signaling networks to climatology, have been shown to admit many distinct parameterizations that are equally consistent with data but might represent disparate mechanistic hypotheses. Whether this is also the case for models of cis-regulation has never been investigated, with the standard practice in this realm being to report a single best-fit model or a few. We demonstrate here – taking the Drosophila ind gene as an example – that gene expression modeling from cis-regulatory sequences may suffer from incomplete and even incorrect conclusions if one adheres to this current practice. We construct an ensemble of models by systematically exploring the entire parameter space and leveraging both wild-type data and various perturbation experiments, and make statistical inferences from the ensemble about detailed regulatory mechanisms of ind. Years of genetic experiments have put forth an assortment of hypotheses about ind regulation. We use our modeling approach to show how a mechanism involving MAPK-induced attenuation in the DNA binding affinity of Capicua and the use of low-affinity Dorsal binding sites may provide a coherent explanation of ind regulation. Also, we quantitatively predict and experimentally validate the role of the “pioneer factor” Zelda in activating ind. Finally, we discuss disparate hypotheses that are supported by our ensemble of models and will need future experimentation for a complete understanding of ind regulation.
The second part of this thesis addresses a fundamental goal of computational biology, namely that of modeling a gene’s expression from its intergenic locus and trans-regulatory context. Owing to the distributed nature of cis-regulatory information and the poorly understood mechanisms that integrate such information, gene locus modeling is a more challenging task than modeling individual enhancers. Here we report the first quantitative model of a gene’s expression pattern as a function of its locus. We model the expression readout of a locus in two tiers: 1) combinatorial regulation by transcription factors bound to each enhancer is predicted by a thermodynamics-based model and 2) independent contributions from multiple enhancers are linearly combined to fit the gene expression pattern. The model does not require any prior knowledge about enhancers contributing toward a gene’s expression. We demonstrate that the model captures the complex multi-domain expression patterns of anterior-posterior patterning genes in the early Drosophila embryo. Altogether, we model the expression patterns of 27 genes; these include several gap genes, pair-rule genes, and anterior, posterior, trunk, and terminal genes. We find that the model-selected enhancers for each gene overlap strongly with its experimentally characterized enhancers. Our findings also suggest the presence of sequence-segments in the locus that would contribute ectopic expression patterns and hence were “shut down” by the model. We applied our model to identify the transcription factors responsible for forming the stripe boundaries of the studied genes. The resulting network of regulatory interactions exhibits a high level of agreement with known regulatory influences on the target genes. Finally, we analyzed whether and why our assumption of enhancer independence was necessary for the genes we studied. 
We found a deterioration of expression when binding sites in one enhancer were allowed to influence the readout of another enhancer. Thus, interference between enhancer activities was a possible factor necessitating enhancer independence in our model.
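The second tier of the locus model described above, a linear combination of independent enhancer contributions, can be sketched in a few lines. This is an illustration, not the thesis implementation. It assumes the per-enhancer readouts along the embryo axis have already been produced by some tier-one model and that the weights are given rather than fitted; the function and argument names are hypothetical.

```python
def locus_expression(enhancer_readouts, weights):
    """Linearly combine per-enhancer readouts into one locus-level profile.

    `enhancer_readouts` holds one list per candidate enhancer, each giving
    a predicted readout at every position along the axis (assumed to come
    from a separate tier-one model). A weight of 0 corresponds to a
    segment that is "shut down".
    """
    if len(enhancer_readouts) != len(weights):
        raise ValueError("one weight per enhancer readout is required")
    n_positions = len(enhancer_readouts[0])
    return [
        sum(w * readout[pos] for w, readout in zip(weights, enhancer_readouts))
        for pos in range(n_positions)
    ]
```

Because each readout enters the sum separately, binding sites in one segment cannot change the contribution of another, which is the enhancer-independence assumption the abstract goes on to examine.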
The third part of this thesis applies the aforementioned models to two novel datasets. The first dataset was created by fusing two well-studied CRMs of the even-skipped (eve) gene in Drosophila. The fused constructs differ in the orientation, order, and intervening spacing of the CRMs. Interestingly, the two constituent CRMs regulate eve expression by using the same TFs, although binding affinities (i.e., strength) of the repressor sites in the two CRMs are different – a difference that has been suggested to help the CRMs drive expression in two distinct domains (each domain consists of two stripes of eve) when they act in their endogenous context. However, the fact that these two CRMs harbor sites for the same TFs makes it difficult to predict the readouts of the constructs in our dataset. In particular, readouts of these constructs show some subtle aspects that challenge the conventional models of information integration from sequences and suggest that a different mechanism may be necessary to explain these observations. Our modeling of this novel dataset suggests that the conventional assumption that relatively short DNA sequences, e.g., CRMs, do not comprise smaller “independent” regulatory sequences may not be true – since the lengths of the fused constructs are comparable to typical CRMs and their readouts can be modeled by assuming the existence of smaller independent regulatory segments. The second dataset modeled in this part of the thesis features five genes that control the growth and patterning of the wing in Drosophila. Notably, ours is the first attempt to link regulatory sequences and the related molecular details to the growth and scaling of an organ. In the course of fitting this dataset, we identify the important regulatory role of a TF called Scalloped (Sd) and speculate on Sd’s role in ensuring that the expression domains of the studied genes scale with wing growth. We also use our models to identify novel regulatory sequences of these genes and to answer several questions that were left open in the experimental studies that first attempted to understand the cis-regulatory logic for these genes.
Transcriptional features of genomic regulatory blocks
CAGE tag mapping of transcription start sites across different human tissues shows that genomic regulatory blocks have unique features that are the likely cause of their ability to respond to regulatory inputs from very long distances