DNA entropy reveals a significant difference in complexity between housekeeping and tissue specific gene promoters

Authors: Finan Christopher; Jones Susan; Newport Melanie; Thomas David
Publication venue: Elsevier BV
Publication date: 01/10/2015

BACKGROUND: The complexity of DNA can be quantified using estimates of entropy. Variation in DNA complexity is expected between the promoters of genes with different transcriptional mechanisms, namely housekeeping (HK) and tissue specific (TS) genes. The former are transcribed constitutively to maintain general cellular functions, and the latter are transcribed in restricted tissues and cell types for specific molecular events. It is known that promoter features in the human genome are related to tissue specificity, but this has been difficult to quantify on a genomic scale. If entropy effectively quantifies DNA complexity, calculating the entropies of HK and TS gene promoters as profiles may reveal significant differences. RESULTS: Entropy profiles were calculated for a total dataset of 12,003 human gene promoters and for 501 housekeeping (HK) and 587 tissue specific (TS) human gene promoters. The mean profiles show that the TS promoters have a significantly lower entropy (p<2.2e-16) than HK gene promoters. The entropy distributions for the 3 datasets show that promoter entropies could be used to identify novel HK genes. CONCLUSION: Functional features comprise DNA sequence patterns that are non-random and hence have lower entropies. The lower entropy of TS gene promoters can be explained by a higher density of positive and negative regulatory elements, required for genes with complex spatial and temporal expression.
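The idea of an entropy profile along a promoter can be illustrated with a minimal Python sketch. It assumes Shannon entropy of single-nucleotide composition in a sliding window; the abstract does not state which entropy estimator, window size, or step the authors used, so the function names and the `window` and `step` defaults here are illustrative assumptions only.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the base composition of seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    # Sum of -p * log2(p) over the bases that actually occur in seq.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(seq: str, window: int = 50, step: int = 10) -> list[float]:
    """Entropy of each sliding window along seq (window and step are assumed values)."""
    return [
        shannon_entropy(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

A homopolymer window scores 0 bits and a window with equal proportions of A, C, G and T scores 2 bits, so lower values along a profile indicate less varied base composition.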

University of Dundee Online Publications

Sussex Research Online

Promoter architecture and gene expression dynamics in embryonic development

Author: Vučenović Dunja
Publication venue: Institute of Clinical Sciences, Imperial College London
Publication date: 01/07/2020

Genes indispensable for proper embryonic development show intricate patterns of expression throughout the time, space and magnitude of their activity. This diversity is enabled by elaborate regulatory mechanisms that guide their expression. They also possess a distinct type of core promoter that enables the integration of all regulatory inputs. However, it is still not clear how coordination of regulation is achieved. The first step towards understanding this process is to characterise the dynamics of expression and the core promoter features that process the regulation. In this thesis, I explored the diversity of spatio-temporal gene expression during zebrafish development. I defined a novel measure of anatomical specificity that describes how precisely an anatomical structure is defined in the Anatomical Ontology system. Using this anatomical specificity measure, I quantified gene expression dynamics from mRNA in situ hybridisation data. Gene expression divergence from in situs was used to predict expression levels from RNA-seq expression data. This analysis allowed me to propose a measure of gene expression complexity, which showed that genes with the highest complexity score are developmental genes, whereas genes with a low complexity score are involved in housekeeping functions. Next, I developed a method that reports significantly enriched core promoter elements in a group of genes. Using this method, I compared differences in core promoter composition in active genes expressed in different developmental periods. In addition, this method found groups of genes with a specific core promoter structure that are specified for a biological process. Finally, I used scRNA-seq data from zebrafish development to identify patterns of gene co-expression across different cell clusters. Co-expression suggests that a gene pair possesses a common regulatory programme. I show that genes with the most divergent co-expression patterns across development are developmental genes and that housekeeping genes have the least diverse co-expression patterns. I went further to create co-expression networks, which allowed me to analyse co-expression patterns in more detail.
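One common way to test whether a promoter element is enriched in a group of genes is a one-sided hypergeometric test. The sketch below illustrates that general technique only; the abstract does not say which statistic the thesis uses, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, with_element: int,
                       sample: int, sample_with_element: int) -> float:
    """One-sided hypergeometric P(X >= sample_with_element).

    population          -- total number of genes considered
    with_element        -- genes in the population whose promoter has the element
    sample              -- size of the gene group being tested
    sample_with_element -- genes in the group whose promoter has the element
    """
    upper = min(sample, with_element)
    # Count the ways to draw at least the observed number of element-carrying genes.
    favourable = sum(
        comb(with_element, i) * comb(population - with_element, sample - i)
        for i in range(sample_with_element, upper + 1)
    )
    return favourable / comb(population, sample)
```

A small p-value suggests the element occurs in the group more often than random sampling from the population would predict. When many elements are tested, a multiple-testing correction would normally be applied as well.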

Spiral - Imperial College Digital Repository

Transcription regulation: models for combinatorial regulation and functional specificity

Author: Thomas David John
Publication date: 16/05/2014

Gene regulation is controlled by transcription factor proteins that bind to specific DNA sequences, known as transcription factor binding sites (TFBSs). Combinations of transcription factors, working co-operatively in cis-regulatory modules (CRMs), play a role in regulating gene expression. Current computational methods for TFBS prediction cannot distinguish between functional and non-functional sites, and predict very large numbers of false positives. The thesis focuses on the development of a novel computational model, based on artificial neural networks (ANNs), for the identification of functional TFBSs and the CRMs within which they operate in the human genome. Datasets of 12,239 experimentally verified true positive (TP) TFBSs and 130,199 false positive (FP) TFBSs were extracted using a combination of position weight matrices from the JASPAR database and experimentally verified sites from the Encyclopedia of DNA Elements (ENCODE). A number of machine learning algorithms were tested using a range of genetic information including gene expression, nucleosome positioning, DNA methylation states and DNA entropy. The best model, which gave a mean area under the receiver operating characteristic curve of 0.800, was based on a feedforward ANN using backpropagation. This model was then used to predict functional TFBSs in a number of gene sets from the human genome. The predictions, combined with experimentally proven TFBSs from ENCODE, were used to investigate combinatorial patterns of TFBSs operating in CRMs. CRM patterns have been analysed in disease-associated genes located in linkage disequilibrium blocks containing SNPs obtained from Genome Wide Association Studies (GWAS). The potential for the model to make functional TFBS predictions to aid in the annotation of orphan genes of unknown function is discussed. In addition, this thesis presents computational work on a number of smaller published studies.
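Scanning a sequence with a position weight matrix, the step that generates candidate TFBSs, can be sketched in Python as follows. The count matrix `TOY_COUNTS` is a made-up TATA-like example and not a JASPAR entry, and the pseudocount, uniform background and threshold are assumptions chosen for illustration; the thesis's actual scanning settings are not given in the abstract.

```python
import math

# Hypothetical base counts per motif position (illustration only, not from JASPAR).
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into log2 odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[offset][base] for offset, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

For example, `scan("CCCTATACCC", log_odds_matrix(TOY_COUNTS), threshold=4.0)` reports a single hit at position 3. A scan like this scores sequence similarity only, which is why the abstract describes adding expression, nucleosome, methylation and entropy features to separate functional sites from false positives.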

Sussex Research Online

Regulatory complexity in gene expression

Author: Rennie Sarah
Publication venue: The University of Edinburgh
Publication date: 08/07/2017

The regulation of gene expression is the driver of cellular differentiation in multicellular organisms; the result is a diverse range of cell types, each with its own unique profile of expression. Within these cell types the transcriptional product of a gene is up or down regulated in response to intrinsic and extrinsic stimuli according to its own regulatory programme encoded within the cell. The complexity of this regulatory programme depends on the requirements of the gene to change expression states in different cell lineages or temporally in response to a range of conditions. In the case of many housekeeping genes integral to the survival of the cell, this programme is simple: switch on the gene and leave it on. Often, however, the required level and precision of regulatory control is much more involved and leads to subtle changes in expression. This raises many questions of precisely where and how that regulatory information is encoded and whether different biological systems encode it in the same way. This project attempts to answer these questions through the development of novel approaches to quantifying the output of this regulatory programme according to the state changes observed in the expression profile of a given gene. Measures of complexity in gene expression are calculated over a wide range of cell types and conditions collected using CAGE, which provides a quantitative estimate of gene expression that precisely defines the promoter utilised to initiate that expression. As expected, housekeeping genes were found to be amongst the least complex, as a result of their uniform expression profiles, as were those genes highly restricted in their expression. The genes most complex in their expression output were those associated with the presence of H3K27me3 repressive marks (genes poised for activation in a specific set of cell types), as well as those enriched in DNase I hypersensitive sites in their upstream region but not necessarily conserved in that region. Evidence also suggests that different promoters associated with a gene contribute in different ways to its resultant regulatory complexity, suggesting that certain promoters may be more crucial in driving the regulation of some genes. This allows for the targeting of such promoters in the analysis of certain diseases implicated by changes in regulatory regions. Indeed, genes known to be associated with diseases such as leukaemia and Alzheimer’s are found to be highly complex in their expression.

Edinburgh Research Archive

Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks

Author: Glass Kimberly
Publication date: 01/01/2010

Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function. We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association." "Nmer-association" measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif" algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The "split-motif" algorithm finds pairs of DNA motifs which preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed in standard motif-finding algorithms despite emerging evidence of the importance of complex regulation. In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternate natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes.

Digital Repository at the University of Maryland

Complex organizational structure of the genome revealed by genome-wide analysis of single and alternative promoters in Drosophila melanogaster

Authors: Halfon Marc S; Zhu Qianqian
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The promoter is a critical and necessary transcriptional cis-regulatory element. In addition to its role as an assembly site for the basal transcriptional apparatus, the promoter plays a key part in mediating temporal and spatial aspects of gene expression through differential binding of transcription factors and selective interaction with distal enhancers. Although many genes have multiple promoters, little attention has been focused on how these relate to one another; nor has much study been directed at relationships between promoters of adjacent genes. Results: We have undertaken a systematic investigation of Drosophila promoters. We divided promoters into three groups: unique promoters, first alternative promoters (the most 5' of a gene's multiple promoters), and downstream alternative promoters (the remaining alternative promoters 3' to the first). We observed distinct nucleotide distribution and sequence motif preferences among these three classes. We also investigated the promoters of neighboring genes and found that a greater than expected number of adjacent genes have similar sequence motif profiles, which may allow the genes to be regulated in a coordinated fashion. Consistent with this, there is a positive correlation between similar promoter motifs and related gene expression profiles for these genes. Conclusions: Our results suggest that different regulatory mechanisms may apply to each of the three promoter classes, and provide a mechanism for "gene expression neighborhoods," local clusters of co-expressed genes. As a whole, our data reveal an unexpected complexity of genomic organization at the promoter level with respect to both alternative and neighboring promoters.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Profile analysis and prediction of tissue-specific CpG island methylation classes

Authors: del Val Coral; Harari Oscar; Previti Christopher; Zwir Igor
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern. Results: We defined CGI methylation profiles that not only separate constitutively methylated and unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes including their evolutionary conservation, their significance, as well as the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein coding genes and their possible use in the prediction of DNA methylation. Conclusion: Our approach provides new insights into the biological features that determine if a CGI has a functional role in the epigenetic control of gene expression and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation is based primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two more tissue-specific methylation classes while conserving the accuracy provided by leading binary methylation classification methods.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

NORA - Norwegian Open Research Archives

Profile analysis and prediction of tissue-specific CpG island methylation classes

Authors: del Val Coral; Harari Oscar; Previti Christopher; Zwir Igor
Publication venue: Springer Science and Business Media LLC
Publication date: 28/08/2013

Background: The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern. Results: We defined CGI methylation profiles that not only separate constitutively methylated and unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes including their evolutionary conservation, their significance, as well as the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein coding genes and their possible use in the prediction of DNA methylation. Conclusion: Our approach provides new insights into the biological features that determine if a CGI has a functional role in the epigenetic control of gene expression and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation is based primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two more tissue-specific methylation classes while conserving the accuracy provided by leading binary methylation classification methods.

University of Bergen

Information theoretical approaches for the identification of potentially cooperating transcription factors

Author: Meckbach Cornelia
Publication date: 21/06/2019

Georg-August-University Göttingen