DNA entropy reveals a significant difference in complexity between housekeeping and tissue specific gene promoters
BACKGROUND
The complexity of DNA can be quantified using estimates of entropy. Variation in DNA complexity is expected between the promoters of genes with different transcriptional mechanisms, namely housekeeping (HK) and tissue-specific (TS) genes. The former are transcribed constitutively to maintain general cellular functions, and the latter are transcribed in restricted tissue and cell types for specific molecular events. It is known that promoter features in the human genome are related to tissue specificity, but this has been difficult to quantify on a genomic scale. If entropy effectively quantifies DNA complexity, calculating the entropies of HK and TS gene promoters as profiles may reveal significant differences.
RESULTS
Entropy profiles were calculated for a total dataset of 12,003 human gene promoters and for 501 housekeeping (HK) and 587 tissue-specific (TS) human gene promoters. The mean profiles show that TS promoters have a significantly lower entropy (p<2.2e-16) than HK gene promoters. The entropy distributions for the three datasets show that promoter entropies could be used to identify novel HK genes.
CONCLUSION
Functional features comprise DNA sequence patterns that are non-random, and hence they have lower entropies. The lower entropy of TS gene promoters can be explained by a higher density of positive and negative regulatory elements, required for genes with complex spatial and temporal expression.
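The abstract does not specify how entropy was computed (window size, word length or estimator). As a minimal sketch, assuming plain Shannon entropy of overlapping k-mer counts in fixed sliding windows, a promoter entropy profile could be computed as follows; the function names and the default window and step values are illustrative assumptions, not the authors' settings.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the overlapping k-mer composition of seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(seq: str, window: int = 50, step: int = 10, k: int = 1) -> list[float]:
    """Entropy of each window of length `window`, sliding by `step` along seq."""
    return [
        kmer_entropy(seq[i:i + window], k)
        for i in range(0, len(seq) - window + 1, step)
    ]


# Usage: a low-complexity repeat scores 0 bits per window; a balanced window
# reaches the mononucleotide maximum of 2 bits (2k bits for k-mers in general).
poly_a = entropy_profile("A" * 60, window=20, step=20)
balanced = entropy_profile("ACGT" * 15, window=20, step=20)
```

Averaging such profiles position by position across aligned HK or TS promoters would give mean profiles of the kind the abstract compares; the statistical test behind the reported p-value is not stated here.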
Promoter architecture and gene expression dynamics in embryonic development
Genes indispensable for proper embryonic development show intricate patterns of expression
in the timing, location and magnitude of their activity. This diversity is enabled by
elaborate regulatory mechanisms that guide their expression. These genes also possess a distinct
type of core promoter that enables the integration of all regulatory inputs. However, it is still
not clear how coordination of regulation is achieved. The first step towards understanding
this process is to characterise the dynamics of expression and the core promoter features that process
the regulation.
In this thesis, I explored the diversity of spatio-temporal gene expression during zebrafish
development. I defined a novel measure of anatomical specificity that quantifies how precisely
an anatomical structure is described in the Anatomical Ontology system. Using this anatomical
specificity measure, I quantified gene expression dynamics from mRNA in situ hybridisation
data. Gene expression divergence derived from the in situ data was used to predict expression levels from
RNA-seq expression data. This analysis allowed me to propose a measure of gene expression
complexity, which showed that genes with the highest complexity scores are developmental
genes, whereas genes with low complexity scores are involved in housekeeping functions. Next,
I developed a method that reports significantly enriched core promoter elements in a group
of genes. Using this method, I compared core promoter composition in active
genes expressed in different developmental periods. In addition, this method identified groups
of genes with a specific core promoter structure that are associated with a particular biological process.
Finally, I used scRNA-seq data from zebrafish development to identify patterns of gene
co-expression across different cell clusters. Co-expression suggests that a gene pair shares
a common regulatory programme. I show that genes with the most divergent co-expression
patterns across development are developmental genes and that housekeeping genes have the least
diverse co-expression patterns. I went further to create co-expression networks, which allowed
me to analyse co-expression patterns in more detail.
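The thesis does not say which statistic its enrichment method uses. A common choice, assumed here, is a one-sided hypergeometric test of whether a core promoter element occurs in more genes of a group than expected from a background set, with a Bonferroni correction across the elements tested. The function names, the toy gene IDs and the input layout (a dict from gene ID to the set of elements found in its core promoter) are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K carrying the element, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_elements(group, background, elements_of, alpha=0.05):
    """Return (element, p) pairs over-represented in `group` relative to `background`.

    group: set of gene IDs (a subset of background); background: set of gene IDs;
    elements_of: dict gene ID -> set of core promoter element names found.
    """
    tested = sorted(set().union(*(elements_of.get(g, set()) for g in background)))
    N, n = len(background), len(group)
    results = []
    for element in tested:
        K = sum(element in elements_of.get(g, set()) for g in background)
        k = sum(element in elements_of.get(g, set()) for g in group)
        p = hypergeom_sf(k, N, K, n)
        if p * len(tested) <= alpha:  # Bonferroni correction over tested elements
            results.append((element, p))
    return results


# Usage with toy data: TATA occurs only in the five group genes; Inr is everywhere.
genes = {f"g{i}" for i in range(20)}
group = {f"g{i}" for i in range(5)}
elements = {g: {"Inr"} for g in genes}
for g in group:
    elements[g] = {"Inr", "TATA"}
result = enriched_elements(group, genes, elements)
```

An element present in every gene, like the toy Inr here, can never be enriched, which is the behaviour one would want from such a report.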
Transcription regulation: models for combinatorial regulation and functional specificity
Gene regulation is controlled by transcription factor proteins that bind to specific DNA sequences, known as transcription factor binding sites (TFBSs). Combinations of transcription factors working co-operatively in cis-regulatory modules (CRMs) play a role in regulating gene expression. Current computational methods for TFBS prediction cannot distinguish between functional and non-functional sites, and they predict very large numbers of false positives.
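JASPAR distributes motifs as position frequency matrices. As a minimal sketch of the conventional scanning step, assuming a uniform 0.25 background, a pseudocount of 0.5 and the forward strand only, a matrix can be converted to log-odds scores and slid along a sequence, with every window above a threshold reported as a predicted site. Because short, degenerate motifs clear such thresholds by chance, scans of this kind are one source of the false positives described above. The function names, the toy matrix and the threshold are illustrative.

```python
import math

BASES = "ACGT"


def log_odds_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert a position frequency matrix {base: [count per position]} to log2-odds."""
    width = len(pfm["A"])
    pwm = []
    for j in range(width):
        total = sum(pfm[b][j] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((pfm[b][j] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Yield (start, score) for forward-strand windows scoring at least `threshold`."""
    width = len(pwm)
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            yield i, score


# Usage with a toy 3-position matrix whose consensus is TAT.
toy_pfm = {"A": [0, 10, 0], "C": [0, 0, 0], "G": [0, 0, 0], "T": [10, 0, 10]}
hits = list(scan("GGTATGG", log_odds_pwm(toy_pfm), threshold=5.0))
```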
The thesis focuses on the development of a novel computational model, based on artificial neural networks (ANNs), for the identification of functional TFBSs, and the CRMs within which they operate in the human genome. Datasets of 12,239 experimentally verified true positive (TP) TFBSs and 130,199 false positive (FP) TFBSs were extracted using a combination of position weight matrices from the JASPAR database and experimentally verified sites from the Encyclopedia of DNA Elements (ENCODE). A number of machine learning algorithms were tested using a range of genomic information, including gene expression, nucleosome positioning, DNA methylation states and DNA entropy. The best model, which gave a mean area under the receiver operating characteristic curve (AUC) of 0.800, was a feedforward ANN trained using backpropagation.
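An AUC of 0.800 means that a randomly chosen true positive TFBS is scored above a randomly chosen false positive one with probability 0.8. A minimal sketch of that computation via the Mann-Whitney rank formulation, with tied scores receiving average ranks; `roc_auc` is a hypothetical helper, and for real analyses a tested routine such as scikit-learn's `roc_auc_score` would be preferable.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; labels are 1 (TP) or 0 (FP)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Rank all scores in ascending order, giving tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    # U for the positives, normalised by the number of positive/negative pairs.
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Usage: three of the four TP/FP pairs are ordered correctly.
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The rank form runs in O(n log n), which matters for datasets the size of the 130,199 FP sites, where comparing every TP/FP pair directly would be slow.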
This model was then used to predict functional TFBSs in a number of gene sets from the human genome. The predictions, combined with experimentally proven TFBSs from ENCODE, were used to investigate combinatorial patterns of TFBSs operating in CRMs. CRM patterns were analysed in disease-associated genes located in linkage disequilibrium blocks containing SNPs obtained from Genome-Wide Association Studies (GWAS).
The potential for the model to make functional TFBS predictions to aid in the annotation of orphan genes of unknown function is discussed. In addition, this thesis presents computational work on a number of smaller published studies.
Regulatory complexity in gene expression
The regulation of gene expression is the driver of cellular differentiation in multicellular
organisms; the result is a diverse range of cell types, each with its own unique profile
of expression. Within these cell types, the transcriptional product of a gene is up- or
down-regulated in response to intrinsic and extrinsic stimuli according to its own
regulatory programme encoded within the cell. The complexity of this regulatory
programme depends on the requirement of the gene to change expression state in
different cell lineages, or over time in response to a range of conditions. For
many housekeeping genes integral to the survival of the cell, this programme is simple:
switch on the gene and leave it on. For many other genes, however, the required level and precision of
regulatory control is much more involved and gives rise to subtle changes in expression.
This raises the questions of precisely where and how that regulatory information is
encoded, and whether different biological systems encode it in the same way.
This project attempts to answer these questions by developing novel approaches
to quantifying the output of this regulatory programme according to the state
changes observed in the expression profile of a given gene. Measures of complexity
in gene expression are calculated over a wide range of cell types and conditions profiled
using CAGE, which provides a quantitative estimate of gene expression and precisely
defines the promoter used to initiate that expression. As expected, housekeeping
genes, with their uniform expression profiles, were found to be amongst the least complex,
as were genes with highly restricted expression. The genes
most complex in their expression output were those associated with
H3K27me3 repressive marks (genes poised for activation in a specific set of cell types),
as well as those enriched in DNase I hypersensitive sites in their upstream region, though
not necessarily conserved in that region. Evidence also suggests that the different promoters
of a gene contribute in different ways to its resultant regulatory
complexity, suggesting that certain promoters may be more crucial in driving the regulation
of some genes. This allows such promoters to be targeted in the analysis of
diseases implicated by changes in regulatory regions. Indeed, genes known to
be associated with diseases such as leukaemia and Alzheimer’s are found to be highly
complex in their expression.
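The abstract does not define its complexity measure. As a purely hypothetical illustration of a score that, like the one described, is low both for uniformly expressed genes and for highly restricted ones, one can bin each gene's log-scaled expression across samples into a few levels and take the Shannon entropy of the level frequencies. The function name, the binning scheme and the assumption that inputs are CAGE values normalised to tags per million (TPM) are all illustrative, not the project's method.

```python
import math
from collections import Counter


def expression_complexity(tpm_values: list[float], n_bins: int = 5) -> float:
    """Hypothetical complexity score: entropy (bits) of binned log2(TPM + 1) levels.

    Levels are scaled to the gene's own maximum, so the score reflects how many
    distinct expression states the gene occupies across samples, not its magnitude.
    """
    logs = [math.log2(v + 1) for v in tpm_values]
    top = max(logs)
    if top == 0:
        return 0.0  # never expressed: a single "off" state
    # Map each sample to a level 0..n_bins-1; the maximum falls in the top bin.
    levels = [min(int(x / top * n_bins), n_bins - 1) for x in logs]
    n = len(levels)
    return -sum(c / n * math.log2(c / n) for c in Counter(levels).values())


housekeeping = expression_complexity([50.0] * 8)          # uniform: one state
restricted = expression_complexity([0.0] * 7 + [100.0])   # mostly off: low
graded = expression_complexity([0, 1, 3, 7, 15, 31, 63, 127])  # many states: high
```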
Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks
Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function.
We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association." "Nmer-association" measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif" algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The "split-motif" algorithm finds pairs of DNA motifs which preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed in standard motif-finding algorithms despite emerging evidence of the importance of complex regulation.
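The split-motif algorithm itself is not given here. The quantity it implies, how often one motif occurs at a given offset from another, can be sketched as below; a sharp peak at one offset, relative to shuffled or background sequences, would suggest that the pair localizes preferentially. Exact string matching, the 50 bp window, the helper names and the toy motifs are all assumptions for illustration.

```python
from collections import Counter


def occurrences(seq: str, motif: str) -> list[int]:
    """Start positions of (possibly overlapping) exact matches of motif in seq."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def spacing_counts(seqs, motif_a: str, motif_b: str, max_gap: int = 50) -> Counter:
    """Count offsets (start of B minus start of A) with |offset| <= max_gap."""
    counts = Counter()
    for seq in seqs:
        starts_b = occurrences(seq, motif_b)
        for a in occurrences(seq, motif_a):
            for b in starts_b:
                offset = b - a
                if abs(offset) <= max_gap:
                    counts[offset] += 1
    return counts


# Usage with toy sequences: two share a 17 bp offset, one has an 11 bp offset.
toy_seqs = [
    "CACGTG" + "A" * 11 + "TGACTC",
    "CACGTG" + "A" * 11 + "TGACTC",
    "CACGTG" + "A" * 5 + "TGACTC",
]
offsets = spacing_counts(toy_seqs, "CACGTG", "TGACTC")
```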
In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternative, natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes.
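How GO annotations are turned into a gene network is not described in this abstract. One simple stand-in, assumed purely for illustration, connects two genes when the Jaccard index of their annotated GO term sets passes a threshold and uses that index as the edge weight. Real analyses usually propagate annotations up the GO hierarchy first; that step is omitted, and the function name, threshold and toy annotations are hypothetical.

```python
from itertools import combinations


def go_network(annotations: dict[str, set[str]], min_similarity: float = 0.2):
    """Weighted gene-gene edges from shared GO annotations.

    annotations: dict gene -> set of GO term IDs. The edge weight is the
    Jaccard index |A & B| / |A | B| of the two genes' term sets.
    """
    edges = {}
    for g1, g2 in combinations(sorted(annotations), 2):
        a, b = annotations[g1], annotations[g2]
        union = a | b
        if not union:
            continue  # neither gene is annotated
        similarity = len(a & b) / len(union)
        if similarity >= min_similarity:
            edges[(g1, g2)] = similarity
    return edges


# Usage with toy annotations: only g1 and g2 share a term.
toy_annotations = {
    "g1": {"GO:0006412", "GO:0005840"},
    "g2": {"GO:0006412"},
    "g3": {"GO:0007165"},
}
edges = go_network(toy_annotations)
```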
Complex organizational structure of the genome revealed by genome-wide analysis of single and alternative promoters in Drosophila melanogaster
BACKGROUND
The promoter is a critical and necessary transcriptional cis-regulatory element. In addition to its role as an assembly site for the basal transcriptional apparatus, the promoter plays a key part in mediating temporal and spatial aspects of gene expression through differential binding of transcription factors and selective interaction with distal enhancers. Although many genes have multiple promoters, little attention has been focused on how these relate to one another; nor has much study been directed at relationships between promoters of adjacent genes.
RESULTS
We have undertaken a systematic investigation of Drosophila promoters. We divided promoters into three groups: unique promoters, first alternative promoters (the most 5' of a gene's multiple promoters), and downstream alternative promoters (the remaining alternative promoters 3' to the first). We observed distinct nucleotide distribution and sequence motif preferences among these three classes. We also investigated the promoters of neighboring genes and found that a greater than expected number of adjacent genes have similar sequence motif profiles, which may allow the genes to be regulated in a coordinated fashion. Consistent with this, there is a positive correlation between similar promoter motifs and related gene expression profiles for these genes.
CONCLUSIONS
Our results suggest that different regulatory mechanisms may apply to each of the three promoter classes, and provide a mechanism for "gene expression neighborhoods," local clusters of co-expressed genes. As a whole, our data reveal an unexpected complexity of genomic organization at the promoter level with respect to both alternative and neighboring promoters.
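The three-way classification in the Results can be expressed directly from transcription start site (TSS) coordinates. A minimal sketch, assuming each gene's promoters are given as TSS positions on a known strand, so that "most 5'" means the smallest coordinate on the plus strand and the largest on the minus strand; the function name, input layout and toy genes are hypothetical.

```python
def classify_promoters(tss_by_gene: dict[str, list[int]], strand_by_gene: dict[str, str]):
    """Label each (gene, TSS) as unique, first alternative or downstream alternative."""
    labels = {}
    for gene, sites in tss_by_gene.items():
        # Order TSSs 5' to 3' in the gene's own orientation.
        ordered = sorted(set(sites), reverse=(strand_by_gene[gene] == "-"))
        if len(ordered) == 1:
            labels[(gene, ordered[0])] = "unique"
            continue
        labels[(gene, ordered[0])] = "first_alternative"
        for tss in ordered[1:]:
            labels[(gene, tss)] = "downstream_alternative"
    return labels


# Usage with toy coordinates: geneC is on the minus strand, so 900 is its most 5' TSS.
toy_labels = classify_promoters(
    {"geneA": [1000], "geneB": [500, 200, 800], "geneC": [300, 900]},
    {"geneA": "+", "geneB": "+", "geneC": "-"},
)
```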
Profile analysis and prediction of tissue-specific CpG island methylation classes
BACKGROUND
The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern.
RESULTS
We defined CGI methylation profiles that not only separate constitutively methylated from unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes, including their evolutionary conservation, their significance, and the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein-coding genes and their possible use in the prediction of DNA methylation.
CONCLUSION
Our approach provides new insights into the biological features that determine whether a CGI has a functional role in the epigenetic control of gene expression, and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation depends primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two additional tissue-specific methylation classes while retaining the accuracy of leading binary methylation classification methods.
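The abstract does not list every CGI attribute it uses. For context, two standard sequence attributes of CpG islands are GC content and the CpG observed/expected ratio, which underlie the classic Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50% and observed/expected ratio above 0.6). A minimal sketch of computing them follows; the function names are hypothetical and these features are not claimed to be the paper's.

```python
def cgi_features(seq: str) -> dict[str, float]:
    """Length, GC content and CpG observed/expected ratio of a candidate CpG island."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_content = (c + g) / n if n else 0.0
    # Observed/expected CpG = (CpG count * length) / (C count * G count)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": float(n), "gc_content": gc_content, "cpg_obs_exp": obs_exp}


def meets_classic_cgi_criteria(seq: str) -> bool:
    """Hypothetical helper applying the Gardiner-Garden and Frommer thresholds."""
    f = cgi_features(seq)
    return f["length"] >= 200 and f["gc_content"] > 0.5 and f["cpg_obs_exp"] > 0.6


# Usage: a short balanced repeat is too short to qualify; a 200 bp CG repeat does.
short_features = cgi_features("ACGT" * 10)
```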
- …