
    Toward a gold standard for promoter prediction evaluation

    Motivation: Promoter prediction is an important task in genome annotation projects, and during the past years many new promoter prediction programs (PPPs) have emerged. However, many of these programs are compared inadequately to other programs. In most cases, only a small portion of the genome is used to evaluate the program, which is not a realistic setting for whole genome annotation projects. In addition, a common evaluation design to properly compare PPPs is still lacking.

    ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles

    Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.

    Myelin-associated glycoprotein gene mutation causes Pelizaeus-Merzbacher disease-like disorder

    Pelizaeus-Merzbacher disease is an X-linked hypomyelinating leukodystrophy. Lossos et al. describe a family with an early-onset Pelizaeus-Merzbacher disease-like phenotype that slowly evolves into complicated hereditary spastic paraplegia, affecting both the CNS and PNS. Exome sequencing reveals a causative homozygous missense mutation in MAG, which encodes myelin-associated glycoprotein.

    Particle Swarm Optimization with Reinforcement Learning for the Prediction of CpG Islands in the Human Genome

    BACKGROUND: Regions with abundant GC nucleotides, a high CpG number, and a length greater than 200 bp in a genome are often referred to as CpG islands. These islands are usually located in the 5' end of genes. Recently, several algorithms for the prediction of CpG islands have been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose here a new method called CPSORL to predict CpG islands, which consists of a complement particle swarm optimization algorithm combined with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools equipped with the sliding window technique have been developed previously. However, the quality of the results seems to rely too much on the choices that are made for the window sizes, and thus these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL provides results of a higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). A higher number of CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4%). CPSORL is an application for identifying promoter and TSS regions associated with CpG islands in the entire human genome. When compared to CpGcluster, the islands predicted by CPSORL covered a larger region in the TSS (12.2%) and promoter (26.1%) region. If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger TSS (40.5%) and promoter (67.8%) region than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density was 5.33% for CpG islands in the entire human genome.
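The sliding-window approach mentioned in the abstract above can be illustrated with a minimal sketch. This is not CPSORL and not any of the named tools. The thresholds (GC content above 50%, observed/expected CpG ratio above 0.6, 200 bp window) are the commonly cited Gardiner-Garden and Frommer criteria and are assumptions here, as are all function names.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def scan_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return start offsets of windows that pass both thresholds.

    The default thresholds are assumed values, not taken from the abstract.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_content(w) > min_gc and cpg_obs_exp(w) > min_oe:
            hits.append(start)
    return hits
```

Changing `window` changes which offsets are reported, which is the window-size sensitivity the abstract describes for this family of methods.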

    CpG_MI: a novel approach for identifying functional CpG islands in mammalian genomes

    CpG islands (CGIs) are CpG-rich regions compared to CpG-depleted bulk DNA of mammalian genomes and are generally regarded as the epigenetic regulatory regions in association with unmethylation, promoter activity and histone modifications. Accurate identification of CpG islands with epigenetic regulatory function in bulk genomes is of wide interest. Here, the common features of functional CGIs are identified using an average mutual information method to differentiate functional CGIs from the remaining CGIs. A new approach (CpG mutual information, CpG_MI) was further explored to identify functional CGIs based on the cumulative mutual information of physical distances between two neighboring CpGs. Compared to current approaches, CpG_MI achieved the highest prediction accuracy. This approach also identified new functional CGIs overlapping with gene promoter regions which were missed by other algorithms. Nearly all CGIs identified by CpG_MI overlapped with histone modification marks. CpG_MI could also be used to identify potential functional CGIs in other mammalian genomes, as the CpG dinucleotide contents and cumulative mutual information distributions are almost the same among six mammalian genomes in our analysis. It is a reliable quantitative tool for the identification of functional CGIs from bulk genomes and helps in understanding the relationships between genomic functional elements and epigenomic modifications
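The abstract above builds on the physical distances between neighboring CpGs. A minimal sketch of that input quantity only is shown here. It does not compute mutual information and is not the CpG_MI algorithm. The function names are hypothetical.

```python
def cpg_positions(seq):
    """Return 0-based start positions of every CG dinucleotide in seq."""
    seq = seq.upper()
    positions = []
    i = seq.find("CG")
    while i != -1:
        positions.append(i)
        i = seq.find("CG", i + 1)
    return positions


def neighbor_distances(seq):
    """Distances in bp between consecutive CpG sites."""
    pos = cpg_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]
```

For example, `neighbor_distances("ACGTTCGCG")` finds CpGs at positions 1, 5 and 7 and returns `[4, 2]`.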

    Gene expression signatures in motor neurone disease fibroblasts reveal dysregulation of metabolism, hypoxia-response and RNA processing functions

    Aims: Amyotrophic lateral sclerosis (ALS) and primary lateral sclerosis (PLS) are two syndromic variants within the motor neurone disease spectrum. As PLS and most ALS cases are sporadic (SALS), this limits the availability of cellular models for investigating pathogenic mechanisms and therapeutic targets. The aim of this study was to use gene expression profiling to evaluate fibroblasts as cellular models for SALS and PLS, to establish whether dysregulated biological processes recapitulate those seen in the central nervous system and to elucidate pathways that distinguish the clinically defined variants of SALS and PLS. Methods: Microarray analysis was performed on fibroblast RNA and differentially expressed genes identified. Genes in enriched biological pathways were validated by quantitative PCR and functional assays performed to establish the effect of altered RNA levels on the cellular processes. Results: Gene expression profiling demonstrated that whilst there were many differentially expressed genes in common between SALS and PLS fibroblasts, there were many more expressed specifically in the SALS fibroblasts, including those involved in RNA processing and the stress response. Functional analysis of the fibroblasts confirmed a significant decrease in miRNA production and a reduced response to hypoxia in SALS fibroblasts. Furthermore, metabolic gene changes seen in SALS, many of which were also evident in PLS fibroblasts, resulted in dysfunctional cellular respiration. Conclusions: The data demonstrate that fibroblasts can act as cellular models for ALS and PLS, by establishing the transcriptional changes in known pathogenic pathways that confer subsequent functional effects and potentially highlight targets for therapeutic intervention.

    DNA Methylation and Genome Evolution in Honeybee: Gene Length, Expression, Functional Enrichment Covary with the Evolutionary Signature of DNA Methylation

    A growing body of evidence suggests that DNA methylation is functionally divergent among different taxa. The recently discovered functional methylation system in the honeybee Apis mellifera presents an attractive invertebrate model system to study evolution and function of DNA methylation. In the honeybee, DNA methylation is mostly targeted toward transcription units (gene bodies) of a subset of genes. Here, we report an intriguing covariation of length and epigenetic status of honeybee genes. Hypermethylated and hypomethylated genes in honeybee are dramatically different in their lengths for both exons and introns. By analyzing orthologs in Drosophila melanogaster, Acyrthosiphon pisum, and Ciona intestinalis, we show genes that were short and long in the past are now preferentially situated in hyper- and hypomethylated classes respectively, in the honeybee. Moreover, we demonstrate that a subset of high-CpG genes are conspicuously longer than expected under the evolutionary relationship alone and that they are enriched in specific functional categories. We suggest that gene length evolution in the honeybee is partially driven by evolutionary forces related to regulation of gene expression, which in turn is associated with DNA methylation. However, lineage-specific patterns of gene length evolution suggest that there may exist additional forces underlying the observed interaction between DNA methylation and gene lengths in the honeybee

    Comprehensive analysis of the base composition around the transcription start site in Metazoa

    BACKGROUND: The transcription start site of a metazoan gene remains poorly understood, mostly because there is no clear signal present in all genes. Now that several sequenced metazoan genomes have been annotated, we have been able to compare the base composition around the transcription start site for all annotated genes across multiple genomes. RESULTS: The most prominent feature in the base compositions is a significant local variation in G+C content over a large region around the transcription start site. The change is present in all animal phyla but the extent of variation is different between distinct classes of vertebrates, and the shape of the variation is completely different between vertebrates and arthropods. Furthermore, the height of the variation correlates with CpG frequencies in vertebrates but not in invertebrates and it also correlates with gene expression, especially in mammals. We also detect GC and AT skews in all clades (where %G is not equal to %C or %A is not equal to %T respectively) but these occur in a more confined region around the transcription start site and in the coding region. CONCLUSIONS: The dramatic changes in nucleotide composition in humans are a consequence of CpG nucleotide frequencies and of gene expression, the changes in Fugu could point to primordial CpG islands, and the changes in the fly are of a totally different kind and unrelated to dinucleotide frequencies
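The GC and AT skews mentioned in the abstract above can be sketched with the conventional definitions, (G − C)/(G + C) and (A − T)/(A + T). That these are the exact measures used in the paper is an assumption, as are the function names and the window parameters.

```python
def _skew(seq, x, y):
    """(count(x) - count(y)) / (count(x) + count(y)), or 0.0 if neither occurs."""
    seq = seq.upper()
    nx, ny = seq.count(x), seq.count(y)
    total = nx + ny
    return (nx - ny) / total if total else 0.0


def gc_skew(seq):
    """Positive when G outnumbers C on the given strand."""
    return _skew(seq, "G", "C")


def at_skew(seq):
    """Positive when A outnumbers T on the given strand."""
    return _skew(seq, "A", "T")


def skew_profile(seq, window=100, step=10):
    """List of (start, gc_skew, at_skew) tuples over sliding windows.

    window and step are illustrative defaults, not values from the abstract.
    """
    return [
        (start, gc_skew(seq[start:start + window]), at_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Applied to sequences aligned on annotated transcription start sites and averaged per position, such a profile would show where %G departs from %C or %A from %T.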

    Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression

    BACKGROUND: Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters and to what level this occurs. This study aims to answer some of these questions. RESULTS: We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups including those strongly correlated with CGI and those not correlated with CGI. CONCLUSION: Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
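PWM-based TFBS prediction, as described in the abstract above, can be sketched as follows. The log-odds form, the pseudocount, the uniform 0.25 background and the score threshold are all assumptions for illustration. The paper's cluster score is not defined in the abstract and is not reproduced here.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned binding sites.

    A uniform background and an additive pseudocount are assumed.
    """
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        # Ambiguous bases such as N contribute 0.0 (a simplifying assumption).
        score = sum(pwm[i].get(seq[start + i], 0.0) for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

With the toy sites `["TATA", "TATA", "TATA", "TTTA"]` (hypothetical data), scanning `"GGTATAGG"` at a threshold of 4.0 reports a single hit at offset 2. Counting such hits per region is one simple way to look at local concentrations of predicted TFBS.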

    Gradual transition from mosaic to global DNA methylation patterns during deuterostome evolution

    BACKGROUND: DNA methylation by the Dnmt family occurs in vertebrates and invertebrates, including ascidians, and is thought to play important roles in gene regulation and genome stability, especially in vertebrates. However, the global methylation patterns of vertebrates and invertebrates are distinctive. Whereas almost all CpG sites are methylated in vertebrates, with the exception of those in CpG islands, the ascidian genome contains approximately equal amounts of methylated and unmethylated regions. Curiously, methylation status can be reliably estimated from the local frequency of CpG dinucleotides in the ascidian genome. Methylated and unmethylated regions tend to have few and many CpG sites, respectively, consistent with our knowledge of the methylation status of CpG islands and other regions in mammals. However, DNA methylation patterns and levels in vertebrates and invertebrates have not been analyzed in the same way. RESULTS: Using a new computational methodology based on the decomposition of the bimodal distributions of methylated and unmethylated regions, we estimated the extent of the global methylation patterns in a wide range of animals. We then examined the epigenetic changes in silico along the phylogenetic tree. We observed a gradual transition from fractional to global patterns of methylation in deuterostomes, rather than a clear demarcation between vertebrates and invertebrates. When we applied this methodology to six piscine genomes, some of them showed features similar to those of invertebrates. CONCLUSIONS: The mammalian global DNA methylation pattern was probably not acquired at an early stage of vertebrate evolution, but gradually expanded from that of a more ancient organism.