40 research outputs found

    ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles

    Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work

    Myelin-associated glycoprotein gene mutation causes Pelizaeus-Merzbacher disease-like disorder

    Pelizaeus-Merzbacher disease is an X-linked hypomyelinating leukodystrophy. Lossos et al. describe a family with an early-onset Pelizaeus-Merzbacher disease-like phenotype that slowly evolves into complicated hereditary spastic paraplegia, affecting both the CNS and PNS. Exome sequencing reveals a causative homozygous missense mutation in MAG, which encodes myelin-associated glycoprotein.

    High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur

    Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5′ completeness of expressed sequences and contribute to more successful experimental designs, as well as more accurate gene annotation. Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that allows us, by performing computational TSS prediction with very high sensitivity, to annotate, with high accuracy and in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we utilize various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.

    Particle Swarm Optimization with Reinforcement Learning for the Prediction of CpG Islands in the Human Genome

    BACKGROUND: Regions with abundant GC nucleotides, a high CpG number, and a length greater than 200 bp in a genome are often referred to as CpG islands. These islands are usually located at the 5' end of genes. Recently, several algorithms for the prediction of CpG islands have been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose here a new method called CPSORL to predict CpG islands, which consists of a complement particle swarm optimization algorithm combined with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools equipped with the sliding window technique have been developed previously. However, the quality of the results seems to rely too much on the choices that are made for the window sizes, and thus these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL provides results of a higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). A higher number of CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4%). CPSORL is an application for identifying promoter and TSS regions associated with CpG islands in the entire human genome. When compared to CpGcluster, the islands predicted by CPSORL covered a larger region in the TSS (12.2%) and promoter (26.1%) region. If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger TSS (40.5%) and promoter (67.8%) region than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density was 5.33% for CpG islands in the entire human genome.
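
    As a rough illustration of the sliding-window approach that the abstract contrasts CPSORL with, the sketch below scans a sequence in fixed-size windows and flags those that look island-like. It is not CPSORL and is not taken from the paper. The function names are hypothetical, and the thresholds (GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6) are assumed here from the commonly cited Gardiner-Garden and Frommer criteria, since the abstract only states the 200 bp length.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G positions were independent.
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def island_like_windows(seq, size=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start offsets of windows passing both (assumed) thresholds."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, ratio = cpg_stats(seq[start:start + size])
        if gc >= min_gc and ratio >= min_ratio:
            hits.append(start)
    return hits


if __name__ == "__main__":
    toy = "CG" * 100 + "AT" * 100  # CpG-rich half followed by AT-only half
    print(island_like_windows(toy, size=200, step=100))
```

    Changing `size` or `step` changes which windows are reported, which is the sensitivity to window choice that the abstract identifies as the weakness of this family of methods.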

    Comprehensive analysis of the base composition around the transcription start site in Metazoa

    BACKGROUND: The transcription start site of a metazoan gene remains poorly understood, mostly because there is no clear signal present in all genes. Now that several sequenced metazoan genomes have been annotated, we have been able to compare the base composition around the transcription start site for all annotated genes across multiple genomes. RESULTS: The most prominent feature in the base compositions is a significant local variation in G+C content over a large region around the transcription start site. The change is present in all animal phyla but the extent of variation is different between distinct classes of vertebrates, and the shape of the variation is completely different between vertebrates and arthropods. Furthermore, the height of the variation correlates with CpG frequencies in vertebrates but not in invertebrates and it also correlates with gene expression, especially in mammals. We also detect GC and AT skews in all clades (where %G is not equal to %C or %A is not equal to %T respectively) but these occur in a more confined region around the transcription start site and in the coding region. CONCLUSIONS: The dramatic changes in nucleotide composition in humans are a consequence of CpG nucleotide frequencies and of gene expression, the changes in Fugu could point to primordial CpG islands, and the changes in the fly are of a totally different kind and unrelated to dinucleotide frequencies
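
    The GC and AT skews mentioned in this abstract can be made concrete with a small sketch. The abstract only describes skew as %G differing from %C (or %A from %T); the normalised forms (G − C)/(G + C) and (A − T)/(A + T) used below are a common convention and are assumed here, not quoted from the paper. The function name `skews` is hypothetical.

```python
def skews(seq):
    """Return (GC skew, AT skew) for a DNA sequence.

    Assumed convention: GC skew = (G - C) / (G + C),
    AT skew = (A - T) / (A + T); 0.0 when the denominator is zero.
    """
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return gc_skew, at_skew


if __name__ == "__main__":
    # A sequence with three G and one C has a positive GC skew.
    print(skews("GGGC"))
```

    Applied to windows at fixed offsets from annotated transcription start sites, a function like this would give a per-position skew profile of the kind the abstract describes.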

    Detailed Analysis of ITPR1 Missense Variants Guides Diagnostics and Therapeutic Design

    © 2023 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society. Background: The ITPR1 gene encodes the inositol 1,4,5-trisphosphate (IP3) receptor type 1 (IP3R1), a critical player in cerebellar intracellular calcium signaling. Pathogenic missense variants in ITPR1 cause congenital spinocerebellar ataxia type 29 (SCA29), Gillespie syndrome (GLSP), and severe pontine/cerebellar hypoplasia. The pathophysiological basis of the different phenotypes is poorly understood. Objectives: We aimed to identify novel SCA29 and GLSP cases to define core phenotypes, describe the spectrum of missense variation across ITPR1, standardize the ITPR1 variant nomenclature, and investigate disease progression in relation to cerebellar atrophy. Methods: Cases were identified using next-generation sequencing through the Deciphering Developmental Disorders study, the 100,000 Genomes project, and clinical collaborations. ITPR1 alternative splicing in the human cerebellum was investigated by quantitative polymerase chain reaction. Results: We report the largest, multinational case series of 46 patients with 28 unique ITPR1 missense variants. Variants clustered in functional domains of the protein, especially in the N-terminal IP3-binding domain, the carbonic anhydrase 8 (CA8)-binding region, and the C-terminal transmembrane channel domain. Variants outside these domains were of questionable clinical significance. Standardized transcript annotation, based on our ITPR1 transcript expression data, greatly facilitated analysis. Genotype–phenotype associations were highly variable. Importantly, while cerebellar atrophy was common, cerebellar volume loss did not correlate with symptom progression. Conclusions: This dataset represents the largest cohort of patients with ITPR1 missense variants, expanding the clinical spectrum of SCA29 and GLSP. Standardized transcript annotation is essential for future reporting. Our findings will aid in diagnostic interpretation in the clinic and guide selection of variants for preclinical studies.

    Responsiveness of genes to manipulation of transcription factors in ES cells is associated with histone modifications and tissue specificity

    Background: In addition to determining static states of gene expression (high vs. low), it is important to characterize their dynamic status. For example, genes with H3K27me3 chromatin marks are not only suppressed but also poised for activation. However, the responsiveness of genes to perturbations has never been studied systematically. To distinguish gene responses to specific factors from responsiveness in general, it is necessary to analyze gene expression profiles of cells responding to a large variety of disturbances, and such databases did not exist before. Results: We estimated the responsiveness of all genes in mouse ES cells using our recently published database on expression change after controlled induction of 53 transcription factors (TFs) and other genes. Responsive genes (N = 4746), which were readily upregulated or downregulated depending on the kind of perturbation, mostly have regulatory functions and a propensity to become tissue-specific upon differentiation. Tissue-specific expression was evaluated on the basis of published (GNF) and our new data for 15 organs and tissues. Non-responsive genes (N = 9562), which did not change their expression much following any perturbation, were enriched in housekeeping functions. We found that TF-responsiveness in ES cells is the best predictor known for tissue-specificity in gene expression. Among genes with CpG islands, high responsiveness is associated with H3K27me3 chromatin marks, and low responsiveness is associated with H3K36me3 chromatin, stronger tri-methylation of H3K4, binding of E2F1, and GABP binding motifs in promoters. Conclusions: We thus propose the responsiveness of expression to perturbations as a new way to define the dynamic status of genes, which brings new insights into mechanisms of regulation of gene expression and tissue specificity.

    Genetic and epigenetic changes in the common 1p36 deletion in neuroblastoma tumours

    Chromosome 1p is frequently deleted in neuroblastoma (NB) tumours. The commonly deleted region has been narrowed down by loss of heterozygosity studies undertaken by different groups. Based on earlier mapping data, we have focused on a region on 1p36 (chr1: 7 765 595–11 019 814) and performed an analysis of 30 genes by exploring features such as epigenetic regulation, that is DNA methylation and histone deacetylation, mutations at the DNA level and mRNA expression. Treatment of NB cell lines with the histone deacetylase inhibitor trichostatin A led to increased gene transcription of four of the 30 genes, ERRFI1 (MIG-6), PIK3CD, RBP7 (CRBPIV) and CASZ1, indicating that these genes could be affected by epigenetic downregulation in NBs. Two patients with nonsynonymous mutations in the PIK3CD gene were detected. One patient harboured three variations in the same exon, and p.R188W. The other patient had the variation p.M655I. In addition, synonymous variations and one variation in an intronic sequence were also found. The mRNA expression of this gene is downregulated in unfavourable, compared to favourable, NBs. One nonsynonymous mutation was also identified in the ERRFI1 gene, p.N343S, and one synonymous. None of the variations above were found in healthy control individuals. In conclusion, of the 30 genes analysed, the PIK3CD gene stands out as one of the most interesting for further studies of NB development and progression

    Transcription Initiation Activity Sets Replication Origin Efficiency in Mammalian Cells

    Genomic mapping of DNA replication origins (ORIs) in mammals provides a powerful means for understanding the regulatory complexity of our genome. Here we combine a genome-wide approach to identify preferential sites of DNA replication initiation at 0.4% of the mouse genome with detailed molecular analysis at distinct classes of ORIs according to their location relative to the genes. Our study reveals that 85% of the replication initiation sites in mouse embryonic stem (ES) cells are associated with transcriptional units. Nearly half of the identified ORIs map at promoter regions and, interestingly, ORI density strongly correlates with promoter density, reflecting the coordinated organisation of replication and transcription in the mouse genome. Detailed analysis of ORI activity showed that CpG island promoter-ORIs are the most efficient ORIs in ES cells and both ORI specification and firing efficiency are maintained across cell types. Remarkably, the distribution of replication initiation sites at promoter-ORIs exactly parallels that of transcription start sites (TSS), suggesting a co-evolution of the regulatory regions driving replication and transcription. Moreover, we found that promoter-ORIs are significantly enriched in CAGE tags derived from early embryos relative to all promoters. This association implies that transcription initiation early in development sets the probability of ORI activation, unveiling a new hallmark in ORI efficiency regulation in mammalian cells

    Transcription Initiation Patterns Indicate Divergent Strategies for Gene Regulation at the Chromatin Level

    The application of deep sequencing to map 5′ capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: “focused” promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and “dispersed” promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5′ capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization