99 research outputs found

    Complexity driven evolution of Alternative splicing

    Based on an animal model of agonistic interactions, we observed co-varied (linked) alternative exons (LEs) in genes with an alternative splicing (AS) phenotype in brain. We found 263 positively co-varied pairs and 26 pairs with negative co-variation. To check data consistency, we cross-validated across three organisms (human, mouse and rat) with available hippocampus SRA repositories, which supported the co-variation of the corresponding exons. Of the 142 genes with LE events, the most LE pairs were observed in insulin-related Sorbs1 (Sorbin And SH3 Domain Containing 1; 18 LE AS events) and synaptic Nrcam (12 LE events). 104 genes maintain only 1 LE pair and 36 genes maintain 2-7 LE pairs. Notably, there is a mode at 3 LE pairs per gene (14 genes) in the distribution of genes versus LE events. GO analysis reveals that the majority of genes maintaining LE events belong to synaptic genes, the RNA-splicing machinery, and chromatin remodeling. The 'complexity' (entropic) measure of a gene is calculated as $-\sum_{i=1}^{n} \psi_i \log_2(\psi_i)$, where $\psi_i$ is the percent inclusion rate of the $i$-th AS exon and $n$ is the number of AS exons in the gene. It is evident that linked AS exons decrease the gene complexity rate [3], allowing coordinated splicing in genes with highly dynamic splicing, such as synaptic, RNA processing, and chromatin remodeling genes. We speculate on whether LE AS events are of evolutionary advantage for high splicing turnover genes working in homeostatic equilibrium. The next step of the work is to elucidate the features underlying the linking phenomenon, including mRNA secondary structure and the splicing factor binding sites within and around the corresponding exons. We will present results on this issue featuring some complex interactions between exons. Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 202
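The entropic complexity measure described in this abstract can be illustrated with a minimal sketch. The function name `splicing_complexity` is hypothetical, and the sketch assumes inclusion rates are given as fractions in [0, 1] rather than percentages; the abstract does not specify the scaling, nor how exons with zero inclusion are handled (here they contribute nothing, by the usual entropy convention).

```python
import math


def splicing_complexity(psi_values):
    """Return -sum(psi_i * log2(psi_i)) over the AS exons of one gene.

    psi_values: inclusion rates as fractions in [0, 1] (an assumption;
    the abstract speaks of a "percent inclusion rate").
    Exons with psi == 0 contribute 0, following the usual convention.
    """
    total = 0.0
    for psi in psi_values:
        if not 0.0 <= psi <= 1.0:
            raise ValueError(f"psi must lie in [0, 1], got {psi}")
        if psi > 0.0:
            total -= psi * math.log2(psi)
    return total


if __name__ == "__main__":
    # Two exons, each included half the time: each term is 0.5 bit.
    print(splicing_complexity([0.5, 0.5]))
```

Under this definition a constitutively included exon (psi = 1) adds nothing to the measure.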

    Long-term trends in evolution of indels in protein sequences

    BACKGROUND: In this paper we describe an analysis of the size evolution of both protein domains and their indels, as inferred by changing sizes of whole domains or individual unaligned regions or "spacers". We studied relatively early evolutionary events and focused on protein domains which are conserved among various taxonomy groups. RESULTS: We found that more than one third of all domains have a statistically significant tendency to increase/decrease in size in evolution as judged from the overall domain size distribution as well as from the size distribution of individual spacers. Moreover, the fraction of domains and individual spacers increasing in size is almost twofold larger than the fraction decreasing in size. CONCLUSION: We showed that the tolerance to insertion and deletion events depends on the domain's taxonomy span. Eukaryotic domains are depleted in insertions compared to the overall test set, namely, the number of spacers increasing in size is about the same as the number of spacers decreasing in size. On the other hand, ancient domain families show some bias towards insertions or spacers which grow in size in evolution. Domains from several Gene Ontology categories also demonstrate certain tendencies for insertion or deletion events as inferred from the analysis of spacer sizes

    Conservation versus parallel gains in intron evolution

    Orthologous genes from distant eukaryotic species, e.g. animals and plants, share up to 25–30% intron positions. However, the relative contributions of evolutionary conservation and parallel gain of new introns into this pattern remain unknown. Here, the extent of independent insertion of introns in the same sites (parallel gain) in orthologous genes from phylogenetically distant eukaryotes is assessed within the framework of the protosplice site model. It is shown that protosplice sites are no more conserved during evolution of eukaryotic gene sequences than random sites. Simulation of intron insertion into protosplice sites with the observed protosplice site frequencies and intron densities shows that parallel gain can account but for a small fraction (5–10%) of shared intron positions in distantly related species. Thus, the presence of numerous introns in the same positions in orthologous genes from distant eukaryotes, such as animals, fungi and plants, appears to reflect mostly bona fide evolutionary conservation
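The idea of simulating independent intron insertion into protosplice sites can be sketched with a toy Monte Carlo model. This is not the authors' simulation: it assumes two orthologs share a fixed pool of equally likely protosplice sites and that each lineage gains its introns independently and uniformly at random. The function name and all parameter values are hypothetical.

```python
import random


def simulate_parallel_gain(n_sites, n_introns_a, n_introns_b,
                           n_trials=10_000, seed=0):
    """Estimate the mean number of intron positions shared by chance.

    n_sites: protosplice sites common to both orthologs (toy assumption).
    n_introns_a, n_introns_b: introns gained independently per lineage.
    Returns the average count of coinciding positions over n_trials.
    """
    rng = random.Random(seed)
    sites = range(n_sites)
    shared_total = 0
    for _ in range(n_trials):
        gained_a = set(rng.sample(sites, n_introns_a))
        gained_b = set(rng.sample(sites, n_introns_b))
        shared_total += len(gained_a & gained_b)
    return shared_total / n_trials


if __name__ == "__main__":
    # Illustrative values only: 100 sites, 10 introns in each lineage.
    print(simulate_parallel_gain(100, 10, 10))
```

In this uniform model the expected number of coincidences is `n_introns_a * n_introns_b / n_sites`, so the simulated mean can be checked against that closed form. The study itself uses observed protosplice site frequencies and intron densities rather than a uniform pool.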

    Mutational hotspots in the TP53 gene and, possibly, other tumor suppressors evolve by positive selection

    BACKGROUND: The mutation spectra of the TP53 gene and other tumor suppressors contain multiple hotspots, i.e., sites of non-random, frequent mutation in tumors and/or the germline. The origin of the hotspots remains unclear, the general view being that they represent highly mutable nucleotide contexts which likely reflect effects of different endogenous and exogenous factors shaping the mutation process in specific tissues. The origin of hotspots is of major importance because it has been suggested that mutable contexts could be used to infer mechanisms of mutagenesis contributing to tumorigenesis. RESULTS: Here we apply three independent tests, accounting for non-uniform base compositions in synonymous and non-synonymous sites, to test whether the hotspots emerge via selection or due to mutational bias. All three tests consistently indicate that the hotspots in the TP53 gene evolve, primarily, via positive selection. The results were robust to the elimination of the highly mutable CpG dinucleotides. By contrast, only one, the least conservative test reveals the signature of positive selection in BRCA1, BRCA2, and p16. Elucidation of the origin of the hotspots in these genes requires more data on somatic mutations in tumors. CONCLUSION: The results of this analysis seem to indicate that positive selection for gain-of-function in tumor suppressor genes is an important aspect of tumorigenesis, blurring the distinction between tumor suppressors and oncogenes. REVIEWERS: This article was reviewed by Sandor Pongor, Christopher Lee and Mikhail Blagosklonny

    Evolutionary conservation suggests a regulatory function of AUG triplets in 5′-UTRs of eukaryotic genes

    By comparing sequences of human, mouse and rat orthologous genes, we show that in 5′-untranslated regions (5′-UTRs) of mammalian cDNAs but not in 3′-UTRs or coding sequences, AUG is conserved to a significantly greater extent than any of the other 63 nt triplets. This effect is likely to reflect, primarily, bona fide evolutionary conservation, rather than cDNA annotation artifacts, because the excess of conserved upstream AUGs (uAUGs) is seen in 5′-UTRs containing stop codons in-frame with the start AUG and many of the conserved AUGs are found in different frames, consistent with the location in authentic non-coding sequences. Altogether, conserved uAUGs are present in at least 20–30% of mammalian genes. Qualitatively similar results were obtained by comparison of orthologous genes from different species of the yeast genus Saccharomyces. Together with the observation that mammalian and yeast 5′-UTRs are significantly depleted in overall AUG content, these findings suggest that AUG triplets in 5′-UTRs are subject to the pressure of purifying selection in two opposite directions: the uAUGs that have no specific function tend to be deleterious and get eliminated during evolution, whereas those uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5′-UTRs. Consistent with this hypothesis, we found that open reading frames starting from conserved uAUGs are significantly shorter than those starting from non-conserved uAUGs, possibly, owing to selection for optimization of the level of attenuation

    Protein composition of interband regions in polytene and cell line chromosomes of Drosophila melanogaster

    BACKGROUND: Despite many efforts, little is known about the distribution and interactions of chromatin proteins which contribute to the specificity of chromomeric organization of interphase chromosomes. To address this issue, we used publicly available datasets from several recent Drosophila genome-wide mapping and annotation projects, in particular those from the modENCODE project, and compared the molecular organization of 13 interband regions which were accurately mapped previously. RESULTS: Here we demonstrate that in interphase chromosomes of Drosophila cell lines, the interband regions are enriched for a specific set of proteins generally characteristic of "open" chromatin (RNA polymerase II, CHRIZ (CHRO), BEAF-32, BRE1, dMI-2, GAF, NURF301, WDS and TRX). These regions also display reduced nucleosome density, histone H1 depletion and pronounced enrichment for ORC2, a pre-replication complex component. Within the 13 interband regions analyzed, most were around 3-4 kb long, particularly those where many of said protein features were present. We estimate there are about 3500 regions with similar properties in chromosomes of D. melanogaster cell lines, which fits quite well the number of cytologically observed interbands in salivary gland polytene chromosomes. CONCLUSIONS: Our observations suggest strikingly similar organization of interband chromatin in polytene chromosomes and in chromosomes from cell lines, thereby reflecting the existence of a universal principle of interphase chromosome organization.

    The contribution of exon-skipping events on chromosome 22 to protein coding diversity

    Completion of the human genome sequence provides evidence for a gene count with lower bound 30,000–40,000. Significant protein complexity may derive in part from multiple transcript isoforms. Recent EST based studies have revealed that alternate transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30–40% of human genes. Transcript form surveys have yet to integrate the genomic context, expression, frequency, and contribution to protein diversity of isoform variation. We determine here the degree to which protein coding diversity may be influenced by alternate expression of transcripts by exhaustive manual confirmation of genome sequence annotation, and comparison to available transcript data to accurately associate skipped exon isoforms with genomic sequence. Relative expression levels of transcripts are estimated from EST database representation. The rigorous in silico method accurately identifies exon skipping using verified genome sequence. 545 genes have been studied in this first hand-curated assessment of exon skipping on chromosome 22