
    Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3

    Background: Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described. Results: Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates. 
Conclusion: We found that large introns can promote AS via exon-skipping and exon turnover during evolution, likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.

    The Origins, Evolution, and Functional Potential of Alternative Splicing in Vertebrates

    Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6 times excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. 
Our data supports a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

    Comparative transcriptomics of pathogenic and non-pathogenic Listeria species

    Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome as well as conserved, widespread regulatory structures called ‘Excludons’, which mediate regulation through long non-coding antisense RNAs.

    How the other half lives: CRISPR-Cas's influence on bacteriophages

    CRISPR-Cas is a genetic adaptive immune system unique to prokaryotic cells used to combat phage and plasmid threats. The host cell adapts by incorporating DNA sequences from invading phages or plasmids into its CRISPR locus as spacers. These spacers are expressed as mobile surveillance RNAs that direct CRISPR-associated (Cas) proteins to protect against subsequent attack by the same phages or plasmids. The threat from mobile genetic elements inevitably shapes the CRISPR loci of archaea and bacteria, and simultaneously the CRISPR-Cas immune system drives evolution of these invaders. Here we highlight our recent work, as well as that of others, that seeks to understand phage mechanisms of CRISPR-Cas evasion and conditions for population coexistence of phages with CRISPR-protected prokaryotes. Comment: 24 pages, 8 figures

    Deep Transfer Learning on Satellite Imagery Improves Air Quality Estimates in Developing Nations

    Urban air pollution is a public health challenge in low- and middle-income countries (LMICs). However, LMICs lack adequate air quality (AQ) monitoring infrastructure. A persistent challenge has been our inability to estimate AQ accurately in LMIC cities, which hinders emergency preparedness and risk mitigation. Deep learning-based models that map satellite imagery to AQ can be built for high-income countries (HICs) with adequate ground data. Here we demonstrate that a scalable approach that adapts deep transfer learning on satellite imagery for AQ can extract meaningful estimates and insights in LMIC cities based on spatiotemporal patterns learned in HIC cities. The approach is demonstrated for Accra in Ghana, Africa, with AQ patterns learned from two US cities, specifically Los Angeles and New York.

    Alu-Alu Recombination Underlying the First Large Genomic Deletion in GlcNAc-Phosphotransferase Alpha/Beta (GNPTAB) Gene in a MLII Alpha/Beta Patient

    Mucolipidosis type II α/β is a severe, autosomal recessive lysosomal storage disorder, caused by a defect in the GNPTAB gene that codes for the α/β subunits of the GlcNAc-phosphotransferase. To date, over 100 different mutations have been identified in ML II α/β patients, but no large deletions have been reported. Here we present the first case of a large homozygous intragenic GNPTAB gene deletion (c.3435-386_3602+343del897) encompassing exon 19, identified in an ML II α/β patient. Long-range PCR and sequencing methodologies were used to refine the characterization of this rearrangement, leading to the identification of a 21 bp repetitive motif in introns 18 and 19. Further analysis revealed that both the 5' and 3' breakpoints were located within highly homologous Alu elements (Alu-Sz in intron 18 and Alu-Sq2 in intron 19), suggesting that this deletion probably resulted from Alu-Alu unequal homologous recombination. RT-PCR methods were used to further evaluate the consequences of the alteration for the processing of the mutant GNPTAB pre-mRNA, revealing the production of three abnormal transcripts: one without exon 19 (p.Lys1146_Trp1201del); another with an additional loss of exon 20 (p.Arg1145Serfs*2); and a third in which exon 19 was substituted by a pseudoexon inclusion consisting of a 62 bp fragment from intron 18 (p.Arg1145Serfs*16). Interestingly, this 62 bp fragment corresponds to the Alu-Sz element integrated in intron 18. This represents the first description of a large deletion identified in the GNPTAB gene and contributes to enrich the knowledge on the molecular mechanisms underlying causative mutations in ML II. This work was supported by FCT - project PIC/IC/83252/2007 (http://alfa.fct.mctes.pt/). Coutinho MF and Quental S received grants from the FCT (SFRH/BD/48103/2008; SFRH/BPD/64025/2009).

    Simultaneous identification of long similar substrings in large sets of sequences

    Background: Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary, especially if errors must be considered. Results: We therefore present a new algorithm for the identification of almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a considered maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense but faster to calculate and often more appropriate than traditional alignments for genomic sequence comparisons, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion: The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogeneous alignment quality, as expected in case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled by only tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.

    Aerosol chemistry, transport, and climatic implications during extreme biomass burning emissions over the Indo-Gangetic Plain

    The large-scale emissions of airborne particulates from burning of agricultural residues, particularly over the upper Indo-Gangetic Plain (IGP), have often been associated with frequent formation of haze, adverse health impacts, and modification in aerosol climatology and thereby aerosol impact on regional climate. In this study, short-term variations in aerosol climatology during extreme biomass burning emissions over the IGP were investigated. Size-segregated particulate concentration was initially measured and submicron particles (PM1.1) were found to dominate particulate mass within the fine mode (PM2.1). Particulate-bound water-soluble ions were mainly secondary in nature and primarily composed of sulfate and nitrate. There was evidence of gaseous NH₃ dominating neutralization of acidic aerosol species (SO₄²⁻) in submicron particles, in contrast to crustal-dominating neutralization in coarser particulates. Diurnal variation in black carbon (BC) mass ratio was primarily influenced by regional meteorology, while gradual increase in BC concentration was consistent with the increase in Delta-C, referring to biomass burning emissions. The influence of biomass burning emissions was established using specific organic (levoglucosan), inorganic (K⁺ and NH₄⁺), and satellite-based (UV aerosol index, UVAI) tracers. Levoglucosan was the most abundant species within submicron particles (649±177 ng m⁻³), with a very high ratio (> 50) to other anhydrosugars, indicating exclusive emissions from burning of agriculture residues. Spatiotemporal distribution of aerosol and a few trace gases (CO and NO₂) was evaluated using both spaceborne active and passive sensors. A significant increase in columnar aerosol loading (aerosol optical depth, AOD: 0.98) was evident, with the presence of absorbing aerosols (UVAI > 1.5) having low aerosol layer height (∼1.5 km).
A strong intraseasonality in the aerosol cross-sectional altitudinal profile was even noted from CALIPSO, referring to the dominance of smoke and polluted continental aerosols across the IGP. A possible transport mechanism of biomass smoke was established using cluster analysis and concentration-weighted air mass back trajectories. Short-wave aerosol radiative forcing (ARF) was further simulated considering intraseasonality in aerosol properties, which resulted in a considerable increase in atmospheric ARF (135 W m⁻²) and heating rate (4.3 K day⁻¹) during extreme biomass burning emissions compared to the non-dominating period (56 W m⁻², 1.8 K day⁻¹). Our analysis will be useful to improve understanding of short-term variation in aerosol chemistry over the IGP and to reduce uncertainties in regional aerosol–climate models.