Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum
BACKGROUND: Stagonospora nodorum, a fungal ascomycete in the class Dothideomycetes, is a
damaging pathogen of wheat. It is a model for necrotrophic fungi that cause necrotic symptoms via
the interaction of multiple effector proteins with cultivar-specific receptors. A draft genome
sequence and annotation were published in 2007. A second-pass gene prediction using a training set
of 795 fully EST-supported genes predicted a total of 10762 version 2 nuclear-encoded genes, with
an additional 5354 less reliable version 1 genes also retained.
RESULTS: In this study, we subjected soluble mycelial proteins to proteolysis followed by 2D LC
MALDI-MS/MS. Comparison of the detected peptides with the gene models validated 2134 genes.
62% of these genes (1324) were not supported by prior EST evidence. Of the 2134 validated genes,
all but 188 were version 2 annotations. Statistical analysis of the validated gene models revealed a
preponderance of cytoplasmic and nuclear localised proteins, and proteins with intracellular-associated
GO terms. These statistical associations are consistent with the source of the peptides
used in the study. Comparison with a 6-frame translation of the S. nodorum genome assembly
confirmed 905 existing gene annotations (including 119 not previously confirmed) and provided
evidence supporting 144 genes with coding exon frameshift modifications, 604 genes with
extensions of coding exons into annotated introns or untranslated regions (UTRs), 3 new gene
annotations which were supported by tblastn to NR, and 44 potential new genes residing within
un-assembled regions of the genome.
CONCLUSION: We conclude that 2D LC MALDI-MS/MS is a powerful, rapid and economical tool to
aid in the annotation of fungal genomic assemblies.
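The comparison with a 6-frame translation of the genome can be illustrated with a minimal Python sketch. It assumes the standard (nuclear) genetic code, uses hypothetical helper names (`six_frame_translation`, `frames_containing`), and only checks which frames contain an exact peptide string. It is not the study's pipeline, which the abstract does not describe; real proteogenomic searches match MS/MS spectra against the translated database rather than exact strings.

```python
# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code, table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Hypothetical helper: protein strings for frames +1..+3 (forward) and -1..-3 (reverse)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
    for offset in range(3):
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def frames_containing(peptide: str, frames: dict) -> list:
    """Hypothetical helper: names of the frames whose translation contains the peptide."""
    return [name for name, protein in frames.items() if peptide in protein]


if __name__ == "__main__":
    frames = six_frame_translation("ATGGCC")
    print(frames)  # {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
    print(frames_containing("GH", frames))  # ['-1']
```

A peptide that maps only to a frame or region outside the current gene model is the kind of evidence the abstract counts as a frameshift correction, exon extension or new gene.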
Methods to study splicing from high-throughput RNA Sequencing data
The development of novel high-throughput sequencing (HTS) methods for RNA
(RNA-Seq) has provided a very powerful means to study splicing under multiple
conditions at unprecedented depth. However, the complexity of the information
to be analyzed has turned this into a challenging task. In the last few years,
a plethora of tools have been developed, allowing researchers to process
RNA-Seq data to study the expression of isoforms and splicing events, and their
relative changes under different conditions. We provide an overview of the
methods available to study splicing from short RNA-Seq data. We group the
methods according to the different questions they address: 1) Assignment of the
sequencing reads to their likely gene of origin. This is addressed by methods
that map reads to the genome and/or to the available gene annotations. 2)
Recovering the sequence of splicing events and isoforms. This is addressed by
transcript reconstruction and de novo assembly methods. 3) Quantification of
events and isoforms. Either after reconstructing transcripts or using an
annotation, many methods estimate the expression level or the relative usage of
isoforms and/or events. 4) Providing an isoform or event view of differential
splicing or expression. These include methods that compare relative
event/isoform abundance or isoform expression across two or more conditions. 5)
Visualizing splicing regulation. Various tools facilitate the visualization of
the RNA-Seq data in the context of alternative splicing. In this review, we do
not describe the specific mathematical models behind each method. Our aim is
rather to provide an overview that could serve as an entry point for users who
need to decide on a suitable tool for a specific analysis. We also attempt to
propose a classification of the tools according to the operations they perform, to
facilitate the comparison and choice of methods.
Comment: 31 pages, 1 figure, 9 tables. Small corrections added
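As a concrete example of the quantification and differential steps (points 3 and 4), many tools summarize an exon-skipping event by its "percent spliced in" (PSI). The review does not prescribe a formula; the minimal Python sketch below assumes one simple convention, halving inclusion reads because they can come from either of two inclusion junctions, and reports the change in PSI between two conditions. Function names and read counts are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads, inclusion_junctions=2, exclusion_junctions=1):
    """Percent spliced in for an exon-skipping event, normalized per junction (assumed convention).

    Returns None when no reads support either isoform.
    """
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    total = inc + exc
    if total == 0:
        return None
    return inc / total


def delta_psi(condition_a, condition_b):
    """Change in PSI from condition A to condition B; each is an (inclusion, exclusion) tuple."""
    psi_a = psi(*condition_a)
    psi_b = psi(*condition_b)
    if psi_a is None or psi_b is None:
        return None
    return psi_b - psi_a


if __name__ == "__main__":
    control = (40, 10)  # hypothetical junction read counts: (inclusion, exclusion)
    treated = (20, 20)
    print(round(psi(*control), 3))  # 0.667
    print(round(psi(*treated), 3))  # 0.333
    print(round(delta_psi(control, treated), 3))  # -0.333
```

Event-based tools differ mainly in how they normalize reads and model uncertainty around this ratio; the point estimate alone ignores read depth.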
The importance of being divisible by three in alternative splicing
Alternative splicing events that are conserved in orthologous genes in different species are commonly viewed as reliable evidence of authentic, functionally significant alternative splicing events. Several recent bioinformatic analyses have shown that conserved alternative exons possess features that distinguish them from species-specific alternative exons. One of the most striking differences between conserved and species-specific alternative exons is the high percentage of exons that preserve the reading frame (exons whose length is an exact multiple of 3, termed symmetrical exons) among the conserved alternative exons. Here, we examined conserved alternative exons and found several features that differentiate symmetrical from non-symmetrical alternative exons. We show that symmetrical alternative exons have a strong tendency not to disrupt protein domain structures, whereas the tendency of non-symmetrical alternative exons to overlap with different fractions of protein domains is similar to that of constitutive exons. Additionally, skipping isoforms of non-symmetrical alternative exons are strongly underrepresented compared with their including isoforms, suggesting that skipping of a large fraction of non-symmetrical alternative exons produces transcripts that are degraded by the nonsense-mediated mRNA decay mechanism. Non-symmetrical alternative exons also show a tendency to reside in the 5′ half of the CDS. These findings suggest that alternative splicing of symmetrical and non-symmetrical exons is governed by different selective pressures and serves different purposes.
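The symmetry criterion itself is simple arithmetic. A minimal Python sketch, assuming 1-based inclusive exon coordinates as in GTF files (function names are hypothetical): an exon is symmetrical when its length is divisible by 3, and a nonzero remainder means that skipping it shifts the downstream reading frame, which can introduce a premature stop codon.

```python
def exon_length(start: int, end: int) -> int:
    """Length of an exon given 1-based, inclusive coordinates (assumed GTF-style)."""
    return end - start + 1


def is_symmetrical(start: int, end: int) -> bool:
    """True when the exon length is an exact multiple of 3, so skipping it preserves the frame."""
    return exon_length(start, end) % 3 == 0


def frame_shift_if_skipped(start: int, end: int) -> int:
    """Nucleotides left over (0, 1 or 2) after removing whole codons; nonzero shifts the frame."""
    return exon_length(start, end) % 3


if __name__ == "__main__":
    for start, end in [(101, 190), (101, 200)]:
        print(start, end, exon_length(start, end),
              is_symmetrical(start, end), frame_shift_if_skipped(start, end))
    # 101 190 90 True 0
    # 101 200 100 False 1
```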
Tissue resolved, gene structure refined equine transcriptome.
Background: Transcriptome interpretation relies on a good-quality reference transcriptome for accurate quantification of gene expression as well as functional analysis of genetic variants. The current annotation of the horse genome lacks the specificity and sensitivity necessary to assess gene expression, especially at the isoform level, and suffers from insufficient annotation of untranslated region (UTR) usage. We built an annotation pipeline for horse and used it to integrate 1.9 billion reads from multiple RNA-seq data sets into a new refined transcriptome.
Results: This equine transcriptome integrates eight different tissues from 59 individuals and improves gene structure and isoform resolution, while providing considerable tissue-specific information. We used four levels of transcript filtration in our pipeline, aimed at producing several transcriptome versions that are suitable for different downstream analyses. Our most refined transcriptome includes 36,876 genes and 76,125 isoforms, with 6474 candidate transcriptional loci novel to the equine transcriptome.
Conclusions: We have employed a variety of descriptive statistics and figures that demonstrate the quality and content of the transcriptome. The equine transcriptomes provided by this pipeline show the best tissue-specific resolution of any equine transcriptome to date and are flexible enough for several downstream analyses. We encourage the integration of further equine transcriptomes with our annotation pipeline to continue improving the equine transcriptome.
RNA-Seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts.
Over 50% of genes in Plasmodium falciparum, the deadliest human malaria parasite, contain predicted introns, yet experimental characterization of splicing in this organism remains incomplete. We present here a transcriptome-wide characterization of intraerythrocytic splicing events, as captured by RNA-Seq data from four timepoints of a single highly synchronous culture. Gene model-independent analysis of these data, in conjunction with publicly available RNA-Seq data, using HMMSplicer, an in-house splice site detection algorithm, revealed a total of 977 new 5' GU-AG 3' and 5 new 5' GC-AG 3' junctions absent from gene models and ESTs (an 11% increase over the current annotation). In addition, 310 alternative splicing events were detected in 254 (4.5%) genes, most of which truncate open reading frames. Splicing events antisense to gene models were also detected, revealing complex transcriptional arrangements within the parasite's transcriptome. Interestingly, antisense introns overlap sense introns more than would be expected by chance, perhaps indicating a functional relationship between overlapping transcripts or an inherent organizational property of the transcriptome. Independent experimental validation confirmed over 30 new antisense and alternative junctions. Thus, this set, the largest assemblage of new and alternative splicing events in Plasmodium falciparum to date, provides a more precise, dynamic view of the parasite's transcriptome.
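The junction classes reported above are named by the intron's terminal dinucleotides (GU-AG in RNA reads GT-AG in genomic DNA). A minimal Python sketch of that labelling, with hypothetical function names; it is not HMMSplicer, whose scoring the abstract does not describe. It also shows one simple strand check: a junction whose motif is canonical only on the reverse complement of the annotated strand is a candidate antisense intron.

```python
# Only the two motif classes named in the abstract; anything else is labelled non-canonical.
MOTIFS = {("GT", "AG"): "GT-AG", ("GC", "AG"): "GC-AG"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_intron(intron_seq: str) -> str:
    """Label an intron by its first and last two bases; RNA (U) input is converted to DNA (T)."""
    seq = intron_seq.upper().replace("U", "T")
    if len(seq) < 4:
        return "non-canonical"
    return MOTIFS.get((seq[:2], seq[-2:]), "non-canonical")


def infer_strand(genomic_intron: str):
    """Return ('+', motif) if canonical as given, ('-', motif) if canonical only on the
    reverse complement (a candidate antisense intron), else ('.', 'non-canonical')."""
    seq = genomic_intron.upper().replace("U", "T")
    forward = classify_intron(seq)
    if forward != "non-canonical":
        return "+", forward
    reverse = classify_intron(reverse_complement(seq))
    if reverse != "non-canonical":
        return "-", reverse
    return ".", "non-canonical"


if __name__ == "__main__":
    print(classify_intron("GUAAGUUUUAG"))  # GT-AG
    print(infer_strand("GTAAAAAG"))  # ('+', 'GT-AG')
    print(infer_strand("CTAAAAAC"))  # ('-', 'GT-AG')
```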
Novel deletions causing pseudoxanthoma elasticum underscore the genomic instability of the ABCC6 region
Mutations in ABCC6 cause pseudoxanthoma elasticum (PXE), a heritable disease that affects elastic fibers. Thus far, >200 mutations have been characterized by various PCR-based techniques (primarily direct sequencing), identifying up to 90% of PXE-causing alleles. This study aimed to assess the importance of deletions and insertions in the ABCC6 genomic region, which is known to have a high recombinational potential. To detect ABCC6 deletions/insertions, which can be missed by direct sequencing, multiplex ligation-dependent probe amplification (MLPA) was applied in PXE patients with an incomplete genotype. MLPA was performed in 35 PXE patients with at least one unidentified mutant allele after exonic sequencing and exclusion of the recurrent exon 23-29 deletion. Six multi-exon deletions and four single-exon deletions were detected. Using MLPA in addition to sequencing, we expanded the ABCC6 mutation spectrum with 9 novel deletions and characterized 25% of unidentified disease alleles. Our results further illustrate the instability of the ABCC6 genomic region and stress the importance of screening for deletions in the molecular diagnosis of PXE. Journal of Human Genetics (2010) 55, 112-117; doi: 10.1038/jhg.2009.132; published online 15 January 201
Identify Alternative Splicing Events Based on Position-Specific Evolutionary Conservation
The evolution of eukaryotes is accompanied by increasing complexity of alternative splicing, which greatly expands the information encoded in the genome. One of the greatest challenges in the post-genome era is a complete characterization of the human transcriptome that takes alternative splicing into account. Here, we introduce a comparative genomics approach to systematically identify alternative splicing events based on the differential evolutionary conservation between exons and introns and on the high-quality annotation of the ENCODE regions. Specifically, we focus on exons that are included in some transcripts but completely spliced out of others; we call them conditional exons. First, we characterize distinguishing features among conditional exons, constitutive exons and introns. One of the most important features is the position-specific conservation score. There are dramatic differences in conservation scores between conditional exons and constitutive exons. More importantly, the differences are position-specific. For flanking intronic regions, the differences between conditional exons and constitutive exons are also position-specific. Using the Random Forests algorithm, we can classify conditional exons with high specificities (97% for the identification of conditional exons from intron regions and 95% for the classification of known exons) and fair sensitivities (64% and 32%, respectively). We applied the method to the human genome and identified 39,640 introns that contain conditional exons and classified 8,813 conditional exons from the current RefSeq exon list. Among those, 31,673 introns containing conditional exons and 5,294 conditional exons classified from known exons cannot be inferred from RefSeq, UCSC or Ensembl annotations. Some of these de novo predictions were experimentally verified.
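For readers less familiar with the reported metrics: sensitivity is the fraction of true conditional exons the classifier recovers, and specificity is the fraction of non-conditional sequences it correctly rejects. A minimal Python sketch with made-up labels (not the study's data; function names are hypothetical) shows how both follow from confusion counts, and how a conservative classifier can pair high specificity with modest sensitivity.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels, where 1 = conditional exon, 0 = other."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels: the classifier calls few positives, so it never raises a false alarm
    # but misses half of the true conditional exons.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(sensitivity(tp, fn))  # 0.5
    print(specificity(tn, fp))  # 1.0
```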