22 research outputs found

    RNA-Seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts.

    Over 50% of genes in Plasmodium falciparum, the deadliest human malaria parasite, contain predicted introns, yet experimental characterization of splicing in this organism remains incomplete. We present here a transcriptome-wide characterization of intraerythrocytic splicing events, as captured by RNA-Seq data from four timepoints of a single highly synchronous culture. Gene model-independent analysis of these data, together with publicly available RNA-Seq data, using HMMSplicer, a splice site detection algorithm developed in-house, revealed a total of 977 new 5' GU-AG 3' and 5 new 5' GC-AG 3' junctions absent from gene models and ESTs (an 11% increase over the current annotation). In addition, 310 alternative splicing events were detected in 254 (4.5%) genes, most of which truncate open reading frames. Splicing events antisense to gene models were also detected, revealing complex transcriptional arrangements within the parasite's transcriptome. Interestingly, antisense introns overlap sense introns more than would be expected by chance, perhaps indicating a functional relationship between overlapping transcripts or an inherent organizational property of the transcriptome. Independent experimental validation confirmed over 30 new antisense and alternative junctions. Thus, this largest assemblage of new and alternative splicing events to date in Plasmodium falciparum provides a more precise, dynamic view of the parasite's transcriptome.

    Recovery of divergent avian bornaviruses from cases of proventricular dilatation disease: Identification of a candidate etiologic agent

    Background: Proventricular dilatation disease (PDD) is a fatal disorder threatening domesticated and wild psittacine birds worldwide. It is characterized by lymphoplasmacytic infiltration of the ganglia of the central and peripheral nervous system, leading to central nervous system disorders as well as disordered enteric motility and associated wasting. For almost 40 years, a viral etiology for PDD has been suspected, but to date no candidate etiologic agent has been reproducibly linked to the disease. Results: Analysis of 2 PDD case-control series collected independently on different continents using a pan-viral microarray revealed a bornavirus hybridization signature in 62.5% of the PDD cases (5/8) and none of the controls (0/8). Ultra high throughput sequencing was utilized to recover the complete viral genome sequence from one of the virus-positive PDD cases. This revealed a bornavirus-like genome organization for this agent with a high degree of sequence divergence from all prior bornavirus isolates. We propose the name avian bornavirus (ABV) for this agent. Further specific ABV PCR analysis of an additional set of independently collected PDD cases and controls yielded a significant difference in ABV detection rate among PDD cases (71%, n = 7) compared to controls (0%, n = 14) (P = 0.01; Fisher's Exact Test). Partial sequence analysis of a total of 16 ABV isolates we have now recovered from these and an additional set of cases reveals at least 5 distinct ABV genetic subgroups. Conclusion: These studies clearly demonstrate the existence of an avian reservoir of remarkably diverse bornaviruses and provide a compelling candidate in the search for an etiologic agent of PDD.

    HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data

    Background: High-throughput sequencing of an organism’s transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Methodology/Principal Findings: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. Conclusions/Significance: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on prebuilt gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available a

    The Long March: A Sample Preparation Technique that Enhances Contig Length and Coverage by High-Throughput Short-Read Sequencing

    High-throughput short-read technologies have revolutionized DNA sequencing by drastically reducing the cost per base of sequencing information. Despite producing gigabases of sequence per run, these technologies still present obstacles in resequencing and de novo assembly applications due to biased or insufficient target sequence coverage. We present here a simple sample preparation method termed the “long march” that increases both contig lengths and target sequence coverage using high-throughput short-read technologies. By incorporating a Type IIS restriction enzyme recognition motif into the sequencing primer adapter, successive rounds of restriction enzyme cleavage and adapter ligation produce a set of nested sub-libraries from the initial amplicon library. Sequence reads from these sub-libraries are offset from each other with enough overlap to aid assembly and contig extension. We demonstrate the utility of the long march in resequencing of the Plasmodium falciparum transcriptome, where the number of genomic bases covered was increased by 39%, as well as in metagenomic analysis of a serum sample from a patient with hepatitis B virus (HBV)-related acute liver failure, where the number of HBV bases covered was increased by 42%. We also offer a theoretical optimization of the long march for de novo sequence assembly.
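The offset-read idea in the abstract above can be illustrated with a small sketch. It assumes each round of cleavage and ligation shifts the read start a fixed number of bases further into the fragment. The function name, the 36-base read length, and the 16-base step are hypothetical values chosen for illustration, not figures from the paper.

```python
def covered_positions(fragment_len, read_len, step, rounds):
    """Return the set of fragment positions covered by one read from the
    initial library plus one read from each of `rounds` nested sub-libraries.

    Assumption (hypothetical model): every round moves the read start
    `step` bases further into the fragment.
    """
    covered = set()
    for r in range(rounds + 1):          # round 0 is the initial library
        start = r * step
        if start >= fragment_len:        # fragment fully consumed
            break
        end = min(start + read_len, fragment_len)
        covered.update(range(start, end))
    return covered


# Illustrative values only: 200-base fragment, 36-base reads, 16-base step.
initial = covered_positions(200, 36, 16, rounds=0)
marched = covered_positions(200, 36, 16, rounds=3)
print(len(initial), len(marched))
```

Under these assumed values, successive reads overlap by 20 bases (read length minus step), which is the kind of overlap the abstract describes as aiding assembly and contig extension.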

    Strand-Specific RNA-seq Applied to Malaria Samples

    Only in the past few years have next-generation sequencing technologies become accessible, and many applications have rapidly followed, such as RNA-seq, a technique that uses deep sequencing to profile whole transcriptomes. RNA-seq can discover new transcripts and splicing variants, single nucleotide variations, and fusion genes, and can produce expression profiles based on mRNA levels. Preparing RNA-seq libraries can be delicate and usually requires buying expensive kits that need large amounts of starting material. The method presented here is flexible and cost-effective. Using this method, we prepared high-quality strand-specific RNA-seq libraries from RNA extracted from the human malaria parasite Plasmodium falciparum. The libraries are compatible with Illumina®'s Genome Analyzer and Hi-Seq sequencers. The method can, however, be easily adapted to other platforms.

    Datasets.

    *The 48-bp reads in the NCBI SRA set have a 2 bp initial barcode that was trimmed, resulting in 46 bp reads. Datasets used for benchmark tests. For H. sapiens and P. falciparum, two times are given for TopHat. For H. sapiens, the longer time is with more sensitive settings, but the shorter time resulted in less than 5% fewer junctions at a similar specificity. For P. falciparum, the longer time is with more sensitive but less stringent settings whereas the shorter time is for the more stringent settings that resulted in significantly fewer junctions but with a much higher specificity.

    HMM Parameter Values.

    The initial and trained values for the HMM. The first two columns (“1→2” and “2→1”) show the probability of transitioning from State 1 to State 2 and the reverse. The probability of transitioning from State 2 to State 1 is fixed at 0 (indicating a 100% probability of remaining in State 2). For each state, the probability of a match at each quality bin is reported. The initial values were used to validate the HMM. HMMSplicer uses Initial Value Set 2, though the initial values do not impact the final trained values (see Figure 2b). The trained values are shown for each dataset analyzed. The Human values are the same as those shown in Figure 1, though in more detail.
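A two-state HMM of the kind described in the caption above can be sketched as follows. Because the State 2 to State 1 probability is fixed at 0, every state path stays in State 1 for some number of bases and then stays in State 2, so the most probable path can be found by scoring each possible switch point. This is a simplified illustration, not HMMSplicer's code: it uses a single match probability per state instead of one per quality bin, the parameter values are hypothetical rather than the trained values, and it assumes State 1 models bases that align to the genome while State 2 models bases that match only by chance.

```python
import math

# Hypothetical parameters for illustration (not the trained values).
P_MATCH_STATE1 = 0.95  # assumed: State 1 bases align to the genome
P_MATCH_STATE2 = 0.25  # assumed: State 2 bases match only by chance
P_1_TO_2 = 0.05        # State 1 -> State 2; State 2 -> State 1 is fixed at 0


def log_emit(p_match, is_match):
    """Log-probability of observing a match or mismatch in a state."""
    return math.log(p_match if is_match else 1.0 - p_match)


def best_switch_point(matches):
    """Return the index of the first base assigned to State 2.

    `matches` is a list of booleans (True = read base matches the genome).
    The path starts in State 1. A return value of len(matches) means the
    most probable path never leaves State 1.
    """
    n = len(matches)
    best_k, best_score = n, -math.inf
    for k in range(1, n + 1):  # k = number of bases spent in State 1
        score = (k - 1) * math.log(1.0 - P_1_TO_2)  # staying in State 1
        if k < n:
            score += math.log(P_1_TO_2)  # the single 1 -> 2 transition
        # Staying in State 2 has probability 1, so it adds nothing.
        for i, is_match in enumerate(matches):
            p = P_MATCH_STATE1 if i < k else P_MATCH_STATE2
            score += log_emit(p, is_match)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Ten matching bases followed by a mostly mismatching tail.
read = [True] * 10 + [False, True, False, False, False, True, False, False]
print(best_switch_point(read))
```

Scoring every switch point costs quadratic time in the read length, which is acceptable for short reads. A standard Viterbi recursion would give the same path in linear time.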