33 research outputs found

RNA-Seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts.

Author: DeRisi Joseph L
Dimon Michelle T
Sorber Katherine
Publication venue: eScholarship, University of California
Publication date: 17/01/2011

Over 50% of genes in Plasmodium falciparum, the deadliest human malaria parasite, contain predicted introns, yet experimental characterization of splicing in this organism remains incomplete. We present here a transcriptome-wide characterization of intraerythrocytic splicing events, as captured by RNA-Seq data from four timepoints of a single highly synchronous culture. Gene model-independent analysis of these data, together with publicly available RNA-Seq data, using HMMSplicer, an in-house splice site detection algorithm, revealed a total of 977 new 5' GU-AG 3' and 5 new 5' GC-AG 3' junctions absent from gene models and ESTs (an 11% increase over the current annotation). In addition, 310 alternative splicing events were detected in 254 (4.5%) genes, most of which truncate open reading frames. Splicing events antisense to gene models were also detected, revealing complex transcriptional arrangements within the parasite's transcriptome. Interestingly, antisense introns overlap sense introns more than would be expected by chance, perhaps indicating a functional relationship between overlapping transcripts or an inherent organizational property of the transcriptome. Independent experimental validation confirmed over 30 new antisense and alternative junctions. Thus, this largest assemblage of new and alternative splicing events to date in Plasmodium falciparum provides a more precise, dynamic view of the parasite's transcriptome.

PubMed Central

eScholarship - University of California

ReCombine: A Suite of Programs for Detection and Analysis of Meiotic Recombination in Whole-Genome Datasets

Author: Ashwini Oke
B Langmead
Carol M. Anderson
DD Perkins
E Mancera
E Martini
EA Winzeler
FW Stahl
GA Cromie
H Li
H Zhao
HP Papazian
Illumina
J Qi
J van Oeveren
Jennifer C. Fung
JH McCusker
JM Cherry
Joseph L. DeRisi
JW Szostak
K Sorber
Michael Lichten
Michelle T. Dimon
MS McPeek
Q Zhao
R Bourgon
R Li
S Kurtz
Stacy Y. Chen
SY Chen
T de los Santos
T Hassold
W Wei
Z Ning
Publication venue: Public Library of Science
Publication date: 25/10/2011

In meiosis, the exchange of DNA between chromosomes by homologous recombination is a critical step that ensures proper chromosome segregation and increases genetic diversity. Products of recombination include reciprocal exchanges, known as crossovers, and non-reciprocal gene conversions or non-crossovers. The mechanisms underlying meiotic recombination remain elusive, largely because of the difficulty of analyzing large numbers of recombination events by traditional genetic methods. These traditional methods are increasingly being superseded by high-throughput techniques capable of surveying meiotic recombination on a genome-wide basis. Next-generation sequencing or microarray hybridization is used to genotype thousands of polymorphic markers in the progeny of hybrid yeast strains. New computational tools are needed to perform this genotyping and to find and analyze recombination events. We have developed a suite of programs, ReCombine, for using short sequence reads from next-generation sequencing experiments to genotype yeast meiotic progeny. Upon genotyping, the program CrossOver, a component of ReCombine, then detects recombination products and classifies them into categories based on the features found at each location and their distribution among the various chromatids. CrossOver is also capable of analyzing segregation data from microarray experiments or other sources. This package of programs is designed to allow even researchers without computational expertise to use high-throughput, whole-genome methods to study the molecular mechanisms of meiotic recombination
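As a rough illustration of the kind of segregation data such a tool works from, the sketch below counts parental alleles at each marker across the four spores of a tetrad and flags markers that do not segregate 2:2, which are candidates for gene conversion. This is not ReCombine or CrossOver code: the function names, the 'A'/'B' genotype encoding, and the toy tetrad are assumptions made for this example only.

```python
def marker_segregation(tetrad):
    """Return (count_A, count_B) for every marker position in a tetrad.

    `tetrad` is a list of four equal-length strings over {'A', 'B'},
    one string per spore, one character per marker (hypothetical encoding).
    """
    if len(tetrad) != 4:
        raise ValueError("a tetrad must contain exactly four spores")
    if len({len(spore) for spore in tetrad}) != 1:
        raise ValueError("all spores must be genotyped at the same markers")
    # zip(*tetrad) walks the markers column by column across the four spores.
    return [(column.count("A"), column.count("B")) for column in zip(*tetrad)]


def non_mendelian_markers(tetrad):
    """Indices of markers whose segregation differs from the Mendelian 2:2."""
    ratios = marker_segregation(tetrad)
    return [index for index, ratio in enumerate(ratios) if ratio != (2, 2)]


# Toy tetrad: spores 2 and 3 show a reciprocal exchange between markers 1 and 2,
# and spore 4 carries an 'A' allele at marker 1, giving a 3:1 segregation there.
example_tetrad = ["AAAA", "AABB", "BBAA", "BABB"]
ratios = marker_segregation(example_tetrad)
flagged = non_mendelian_markers(example_tetrad)
```

A real classifier would also need marker positions and the pattern of genotype switches along each chromatid to separate crossovers from non-crossovers; the sketch covers only the per-marker counting step.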

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data

Author: A Ameur
A Mortazavi
B Langmead
BT Wilhelm
C Sidrauski
C Trapnell
Cynthia Gibas
D Ramsköld
DA Benson
DW Bryant
ET Wang
F De Bona
F Lu
GA Heap
GE Crooks
H Li
H Nagasaki
H Richard
H Yoshida
JC Dohm
Joseph L. DeRisi
JS Cox
K Sorber
Katherine Sorber
KD Pruitt
KF Au
L Baum
M Deutsch
M Yano
MC Wahl
Michelle T. Dimon
MJ Gardner
PJ Shepard
Q Pan
R Li
R Lister
S Sen
S Stamm
TW Nilsen
U Nagalakshmi
WJ Kent
Z Wang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: High-throughput sequencing of an organism’s transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially when a splice junction is previously unknown. Methodology/Principal Findings: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets, without a reduction in specificity. Conclusions/Significance: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on prebuilt gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available a

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

RNA-seq analyses of blood-induced changes in gene expression in the mosquito vector species, Aedes aegypti

Background: Hematophagy is a common trait of insect vectors of disease. Extensive genome-wide transcriptional changes occur in mosquitoes after blood meals, and these are related to digestive and reproductive processes, among others. Studies of these changes are expected to reveal molecular targets for novel vector control and pathogen transmission-blocking strategies. The mosquito Aedes aegypti (Diptera, Culicidae), a vector of Dengue viruses, Yellow Fever Virus (YFV) and Chikungunya virus (CV), is used in this study to examine genome-wide changes in gene expression following a blood meal. Results: Transcriptional changes that follow a blood meal in Ae. aegypti females were explored using RNA-seq technology. Over 30% of more than 18,000 investigated transcripts accumulate differentially in mosquitoes at five hours after a blood meal when compared to those fed only on sugar. Forty transcripts accumulate only in blood-fed mosquitoes. The list of regulated transcripts correlates with an enhancement of digestive activity and a suppression of environmental stimuli perception and innate immunity. The alignment of more than 65 million high-quality short reads to the Ae. aegypti reference genome permitted the refinement of the current annotation of transcript boundaries, as well as the discovery of novel transcripts, exons and splicing variants. Cis-regulatory elements (CRE) and cis-regulatory modules (CRM) significantly enriched in the 5' flanking sequences of blood meal-regulated genes were identified. Conclusions: This study provides the first global view of the changes in transcript accumulation elicited by a blood meal in Ae. aegypti females. This information permitted the identification of classes of potentially co-regulated genes and a description of biochemical and physiological events that occur immediately after blood feeding. The data presented here serve as a basis for novel vector control and pathogen transmission-blocking strategies, including those in which the vectors are modified genetically to express anti-pathogen effector molecules.

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

IMSA: Integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background

Author: AA Pragman
AD Kostic
AJ Saldanha
AL Kistler
B Langmead
C Conway
C Runckel
D Hernandez
DA Benson
G Grard
H Feng
Henry M. Wood
J Cheval
J Handelsman
J Yang
JC Lagier
Mark R. Liles
MB Eisen
MD Stenglein
Michelle T. Dimon
Pamela H. Rabbitts
Sarah T. Arron
SF Altschul
Z Lin
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high-throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high-throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
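The idea of scoring taxonomy nodes by read frequency can be sketched as follows. This is a minimal illustration and not IMSA's code: it assumes each non-host read has already been assigned to exactly one taxon and simply adds one count to that taxon and to each of its ancestors. The toy taxonomy, the read identifiers, and the function names are hypothetical, and IMSA's actual scoring rule may differ (for example in how reads with several equally good hits are handled).

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "Viruses": "root",
    "Papillomaviridae": "Viruses",
    "HPV16": "Papillomaviridae",
    "HPV18": "Papillomaviridae",
    "Bacteria": "root",
}


def lineage(taxon, parent=PARENT):
    """Yield the taxon itself and then each ancestor up to the root."""
    while taxon is not None:
        yield taxon
        taxon = parent[taxon]


def score_taxonomy(read_assignments, parent=PARENT):
    """Count, for every taxonomy node, the reads assigned at or below it.

    `read_assignments` maps a read id to the single taxon it was assigned to
    (an assumption of this sketch).
    """
    scores = Counter()
    for taxon in read_assignments.values():
        for node in lineage(taxon, parent):
            scores[node] += 1
    return scores


# Made-up assignments for four reads that survived host filtering.
reads = {"r1": "HPV16", "r2": "HPV16", "r3": "HPV18", "r4": "Bacteria"}
scores = score_taxonomy(reads)
```

Propagating counts up the lineage means a family-level node such as Papillomaviridae reports all reads from its member types, which is what makes a per-node report usable at any taxonomic depth.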

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

White Rose Research Online

Datasets.

Author: Joseph L. DeRisi (14837)
Katherine Sorber (269886)
Michelle T. Dimon (199476)
Publication date: 20/02/2013

Datasets used for benchmark tests. For H. sapiens and P. falciparum, two times are given for TopHat. For H. sapiens, the longer time is with more sensitive settings, but the shorter time resulted in less than 5% fewer junctions at a similar specificity. For P. falciparum, the longer time is with more sensitive but less stringent settings, whereas the shorter time is for the more stringent settings that resulted in significantly fewer junctions but with a much higher specificity. *The 48 bp reads in the NCBI SRA set have a 2 bp initial barcode that was trimmed, resulting in 46 bp reads.

FigShare

Simulation results.

Author: Joseph L. DeRisi (14837)
Katherine Sorber (269886)
Michelle T. Dimon (199476)

(a) Results for HMMSplicer and TopHat for 50 and 75 bp reads. Although values are similar at higher coverage levels, HMMSplicer exhibits substantial increases in sensitivity at lower coverage levels. (b) ROC curve for the 50 bp simulation results at 1×, 10×, and 50× coverage demonstrates that HMMSplicer's scoring algorithm accurately discriminates between true and false junctions. The number in parentheses is the area under the curve for each coverage level.
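For readers unfamiliar with how an area under the ROC curve is obtained from scored junctions, the sketch below shows one standard way to compute it: sort predictions by score, sweep the threshold downward, and integrate the resulting curve with the trapezoid rule. It is a generic illustration, not the procedure used for this figure; the function names and the example scores are made up.

```python
def roc_points(scored_junctions):
    """Return (false positive rate, true positive rate) points.

    `scored_junctions` is a list of (score, is_true) pairs. Junctions with
    tied scores are added together so the curve does not depend on input order.
    """
    positives = sum(1 for _, is_true in scored_junctions if is_true)
    negatives = len(scored_junctions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false junction")

    ordered = sorted(scored_junctions, key=lambda item: item[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Consume every junction that shares the current score.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Integrate a piecewise-linear curve with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores: three true junctions and two false ones.
example = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.2, False)]
auc = area_under_curve(roc_points(example))
```

An area of 1.0 means every true junction outscores every false one, while 0.5 corresponds to scores that carry no information.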

FigShare

XBP1 non-canonical intron.

Author: Joseph L. DeRisi (14837)
Katherine Sorber (269886)
Michelle T. Dimon (199476)

HMMSplicer discovers the non-canonical XBP1 intron. HMMSplicer identifies three reads containing the non-canonical CA-AG splice site in XBP1. Because the reads are fairly evenly split, both read-halves aligned to the genome. The edges identified by HMMSplicer are 2 and 4 bp off from the actual splice site because the sequence at the beginning of the intron repeats the sequence at the beginning of the subsequent exon. When identical junctions are collapsed, there are two junctions, one with a score of 1024 and one with a score of 1030, which puts them in the top 0.5% of the collapsed non-canonical junctions.

FigShare

HMMSplicer pipeline.

Author: Joseph L. DeRisi (14837)
Katherine Sorber (269886)
Michelle T. Dimon (199476)

After removing reads that have full-length alignments to the genome, reads are divided in half and aligned to the genome (step 1 as defined in the Materials and Methods, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#s4). The HMM is trained using a subset of the read-half alignments (step 2a). The HMM bins quality scores into five levels. Although only three levels are shown in this overview for simplification, the values for all five levels can be found in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#pone-0013875-t001). The trained HMM is then used to determine the splice position within each read-half alignment (step 2b). The remaining second piece of the read is then matched downstream to find the other intron edge (step 3). The initial set of splice junctions then proceeds to rescue (step 4) and filter and collapse (step 5) to generate the final set of splice junctions.
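Two of the preparatory operations in this caption, dividing a read in half and binning quality scores into five levels, can be sketched in a few lines of Python. This is an illustrative sketch and not HMMSplicer's source code: the function names are hypothetical, and the bin edges below are placeholder values chosen for the example (the actual five levels are those given in Table 1 of the paper).

```python
from bisect import bisect_right

# Placeholder Phred-score bin edges (an assumption of this sketch).
# Four edges define five levels, numbered 0 (lowest quality) to 4 (highest).
QUALITY_BIN_EDGES = (10, 20, 30, 35)


def split_read(sequence, qualities):
    """Split a read and its per-base qualities into two halves.

    For odd-length reads the second half receives the extra base.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    middle = len(sequence) // 2
    first_half = (sequence[:middle], qualities[:middle])
    second_half = (sequence[middle:], qualities[middle:])
    return first_half, second_half


def bin_quality(phred):
    """Map a Phred quality score to one of five levels (0 to 4)."""
    return bisect_right(QUALITY_BIN_EDGES, phred)


# Made-up 9 bp read with per-base Phred scores.
first, second = split_read("ACGTACGTA", [40, 38, 30, 22, 12, 9, 35, 20, 5])
second_levels = [bin_quality(q) for q in second[1]]
```

Binning reduces the number of distinct quality symbols the HMM has to model, which is presumably why the caption describes a small fixed number of levels rather than raw scores.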

FigShare