MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification
Next-generation RNA sequencing (RNA-seq) technology has been widely used to
assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq
data offer insight into gene expression levels and transcriptome structures,
enabling us to better understand the regulation of gene expression and
fundamental biological processes. Accurate isoform quantification from RNA-seq
data is challenging due to the information loss in sequencing experiments. A
recent accumulation of multiple RNA-seq data sets from the same tissue or cell
type provides new opportunities to improve the accuracy of isoform
quantification. However, existing statistical or computational methods for
multiple RNA-seq samples either pool the samples into one sample or assign
equal weights to the samples when estimating isoform abundance. These methods
ignore the possible heterogeneity in the quality of different samples and could
result in biased and non-robust estimates. In this article, we develop a method,
which we call "joint modeling of multiple RNA-seq samples for accurate isoform
quantification" (MSIQ), for more accurate and robust isoform quantification by
integrating multiple RNA-seq samples under a Bayesian framework. Our method
aims to (1) identify a consistent group of samples with homogeneous quality and
(2) improve isoform quantification accuracy by jointly modeling multiple
RNA-seq samples while allowing higher weights on the consistent group. We show
that MSIQ provides a consistent estimator of isoform abundance, and we
demonstrate the accuracy and effectiveness of MSIQ compared with alternative
methods through simulation studies on D. melanogaster genes. We justify MSIQ's
advantages over existing approaches via application studies on real RNA-seq
data from human embryonic stem cells, brain tissues, and the HepG2 immortalized
cell line
Comparison of reproducibility, accuracy, sensitivity, and specificity of miRNA quantification platforms
Given the increasing interest in their use as disease biomarkers, the establishment of reproducible, accurate, sensitive, and specific platforms for microRNA (miRNA) quantification in biofluids is of high priority. We compare four platforms for these characteristics: small RNA sequencing (RNA-seq), FirePlex, EdgeSeq, and nCounter. For a pool of synthetic miRNAs, coefficients of variation for technical replicates are lower for EdgeSeq (6.9%) and RNA-seq (8.2%) than for FirePlex (22.4%); nCounter replicates are not performed. Receiver operating characteristic analysis for distinguishing present versus absent miRNAs shows small RNA-seq (area under curve 0.99) is superior to EdgeSeq (0.97), nCounter (0.94), and FirePlex (0.81). Expected differences in expression of placenta-associated miRNAs in plasma from pregnant and non-pregnant women are observed with RNA-seq and EdgeSeq, but not FirePlex or nCounter. These results indicate that differences in performance among miRNA profiling platforms impact ability to detect biological differences among samples and thus their relative utility for research and clinical use
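The two metrics reported in this abstract, the coefficient of variation (CV) across technical replicates and the area under the ROC curve (AUC) for present versus absent miRNAs, can be illustrated with a minimal sketch. The function names and all input values below are hypothetical and are not taken from the study; the AUC is computed with the standard pairwise-ranking definition, which may differ from the software the authors used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample CV (standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def roc_auc(scores_present, scores_absent):
    """Return the AUC as the fraction of (present, absent) pairs in which
    the present miRNA has the higher signal; ties count as half."""
    wins = 0.0
    for p in scores_present:
        for a in scores_absent:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_present) * len(scores_absent))


# Hypothetical signals for illustration only (not data from the study).
replicate_counts = [90.0, 100.0, 110.0]   # one miRNA, three technical replicates
present_signals = [5.0, 7.0, 9.0]         # miRNAs that were spiked in
absent_signals = [1.0, 2.0, 6.0]          # miRNAs that were not spiked in

cv_percent = coefficient_of_variation(replicate_counts)
auc = roc_auc(present_signals, absent_signals)
print(f"CV = {cv_percent:.1f}%, AUC = {auc:.2f}")
```

A lower CV indicates more reproducible replicates, and an AUC closer to 1 indicates cleaner separation of present from absent miRNAs.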
Near-optimal RNA-Seq quantification
We present a novel approach to RNA-Seq quantification that is near optimal in
speed and accuracy. Software implementing the approach, called kallisto, can be
used to analyze 30 million unaligned paired-end RNA-Seq reads in less than 5
minutes on a standard laptop computer while providing results as accurate as
those of the best existing tools. This removes a major computational bottleneck
in RNA-Seq analysis.
PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets
Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome, as most coding potential computation (CPC) tools fail to accurately identify them in transcriptomic data. Well-known CPC tools such as CPC2, lncScore, and CPAT are primarily designed for prediction of lncRNAs based on the GENCODE, NONCODE and CANTATAdb databases. The prediction accuracy of these tools often drops when tested on transcriptomic datasets. This leads to more false positives and to inaccuracy in the function annotation process. In this study, we present a novel tool, PLIT, for the identification of lncRNAs in plant RNA-seq datasets. PLIT implements a feature selection method based on L1 regularization and iterative Random Forests (iRF) classification for selection of optimal features. Based on sequence and codon-bias features, it classifies the RNA-seq derived FASTA sequences into coding or long non-coding transcripts. Using L1 regularization, 31 optimal features were obtained based on lncRNA and protein-coding transcripts from 8 plant species. The performance of the tool was evaluated on 7 plant RNA-seq datasets using 10-fold cross-validation. The analysis exhibited superior accuracy when evaluated against currently available state-of-the-art CPC tools
Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.
Background: To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia (CLL) specimens, a high-throughput RNA-sequencing (HTS RNA-seq) analysis was performed.
Methods: Ten CLL specimens and five normal peripheral blood CD19+ B cell samples were analyzed by HTS RNA-seq. Libraries were prepared with the Illumina TruSeq RNA kit and sequenced on the Illumina HiSeq 2000 system.
Results: An average of 48.5 million reads for B cells and 50.6 million reads for CLL specimens were obtained, with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens, respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEGs) between B cells and CLL specimens were identified based on FPKM (fragments per kilobase of transcript per million reads), a false discovery rate (FDR) q < 0.05, and a fold change >2. Expression of selected DEGs (n = 32) that were up-regulated or down-regulated in CLL in the RNA-seq data was also analyzed by qRT-PCR in a test cohort of CLL specimens. Although the fold expression of DEGs varied between RNA-seq and qRT-PCR, more than 90% of the analyzed genes were validated by qRT-PCR. Splicing alterations in CLL and B cells were analyzed with Multivariate Analysis of Transcript Splicing (MATS). Skipped exon was the most frequent splicing alteration in CLL specimens, with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1).
Conclusion: The RNA-seq analysis of CLL specimens identifies novel DEGs and alternatively spliced genes that are potential prognostic markers and therapeutic targets. The high level of validation by qRT-PCR for a number of DEGs supports the accuracy of this analysis. Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes, but the M-CLL and U-CLL transcriptomes were indistinguishable. The ability to identify alternative splicing events and other genetic abnormalities specific to CLL from HTS RNA-seq data is an added advantage of RNA-seq that is not feasible with other genome-wide analyses
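The FPKM unit named in this abstract (fragments per kilobase of transcript per million reads) and the fold-change cutoff of 2 can be illustrated with a minimal sketch. This is the textbook FPKM formula, not a reproduction of Cuffdiff's internal estimation or statistical testing, and the function names and numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) is the same
    # as multiplying the fragment count by 1e9.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def exceeds_fold_change(fpkm_a, fpkm_b, threshold=2.0):
    """Return True if expression differs by more than `threshold`-fold
    in either direction. Zero values are treated as not comparable."""
    if fpkm_a <= 0 or fpkm_b <= 0:
        return False
    ratio = max(fpkm_a, fpkm_b) / min(fpkm_a, fpkm_b)
    return ratio > threshold


# Hypothetical example: a 2,000 bp transcript in two libraries of
# 50 million mapped fragments each (numbers are illustrative only).
cll_value = fpkm(500, 2000, 50_000_000)
b_cell_value = fpkm(150, 2000, 50_000_000)
print(cll_value, b_cell_value, exceeds_fold_change(cll_value, b_cell_value))
```

In practice a gene would also need to pass the significance filter (FDR q < 0.05 in the abstract), which this sketch does not model.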
RNA-seq transcriptional profiling of peripheral blood leukocytes from cattle infected with Mycobacterium bovis
Bovine tuberculosis, caused by infection with Mycobacterium bovis, is a major endemic disease affecting cattle populations worldwide, despite the implementation of stringent surveillance and control programs in many countries. The development of high-throughput functional genomics technologies, including gene expression microarrays and RNA-sequencing (RNA-seq), has enabled detailed analysis of the host transcriptome response to M. bovis infection, particularly at the macrophage and peripheral blood level. In the present study, we have analyzed the peripheral blood leukocyte (PBL) transcriptome of eight naturally M. bovis-infected and eight age- and sex-matched non-infected control Holstein-Friesian animals using RNA-seq. In addition, we compared gene expression profiles generated using RNA-seq with those previously generated using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same PBL-extracted RNA. A total of 3,250 differentially expressed (DE) annotated genes were detected in the M. bovis-infected samples relative to the controls (adjusted P-value ≤0.05), with the number of genes displaying decreased relative expression (1,671) exceeding those with increased relative expression (1,579). Ingenuity® Systems Pathway Analysis (IPA) of all DE genes revealed enrichment for genes with immune function. Notably, transcriptional suppression was observed among several of the top-ranking canonical pathways including Leukocyte Extravasation Signaling. Comparative platform analysis demonstrated that RNA-seq detected a larger number of annotated DE genes (3,250) relative to the microarray (1,398), of which 917 genes were common to both technologies and displayed the same direction of expression. Finally, we show that RNA-seq had an increased dynamic range compared to the microarray for estimating differential gene expression
An observation of circular RNAs in bacterial RNA-seq data
Circular RNAs (circRNAs) are a class of RNA with an important role in
microRNA (miRNA) regulation, recently discovered in humans and various other
eukaryotes as well as in archaea. Here, we have analyzed RNA-seq data obtained
from Enterococcus faecalis and Escherichia coli in a way similar to
previous studies performed on eukaryotes. We report observations of circRNAs in
RNA-seq data that are reproducible across multiple experiments performed with
different protocols or growth conditions
