MGMR: leveraging RNA-Seq population data to optimize expression estimation
Background: RNA-Seq is a technique that uses Next Generation Sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take into account such population information, and it is an open question whether this information can improve quantification of the individual samples. Results: In order to explore the contribution of population-level information in RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that expression levels of different individuals are sampled from a Dirichlet distribution with parameters specific to the population, and reads are sampled from the distribution of expression levels. We introduce an optimization procedure for the estimation of the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression levels of paralogous genes. Conclusions: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate that this approach can be beneficial, primarily for estimation at the gene level.
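The two-level sampling scheme described in the MGMR abstract (population-specific Dirichlet parameters, then per-individual expression levels, then reads) can be illustrated with a minimal simulation. This is only a sketch of the generative direction under simplifying assumptions: it ignores multiread ambiguity and does not implement the paper's optimization procedure, and the names `sample_dirichlet`, `simulate_population`, and the example `alpha` values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_population(alpha, n_individuals, reads_per_individual, seed=0):
    """Simulate per-individual gene read counts under a two-level model.

    alpha: hypothetical population-level Dirichlet parameters, one per gene.
    Returns a list of (expression_levels, read_counts) pairs.
    """
    rng = random.Random(seed)
    genes = range(len(alpha))
    population = []
    for _ in range(n_individuals):
        # Level 1: this individual's expression levels come from the
        # population-specific Dirichlet distribution.
        theta = sample_dirichlet(alpha, rng)
        # Level 2: each read is assigned to a gene according to theta.
        counts = [0] * len(alpha)
        for g in rng.choices(genes, weights=theta, k=reads_per_individual):
            counts[g] += 1
        population.append((theta, counts))
    return population


population = simulate_population([5.0, 2.0, 1.0], n_individuals=4,
                                 reads_per_individual=1000)
```

Larger `alpha` values make individuals resemble one another more closely, which is the kind of population commonality the abstract proposes to exploit.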
ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data
RNA-Seq and microarray platforms have emerged as important tools for detecting changes in gene expression and RNA processing in biological samples. We present ExpressionPlot, a software package consisting of a default back end, which prepares raw sequencing or Affymetrix microarray data, and a web-based front end, which offers a biologically centered interface to browse, visualize, and compare different data sets. Download and installation instructions, a user's manual, discussion group, and a prototype are available at http://expressionplot.com/. ALS Therapy Allianc
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data. Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection. Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
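The contrast the abstract draws between total-count scaling and quantile-based normalization can be made concrete with a small sketch. The abstract does not say which quantile is used, so the upper quartile of non-zero counts is an assumption here, and the function names, gene lengths, and counts are invented for illustration.

```python
import statistics


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads in the lane."""
    total = sum(counts)
    return [c * 1e9 / (length * total)
            for c, length in zip(counts, lengths_bp)]


def upper_quartile_scaled(counts):
    """Scale counts by the upper quartile of the lane's non-zero counts.

    Assumption: the upper quartile stands in for the abstract's
    'quantile-based' procedures, which are not specified there.
    """
    nonzero = [c for c in counts if c > 0]
    q3 = statistics.quantiles(nonzero, n=4)[2]
    return [c / q3 for c in counts]


# Two hypothetical lanes: the first eight genes are identical, and only
# the last gene is far more highly expressed in lane_b.
lengths = [1000] * 9
lane_a = [50, 100, 150, 200, 250, 300, 350, 400, 450]
lane_b = [50, 100, 150, 200, 250, 300, 350, 400, 45000]

rpkm_a, rpkm_b = rpkm(lane_a, lengths), rpkm(lane_b, lengths)
uq_a, uq_b = upper_quartile_scaled(lane_a), upper_quartile_scaled(lane_b)
```

In this toy case the single highly expressed gene inflates the lane total, so RPKM values for the eight unchanged genes drop in `lane_b` even though their counts are identical, while the upper-quartile scaling leaves them unchanged.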
Characterization and Comparison of the Leukocyte Transcriptomes of Three Cattle Breeds
In this study, mRNA-Seq was used to characterize and compare the leukocyte transcriptomes from two taurine breeds (Holstein and Jersey), and one indicine breed (Cholistani). At the genomic level, we identified breed-specific base changes in protein coding regions. Among 7,793,425 coding bases, only 165 differed between Holstein and Jersey, and 3,383 (0.04%) differed between Holstein and Cholistani, 817 (25%) of which resulted in amino acid changes in 627 genes. At the transcriptional level, we assembled transcripts and estimated their abundances including those from more than 3,000 unannotated intergenic regions. Differential gene expression analysis showed a high similarity between Holstein and Jersey, and a much greater difference between the taurine breeds and the indicine breed. We identified gene ontology pathways that were systematically altered, including the electron transport chain and immune response pathways that may contribute to different levels of heat tolerance and disease resistance in taurine and indicine breeds. At the post-transcriptional level, sequencing mRNA allowed us to identify a number of genes undergoing differential alternative splicing among different breeds. This study provided a high-resolution survey of the variation between bovine transcriptomes at different levels and may provide important biological insights into the phenotypic differentiation among cattle breeds.
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
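The idea of using ambiguously mapping reads instead of discarding them can be illustrated with a toy expectation-maximization loop. This is a minimal sketch and not RSEM's actual model: it ignores transcript lengths, read quality, and fragment information, and the function name `em_abundances` and the example reads are hypothetical.

```python
def em_abundances(read_compat, n_transcripts, n_iter=100):
    """Estimate relative transcript abundances from ambiguous reads.

    read_compat: one tuple per read listing the indices of the
    transcripts that read is compatible with.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each read across its compatible transcripts in
        # proportion to the current abundance estimates.
        for compat in read_compat:
            weight = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / weight
        # M-step: renormalise the expected read counts.
        total = sum(expected)
        theta = [e / total for e in expected]
    return theta


# Hypothetical data: 6 reads unique to transcript 0, 2 unique to
# transcript 1, and 4 reads compatible with both.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
theta = em_abundances(reads, n_transcripts=2)
```

In this example the uniquely mapping reads favour transcript 0 three to one, so the shared reads are allocated mostly to transcript 0 as the iterations proceed.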
Whole Transcriptome Sequencing Reveals Gene Expression and Splicing Differences in Brain Regions Affected by Alzheimer's Disease
Recent studies strongly indicate that aberrations in the control of gene expression might contribute to the initiation and progression of Alzheimer's disease (AD). In particular, alternative splicing has been suggested to play a role in spontaneous cases of AD. Previous transcriptome profiling of AD models and patient samples using microarrays delivered conflicting results. This study provides, for the first time, transcriptomic analysis for distinct regions of the AD brain using RNA-Seq next-generation sequencing technology. Illumina RNA-Seq analysis was used to survey transcriptome profiles from total brain, frontal and temporal lobe of healthy and AD post-mortem tissue. We quantified gene expression levels, splicing isoforms and alternative transcript start sites. Gene Ontology term enrichment analysis revealed an overrepresentation of genes associated with a neuron's cytological structure and synapse function in AD brain samples. Analysis of the temporal lobe with the Cufflinks tool revealed that transcriptional isoforms of the apolipoprotein E gene, APOE-001, -002 and -005, are under the control of different promoters in normal and AD brain tissue. We also observed differing expression levels of APOE-001 and -002 splice variants in the AD temporal lobe. Our results indicate that alternative splicing and promoter usage of the APOE gene in AD brain tissue might reflect the progression of neurodegeneration.
Divergence of the Yeast Transcription Factor FZF1 Affects Sulfite Resistance
Changes in gene expression are commonly observed during evolution. However, the phenotypic consequences of expression divergence are frequently unknown and difficult to measure. Transcriptional regulators provide a mechanism by which phenotypic divergence can occur through multiple, coordinated changes in gene expression during development or in response to environmental changes. Yet, some changes in transcriptional regulators may be constrained by their pleiotropic effects on gene expression. Here, we use a genome-wide screen for promoters that are likely to have diverged in function and identify a yeast transcription factor, FZF1, that has evolved substantial differences in its ability to confer resistance to sulfites. Chimeric alleles from four Saccharomyces species show that divergence in FZF1 activity is due to changes in both its coding and upstream noncoding sequence. Between the two closest species, noncoding changes affect the expression of FZF1, whereas coding changes affect the expression of SSU1, a sulfite efflux pump activated by FZF1. Both coding and noncoding changes also affect the expression of many other genes. Our results show how divergence in the coding and promoter region of a transcription factor alters the response to an environmental stress.
Muon reconstruction and identification efficiency in ATLAS using the full Run 2 pp collision data set at $\sqrt{s}=13$ TeV
This article documents the muon reconstruction and identification efficiency obtained by the ATLAS experiment for 139 fb$^{-1}$ of pp collision data at $\sqrt{s}=13$ TeV collected between 2015 and 2018 during Run 2 of the LHC. The increased instantaneous luminosity delivered by the LHC over this period required a reoptimisation of the criteria for the identification of prompt muons. Improved and newly developed algorithms were deployed to preserve high muon identification efficiency with a low misidentification rate and good momentum resolution. The availability of large samples of $Z\rightarrow\mu\mu$ and $J/\psi\rightarrow\mu\mu$ decays, and the minimisation of systematic uncertainties, allows the efficiencies of criteria for muon identification, primary vertex association, and isolation to be measured with an accuracy at the per-mille level in the bulk of the phase space, and up to the percent level in complex kinematic configurations. Excellent performance is achieved over a range of transverse momenta from 3 GeV to several hundred GeV, and across the full muon detector acceptance of $|\eta|<2.7$.