209 research outputs found

    Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

    Bowtie: a new ultrafast memory-efficient tool for the alignment of short DNA sequence reads to large genomes

    Computational methods for transcriptome annotation and quantification using RNA-seq

    High-throughput RNA sequencing (RNA-seq) promises a comprehensive picture of the transcriptome, allowing for the complete annotation and quantification of all genes and their isoforms across samples. Realizing this promise requires increasingly complex computational methods. These computational challenges fall into three main categories: (i) read mapping, (ii) transcriptome reconstruction and (iii) expression quantification. Here we explain the major conceptual and practical challenges, and the general classes of solutions for each category. Finally, we highlight the interdependence between these categories and discuss the benefits for different biological applications

    Improving RNA-Seq expression estimates by correcting for fragment bias

    The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies

    TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions

    TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat

    RNase-mediated protein footprint sequencing reveals protein-binding sites throughout the human transcriptome

    Although numerous approaches have been developed to map RNA-binding sites of individual RNA-binding proteins (RBPs), few methods exist that allow assessment of global RBP–RNA interactions. Here, we describe PIP-seq, a universal, high-throughput, ribonuclease-mediated protein footprint sequencing approach that reveals RNA-protein interaction sites throughout a transcriptome of interest. We apply PIP-seq to the HeLa transcriptome and compare binding sites found using different cross-linkers and ribonucleases. From this analysis, we identify numerous putative RBP-binding motifs, reveal novel insights into co-binding by RBPs, and uncover a significant enrichment for disease-associated polymorphisms within RBP interaction sites

    Thyroid hormone regulates distinct paths to maturation in pigment cell lineages

    Thyroid hormone (TH) regulates diverse developmental events and can drive disparate cellular outcomes. In zebrafish, TH has opposite effects on neural crest derived pigment cells of the adult stripe pattern, limiting melanophore population expansion, yet increasing yellow/orange xanthophore numbers. To learn how TH elicits seemingly opposite responses in cells having a common embryological origin, we analyzed individual transcriptomes from thousands of neural crest-derived cells, reconstructed developmental trajectories, identified pigment cell-lineage specific responses to TH, and assessed roles for TH receptors. We show that TH promotes maturation of both cell types but in distinct ways. In melanophores, TH drives terminal differentiation, limiting final cell numbers. In xanthophores, TH promotes accumulation of orange carotenoids, making the cells visible. TH receptors act primarily to repress these programs when TH is limiting. Our findings show how a single endocrine factor integrates very different cellular activities during the generation of adult form

    Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis

    The sense of smell allows chemicals to be perceived as diverse scents. We used single neuron RNA-Sequencing (RNA-Seq) to explore developmental mechanisms that shape this ability as nasal olfactory neurons mature in mice. Most mature neurons expressed only one of the roughly 1000 odorant receptor genes (Olfrs) available, and that at high levels. However, many immature neurons expressed low levels of multiple Olfrs. Coexpressed Olfrs localized to overlapping zones of the nasal epithelium, suggesting regional biases, but not to single genomic loci. A single immature neuron could express Olfrs from up to seven different chromosomes. The mature state in which expression of Olfr genes is restricted to one per neuron emerges over a developmental progression that appears independent of neuronal activity requiring sensory transduction molecules

    Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

    Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time

    Differential analysis of gene regulation at transcript resolution with RNA-seq

    Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes. We demonstrate the accuracy of our approach through differential analysis of lung fibroblasts in response to loss of the developmental transcription factor HOXA1, which we show is required for lung fibroblast and HeLa cell cycle progression. Loss of HOXA1 results in significant expression level changes in thousands of individual transcripts, along with isoform switching events in key regulators of the cell cycle. Cuffdiff 2 performs robust differential analysis in RNA-seq experiments at transcript resolution, revealing a layer of regulation not readily observable with other high-throughput technologies