34 research outputs found
Bayesian transcriptome assembly
RNA sequencing allows for simultaneous transcript discovery and quantification, but reconstructing complete transcripts from such data remains difficult. Here, we introduce Bayesembler, a novel probabilistic method for transcriptome assembly built on a Bayesian model of the RNA sequencing process. Under this model, samples from the posterior distribution over transcripts and their abundance values are obtained using Gibbs sampling. By using the frequency at which transcripts are observed during sampling to select the final assembly, we demonstrate marked improvements in sensitivity and precision over state-of-the-art assemblers on both simulated and real data. Bayesembler is available at https://github.com/bioinformatics-centre/bayesembler. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-014-0501-4) contains supplementary material, which is available to authorized users
Recommended from our members
Genotyping structural variants in pangenome graphs using the vg toolkit.
Structural variants (SVs) remain challenging to represent and study relative to point mutations despite their demonstrated importance. We show that variation graphs, as implemented in the vg toolkit, provide an effective means for leveraging SV catalogs for short-read SV genotyping experiments. We benchmark vg against state-of-the-art SV genotypers using three sequence-resolved SV catalogs generated by recent long-read sequencing studies. In addition, we use assemblies from 12 yeast strains to show that graphs constructed directly from aligned de novo assemblies improve genotyping compared to graphs built from intermediate SV catalogs in the VCF format
Sequencing and de novo assembly of 150 genomes from Denmark as a population reference
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark