78,813 research outputs found

    Minimap2: pairwise alignment for nucleotide sequences

    Motivation: Recent advances in sequencing technologies promise ultra-long reads of ~100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms. Results: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at an error rate of ~15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times faster than mainstream short-read mappers at comparable accuracy and ≥30 times faster at higher accuracy for both genomic and mRNA reads, surpassing most aligners specialized in one type of alignment. Availability and implementation: https://github.com/lh3/minimap2 Contact: [email protected]: The final submitted versio
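
The abstract mentions a concave gap cost for long INDELs. One common way to approximate such a cost is to take the minimum of two affine gap functions, so that short gaps are charged by a steep line and long gaps by a shallow one. The sketch below illustrates that idea only; the function names and the parameter values (4, 2) and (24, 1) are illustrative assumptions and are not taken from the abstract.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Cost of a gap of `length` bases under a single affine model."""
    return gap_open + length * gap_extend


def two_piece_gap_cost(length, short=(4, 2), long=(24, 1)):
    """Minimum of two affine costs, which yields a concave piecewise-linear cost.

    `short` and `long` are (open, extend) pairs; the defaults are
    assumed values chosen for illustration only.
    """
    if length <= 0:
        return 0
    return min(affine_gap_cost(length, *short), affine_gap_cost(length, *long))


if __name__ == "__main__":
    # Compare a single affine cost with the two-piece cost as gaps grow.
    for n in (1, 10, 20, 100, 1000):
        print(n, affine_gap_cost(n, 4, 2), two_piece_gap_cost(n))
```

With these assumed parameters the two lines cross at a gap length of 20, so longer gaps are penalised at the lower per-base rate, which is what makes long insertions and deletions cheaper to align than under a single affine model.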

    Partial bisulfite conversion for unique template sequencing

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate counting and long-range assembly of initial template molecules from short-read sequence data. We explore counting and low-error sequencing by profiling 135,000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone.
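
The clustering step described in this abstract can be illustrated with a toy sketch. It assumes, purely for illustration, that every read spans the same reference interval, that only top-strand C-to-T conversions occur, and that there are no sequencing errors; real data would involve overlapping fragments, G-to-A conversions on the opposite strand and error-tolerant matching. The function names are hypothetical and are not part of the muSeq protocol.

```python
from collections import defaultdict


def conversion_signature(reference, read):
    """Return the positions where a reference C is observed as T in the read."""
    return tuple(
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "T"
    )


def cluster_by_signature(reference, reads):
    """Group reads that share exactly the same C-to-T conversion pattern."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[conversion_signature(reference, read)].append(read)
    return dict(clusters)


if __name__ == "__main__":
    reference = "ACCGTCAC"
    reads = ["ATCGTCAC", "ATCGTCAC", "ACCGTTAC", "ACTGTCAT"]
    clusters = cluster_by_signature(reference, reads)
    # Each distinct signature stands for one inferred template molecule.
    for signature, members in clusters.items():
        print(signature, len(members))
```

In this toy input the first two reads share one signature and therefore count as copies of a single template, so four reads collapse to three inferred templates.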

    Single cell transcriptome analysis using next generation sequencing.

    The heterogeneity of tissues, especially in cancer research, is a central issue in transcriptome analysis. In recent years, research has primarily focused on the development of methods for single cell analysis. Single cell analysis aims at gaining (novel) insights into biological processes of healthy and diseased cells. Some of the challenges in transcriptome analysis concern the low abundance of sample starting material, the necessary sample amplification steps and the subsequent analysis. In this study, two fundamentally different approaches to amplification were compared using next-generation sequencing analysis: I. exponential amplification using polymerase chain reaction (PCR) and II. linear amplification. For both approaches, protocols for single cell extraction, cell lysis, cDNA synthesis, cDNA amplification and preparation of next-generation sequencing libraries were developed. We could successfully show that transcriptome analysis of low numbers of cells is feasible with both exponential and linear amplification. Using exponential amplification, the highest amplification rates, up to 10^6, were possible. The reproducibility of results is a strength of the linear amplification method. The analysis of next generation sequencing data in single cell samples showed detectable expression in at least 16,000 genes. The variance between samples means that a larger number of biological replicates is needed. In summary, single cell transcriptome analysis with next generation sequencing is possible, but improvements leading to a higher yield of transcriptome reads are required. In the near future, comparing single cancer cells with healthy ones, for example, could provide a basis for improved prognosis and diagnosis.

    Capturing the ‘ome’: the expanding molecular toolbox for RNA and DNA library construction

    All sequencing experiments and most functional genomics screens rely on the generation of libraries to comprehensively capture pools of targeted sequences. In the past decade especially, driven by the progress in the field of massively parallel sequencing, numerous studies have comprehensively assessed the impact of particular manipulations on library complexity and quality, and characterized the activities and specificities of several key enzymes used in library construction. Fortunately, careful protocol design and reagent choice can substantially mitigate many of these biases, and enable reliable representation of sequences in libraries. This review aims to guide the reader through the vast expanse of literature on the subject to promote informed library generation, independent of the application.

    Digital gene expression analysis of the zebra finch genome

    Background: In order to understand patterns of adaptation and molecular evolution it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue-specific gene expression patterns in the zebra finch (Taeniopygia guttata) with special emphasis on the genes of the major histocompatibility complex (MHC). Results: Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in this EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and the non-synonymous to synonymous nucleotide substitution ratio of genes, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue-specific expression levels and were in general primarily expressed in spleen. Several MHC genes, including MHC class I, also showed expression in brain. Furthermore, for all genes with highest levels of expression in spleen there was an overrepresentation of several gene ontology terms related to immune function. Conclusions: Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show predicted patterns of gene expression profiles and molecular evolution in the zebra finch genome. Expression of MHC genes, in particular, corresponds well with expression patterns in other vertebrates.

    Single genome sequencing of near full-length HIV-1 RNA using a limiting dilution approach

    Sequencing very long stretches of the HIV-1 genome can advance studies on virus evolution and in vivo recombination but remains technically challenging. We developed an efficient procedure to sequence near full-length HIV-1 RNA using a two-amplicon approach. The whole genome was successfully amplified for 107 (88%) of 121 plasma samples, including samples from patients infected with HIV-1 subtype A1, B, C, D, F1, G, H, CRF01_AE and CRF02_AG. For the 17 samples with a viral load below 1000 copies/ml and the 104 samples with a viral load above 1000 copies/ml, the amplification efficiency was 53% and 94%, respectively. The sensitivity of the method was further evaluated using limiting dilution of RNA extracted from a plasma pool containing an equimolar mixture of three HIV-1 subtypes (B, C and CRF02_AG) and diluted before and after cDNA generation. Both RNA and cDNA dilution showed comparable sensitivity and equal accuracy in reflecting the subtype distribution of the plasma pool. A single event of in vitro recombination was detected amongst the 41 sequences obtained after cDNA dilution, but no indications of in vitro recombination were found after RNA dilution. In conclusion, a two-amplicon strategy and limiting dilution of viral RNA, followed by reverse transcription, nested PCR and Sanger sequencing, allows near full genome sequencing of individual HIV-1 RNA molecules. This method will be a valuable tool in the study of virus evolution and recombination.
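
The percentages quoted in this abstract can be cross-checked with a little arithmetic. The sketch below reconstructs the per-group success counts from the rounded percentages; those counts are inferred here and are not stated in the abstract, and the helper name is hypothetical.

```python
def estimated_successes(n_samples, efficiency_percent):
    """Estimate the number of amplified samples from a rounded percentage."""
    return round(n_samples * efficiency_percent / 100)


def overall_efficiency_percent(successes, total):
    """Overall amplification efficiency, rounded to a whole percent."""
    return round(100 * successes / total)


if __name__ == "__main__":
    low_load = estimated_successes(17, 53)    # viral load below 1000 copies/ml
    high_load = estimated_successes(104, 94)  # viral load above 1000 copies/ml
    total = low_load + high_load
    print(low_load, high_load, total, overall_efficiency_percent(total, 17 + 104))
```

Under this reconstruction the two groups sum to the 107 successes out of 121 samples reported overall, so the group-level and overall figures are mutually consistent.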