
    mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications

    High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache-oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves on mrsFAST, our first cache-oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact, the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. Just as importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user-specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to the reference genome while discounting the mismatches that occur at common SNP locations provided by dbSNP; this significantly increases the number of reads that can be mapped to the reference genome. All of the above features are implemented within the index structure rather than as post-processing steps, and are therefore performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net
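
The two features named in this abstract, SNP-aware mismatch counting and reporting reads with at most n mapping loci, can be illustrated with a naive sketch. This is not how mrsFAST-Ultra works internally (the abstract states that these features live inside its index structure). The function names, the `snp_alleles` format and the `hits` format are all hypothetical, chosen only for illustration.

```python
def snp_aware_mismatches(read, ref_window, snp_alleles):
    """Count mismatches between a read and an equal-length reference window,
    ignoring positions where the read carries a known SNP allele.

    snp_alleles is a hypothetical format: window offset -> set of known
    alternative bases at that position.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for offset, (read_base, ref_base) in enumerate(zip(read, ref_window)):
        if read_base == ref_base:
            continue
        if read_base in snp_alleles.get(offset, ()):
            continue  # mismatch explained by a known SNP, so it is discounted
        mismatches += 1
    return mismatches


def reads_with_at_most_n_loci(hits, n):
    """Keep reads that map to at least one and at most n loci.

    hits is a hypothetical format: read id -> list of loci that already
    passed the error threshold.
    """
    return {read_id: loci for read_id, loci in hits.items() if 0 < len(loci) <= n}
```

With these definitions, a read that differs from the reference only at a listed SNP allele scores zero mismatches, so it can pass an error threshold that it would otherwise fail.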

    Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery

    Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a very important class of SVs for the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we present ‘conflict resolution’ improvements to our earlier combinatorial SV detection algorithm (VariationHunter) by taking the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with a precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of the Yoruba African individual (NA18507).

    Dissect: detection and characterization of novel structural alterations in transcribed sequences

    Motivation: Computational identification of genomic structural variants via high-throughput sequencing is an important problem for which a number of highly sophisticated solutions have recently been developed. With the advent of high-throughput transcriptome sequencing (RNA-Seq), the problem of identifying structural alterations in the transcriptome is now attracting significant attention.

    On the evolution of superposition of squeezed displaced number states with the multiphoton Jaynes-Cummings model

    In this paper we discuss the quantum properties of a superposition of squeezed displaced number states evolving under the multiphoton Jaynes-Cummings model (JCM). In particular, we investigate atomic inversion, photon-number distribution, purity, quadrature squeezing, the Mandel Q parameter and the Wigner function. We show that the quadrature squeezing for the three-photon absorption case can exhibit revivals and collapses typical of those occurring in the atomic inversion for the one-photon absorption case. We also prove that for an odd-number absorption parameter there is a connection between the evolution of the atomic inversion and the evolution of the Wigner function at the origin in phase space. Furthermore, we show that nonclassical states whose Wigner function values at the origin are negative always remain nonclassical when evolving through the JCM with an even absorption parameter. We also demonstrate that various types of cat states can be generated via this system. Comment: 27 pages, 10 figures

    Robustness of Massively Parallel Sequencing Platforms

    Improvements in high throughput sequencing (HTS) technologies have made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there have been significant improvements in the accuracy and reproducibility of HTS-based analyses, using these types of data for diagnostic and prognostic applications requires near-perfect data generation. To assess the robustness of a widely used HTS platform for accurate and reproducible clinical applications, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, although only a small fraction of the discordant variants were within exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful to provide data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before being used in clinical applications.

    Developing surrogate markers for predicting antibiotic resistance "hot spots" in rivers where limited data are available

    Pinpointing environmental antibiotic resistance (AR) hot spots in low- and middle-income countries (LMICs) is hindered by a lack of available and comparable AR monitoring data relevant to such settings. Addressing this problem, we performed a comprehensive spatial and seasonal assessment of water quality and AR conditions in a Malaysian river catchment to identify potential "simple" surrogates that mirror elevated AR. We screened for resistant coliforms, 22 antibiotics, 287 AR genes and integrons, and routine water quality parameters, covering absolute concentrations and mass loadings. To understand relationships, we introduced standardized "effect sizes" (Cohen's D) for AR monitoring to improve comparability of field studies. Overall, water quality generally declined and environmental AR levels increased as one moved down the catchment without major seasonal variations, except for total antibiotic concentrations, which were higher in the dry season (Cohen's D > 0.8, P < 0.05). Among simple surrogates, dissolved oxygen (DO) most strongly correlated (inversely) with total AR gene concentrations (Spearman's ρ 0.81, P < 0.05). We suspect this results from minimally treated sewage inputs, which also contain AR bacteria and genes, depleting DO in the most impacted reaches. Thus, although DO is not a measure of AR, lower DO levels reflect wastewater inputs, flagging possible AR hot spots. DO measurement is inexpensive, already monitored in many catchments, and exists in many numerical water quality models (e.g., oxygen sag curves). Therefore, we propose combining DO data and prospective modeling to guide local interventions, especially in LMIC rivers with limited data.
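
The standardized effect size mentioned in this abstract can be computed as below. This is a minimal sketch, assuming the common pooled-standard-deviation form of Cohen's d. The abstract does not say which variant the authors used, and the function name is a hypothetical choice.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: the pooled-SD form is used here for illustration; the study
    may have used a different variant.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

By Cohen's convention a value of about 0.8 or more counts as a large effect, which is the threshold quoted in the abstract for the dry-season difference in total antibiotic concentrations.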