Fast and accurate read mapping with approximate seeds and multiple backtracking
We present Masai, a read mapper representing the state of the art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai
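As a rough illustration of the seed idea in the abstract above, the following sketch applies the general pigeonhole argument: if a read matches with at most k errors and is cut into s non-overlapping seeds, at least one seed matches with at most floor(k/s) errors. The function name `seed_plan` and its parameters are hypothetical and chosen for this example only; this is not Masai's implementation.

```python
def seed_plan(read_length, max_errors, num_seeds):
    """Cut a read into num_seeds non-overlapping seeds.

    Returns (bounds, errors_per_seed), where bounds is a list of
    half-open (start, end) intervals and errors_per_seed is the error
    budget per seed implied by the pigeonhole principle.
    Hypothetical helper for illustration only.
    """
    if not 1 <= num_seeds <= read_length:
        raise ValueError("num_seeds must be between 1 and read_length")
    # Pigeonhole: k errors spread over s seeds leave at least one seed
    # with no more than floor(k / s) errors.
    errors_per_seed = max_errors // num_seeds
    base, extra = divmod(read_length, num_seeds)
    bounds = []
    start = 0
    for i in range(num_seeds):
        # The first `extra` seeds are one character longer.
        length = base + (1 if i < extra else 0)
        bounds.append((start, start + length))
        start += length
    return bounds, errors_per_seed


# Exact seeds: k + 1 short seeds, each searched without errors.
exact_bounds, exact_errors = seed_plan(100, 5, 6)
# Approximate seeds: fewer, longer seeds, each searched with errors.
approx_bounds, approx_errors = seed_plan(100, 5, 3)
```

With these assumed inputs, the exact plan uses six seeds of 16 to 17 characters and the approximate plan uses three seeds of 33 to 34 characters, each with an error budget of one. Longer seeds tend to occur less often by chance in the reference, which is one way to read the abstract's claim about filtration specificity.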
RazerS 3: Faster, fully sensitive read mapping
Motivation: In recent years, next-generation sequencing (NGS) has become a key technology for many applications in the biomedical sciences. Throughput continues to increase and new protocols provide longer reads than currently available. In almost all applications, read mapping is a first step. Hence, it is crucial to have algorithms and implementations that perform fast, with high sensitivity, and are able to deal with long reads and a large absolute number of indels.
Results: RazerS is a read mapping program with adjustable sensitivity based on counting q-grams. In this work we propose the successor RazerS 3, which now supports shared-memory parallelism, an additional seed-based filter with adjustable sensitivity, a much faster, banded version of Myers’ bit-vector algorithm for verification, memory-saving measures and support for the SAM output format. This leads to a much improved performance for mapping reads, in particular long reads with many errors. We extensively compare RazerS 3 with other popular read mappers and show that its results are often superior in terms of sensitivity while exhibiting practical and often competitive run times. In addition, RazerS 3 works without a precomputed index.
Availability and Implementation: Source code and binaries are freely available for download at http://www.seqan.de/projects/razers. RazerS 3 is implemented in C++ and OpenMP under a GPL license using the SeqAn library and supports Linux, Mac OS X, and Windows
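The q-gram counting mentioned in the Results above rests on the q-gram lemma: a read of length n that matches a reference window with at most k errors shares at least n + 1 - q(k + 1) q-grams with that window. The sketch below illustrates the lemma for the Hamming-distance case only. The function names and the toy sequences are assumptions made for this example, and the code is not taken from RazerS 3.

```python
def qgram_threshold(n, q, k):
    """Minimum number of shared q-grams guaranteed by the q-gram lemma."""
    return max(0, n + 1 - q * (k + 1))


def qgrams(seq, q):
    """All overlapping substrings of length q, in order of position."""
    return [seq[i:i + q] for i in range(len(seq) - q + 1)]


def shared_qgrams_hamming(read, window, q):
    """Count positions at which read and window carry the same q-gram.

    Valid for equal-length strings compared under Hamming distance,
    where each mismatch destroys at most q position-aligned q-grams.
    """
    if len(read) != len(window):
        raise ValueError("Hamming comparison needs equal lengths")
    return sum(a == b for a, b in zip(qgrams(read, q), qgrams(window, q)))


read = "ACGTACGTAC"    # toy read, n = 10
window = "ACGTTCGTAC"  # same sequence with one mismatch at index 4
threshold = qgram_threshold(len(read), q=3, k=1)
shared = shared_qgrams_hamming(read, window, q=3)
passes_filter = shared >= threshold
```

A filter built this way discards any window whose count falls below the threshold without running the more expensive verification step. Lowering the threshold below the lemma's bound is not needed for full sensitivity, while raising it trades sensitivity for speed.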
RazerS - Fast Read Mapping with Sensitivity Control
Second-generation sequencing technologies deliver DNA sequence data at unprecedented high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either losslessly or, at higher speeds, with a user-defined loss rate. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time
Segment-based multiple sequence alignment
Motivation: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and wide-spread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far.
Results: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences.
Availability: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A novel version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. The usage of the tool is described in both documentations.
Contact: [email protected]
PROPERTIES OF CP: COEFFICIENT OF THERMAL EXPANSION, DECOMPOSITION KINETICS, AND REACTION TO SPARK, FRICTION AND IMPACT
The properties of pentaammine (5-cyano-2H-tetrazolato-N2) cobalt (III) perchlorate (CP), which was first synthesized in 1968, continue to be of interest for predicting behavior in handling, shipping, aging, and thermal cook-off situations. We report coefficient of thermal expansion (CTE) values over four specific temperature ranges, decomposition kinetics using linear and isothermal heating, and the reaction to three different types of stimuli: impact, spark, and friction. The CTE was measured using a Thermal Mechanical Analyzer (TMA) for samples that were uniaxially compressed at 10,000 psi and analyzed over a dynamic temperature range of -20 °C to 70 °C. Differential scanning calorimetry (DSC) was used to monitor CP decomposition at linear heating rates of 1-7 °C min⁻¹ in perforated pans and of 0.1-1.0 °C min⁻¹ in sealed pans. The kinetic triplet was calculated using the LLNL code Kinetics05, and predictions for 210 and 240 °C are compared to isothermal thermogravimetric analysis (TGA) experiments. Values are also reported for spark, friction, and impact sensitivity
Bis(μ-dimesitylborinato-κ2 O:O)bis[(2-methylpyridine-κN)lithium]
The title compound, [Li2(C18H22BO)2(C6H7N)2], is a lithium dimesitylboroxide dimer in which the lithium cation is also coordinated by one molecule of 2-methylpyridine. At the core of the structure is an Li2O2 four-membered ring. The structure is centrosymmetric with an inversion centre midway between two Li atoms. Intermolecular C—H⋯π interactions and π–π interactions between the 2-methylpyridine rings exist [centroid–centroid distance = 3.6312 (16) Å]
Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS
Motivation: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map.
Results: Here we present a method for ‘split’ read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant.
Availability: SplazerS is available from http://www.seqan.de/projects/splazers
A Historical and Current Perspective on Predicting Thermal Cookoff Behavior
Prediction of thermal explosions using chemical kinetic models dates back nearly a century. However, it has only been within the past 25 years that kinetic models and digital computers made reliable predictions possible. Two basic approaches have been used to derive chemical kinetic models for high explosives: [1] measurement of the reaction rate of small samples by mass loss (thermogravimetric analysis, TGA), heat release (differential scanning calorimetry, DSC), or evolved gas analysis (mass spectrometry, infrared spectrometry, etc.) or [2] inference from larger-scale experiments measuring the critical temperature (T_m, lowest T for self-initiation), the time to explosion as a function of temperature, and sometimes a few other results, such as temperature profiles. Some of the basic principles of chemical kinetics involved are outlined, and major advances in these two approaches through the years are reviewed
Exploring the Physical, Chemical and Thermal Characteristics of a New Potentially Insensitive High Explosive: RX-55-AE-5
Current work at the Energetic Materials Center (EMC) at Lawrence Livermore National Laboratory (LLNL) includes both understanding properties of old explosives and measuring properties of new ones [1]. The necessity to know and understand the properties of energetic materials is driven by the need to improve performance and enhance stability to various stimuli, such as thermal, friction and impact insult. This review will concentrate on the physical properties of RX-55-AE-5, which is formulated from the heterocyclic explosive 2,6-diamino-3,5-dinitropyrazine-1-oxide (LLM-105) and 2.5% Viton A. Differential scanning calorimetry (DSC) was used to measure a specific heat capacity, C_p, of ≈ 0.950 J/g·°C and a thermal conductivity, κ, of ≈ 0.475 W/m·°C. The LLNL kinetics modeling code Kinetics05 and the Advanced Kinetics and Technology Solutions (AKTS) code Thermokinetics were both used to calculate Arrhenius kinetics for decomposition of LLM-105. Both obtained an activation energy barrier E ≈ 180 kJ mol⁻¹ for mass loss in an open pan. Thermal mechanical analysis (TMA) was used to measure the coefficient of thermal expansion (CTE). The CTE for this formulation was calculated to be ≈ 61 µm/m·°C. Impact, spark, and friction results are also reported
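To give a feel for what an activation energy of about 180 kJ/mol implies, the sketch below evaluates the Arrhenius relation k = A exp(-E/RT) as a ratio of rate constants at two temperatures, so that the pre-exponential factor A, which the abstract does not report, cancels out. The two temperatures used here are hypothetical values picked for illustration, and the function name is an assumption; nothing in this block comes from Kinetics05 or the AKTS code.

```python
import math

R = 8.314462618  # molar gas constant in J mol^-1 K^-1


def arrhenius_rate_ratio(e_kj_per_mol, t1_celsius, t2_celsius):
    """Return k(T2) / k(T1) for a single-step Arrhenius process.

    The pre-exponential factor cancels in the ratio, so only the
    activation energy and the two temperatures are needed.
    """
    t1 = t1_celsius + 273.15  # convert to kelvin
    t2 = t2_celsius + 273.15
    e_joule = e_kj_per_mol * 1000.0
    return math.exp(-e_joule / R * (1.0 / t2 - 1.0 / t1))


# E taken from the abstract; 250 and 260 degrees C are assumed values.
ratio = arrhenius_rate_ratio(180.0, 250.0, 260.0)
```

Under these assumptions a 10-degree rise roughly doubles the rate constant, which is why small temperature differences matter so much in cook-off predictions. A real decomposition may follow a multi-step or autocatalytic model, in which case a single ratio like this is only a first orientation.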
Solid-Solid Phase Transition Kinetics of FOX-7
Since it was developed in the late 1990s, 1,1-diamino-2,2-dinitroethene (FOX-7), with lower sensitivity and comparable performance to RDX, has received increasing interest. This paper will present our results for the phase changes of FOX-7 using DSC and HFC (Heat Flow Calorimetry). DSC thermal curves recorded at linear heating rates of 0.10, 0.35 and 1.0 °C min⁻¹ show two endothermic peaks and two exothermic peaks. The two endothermic peaks represent solid-solid phase transitions, which have been observed in the literature at 114 °C (β-γ) and 159 °C (γ-δ) by both DSC and XPD (X-ray powder diffraction) measurements. The first transition shifts from 114.5 to 115.8 °C as the heating rate increases from 0.10 to 1.0 °C min⁻¹, while the second transition shifts from 158.5 to 160.4 °C. Cyclical heating experiments show the endotherms and exotherms for a first heating through the γ phase to the δ phase, a cooling and reversion to the α or β phase, and a second heating to the γ and δ phases. The data are interpreted using kinetic models with thermodynamic constraints