An efficient simulator of 454 data using configurable statistical models


Background
Roche 454 is one of the major second-generation sequencing platforms. The particular characteristics of 454 sequence data pose new challenges for bioinformatic analyses, e.g. assembly and alignment search algorithms. Simulating these data is therefore useful for assessing how bioinformatic applications and algorithms handle 454 data.

Findings
We developed a new application named 454sim for fast and accurate simulation of 454 data. The program supports multi-threading and is available as C++ source code or pre-compiled binaries. 454sim simulates sequence reads using a set of statistical models for each chemistry. It simulates recorded peak intensities and peak quality deterioration, and it calculates quality values. All three generations of the Roche 454 chemistry ('GS20', 'GS FLX' and 'Titanium') are supported and defined in external text files for easy access and tweaking.

Conclusions
We present a new platform-independent application named 454sim. 454sim is typically 200 times faster than previous programs, and it allows simple adjustment of the statistical models. These improvements make it possible to carry out more complex and rigorous algorithm evaluations on a reasonable time scale.
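To illustrate the kind of flow-level simulation summarized under Findings, the following is a minimal C++ sketch, not the 454sim implementation. It assumes a cyclic flow order (here "TACG"), derives the ideal homopolymer count for each flow, and then perturbs each count with Gaussian noise whose spread grows with the flow index to mimic peak quality deterioration. The function names `ideal_flowgram` and `noisy_flowgram` and the parameters `sigma_base` and `degradation` are hypothetical; the actual models and parameter values used by 454sim are defined in its external chemistry files and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Ideal (noise-free) flowgram: for each flow, count the consecutive template
// bases that match the nucleotide flowed in that cycle position.
std::vector<int> ideal_flowgram(const std::string& seq,
                                const std::string& flow_order,
                                std::size_t n_flows) {
    assert(!flow_order.empty());
    std::vector<int> flows(n_flows, 0);
    std::size_t pos = 0;  // next unread base in the template
    for (std::size_t f = 0; f < n_flows && pos < seq.size(); ++f) {
        const char nuc = flow_order[f % flow_order.size()];
        while (pos < seq.size() && seq[pos] == nuc) {
            ++flows[f];
            ++pos;
        }
    }
    return flows;
}

// Noisy peak intensities under an ASSUMED model (not taken from 454sim):
// each intensity is drawn from a normal distribution centred on the ideal
// count, with a standard deviation that scales with the homopolymer length
// and increases linearly with the flow index.
std::vector<double> noisy_flowgram(const std::vector<int>& ideal,
                                   double sigma_base,
                                   double degradation,
                                   unsigned seed) {
    assert(sigma_base > 0.0 && degradation >= 0.0);
    std::mt19937 rng(seed);
    std::vector<double> intensities;
    intensities.reserve(ideal.size());
    for (std::size_t f = 0; f < ideal.size(); ++f) {
        const double sd = sigma_base
                          * (1.0 + degradation * static_cast<double>(f))
                          * std::max(1, ideal[f]);
        std::normal_distribution<double> noise(ideal[f], sd);
        // Recorded intensities cannot be negative, so clamp at zero.
        intensities.push_back(std::max(0.0, noise(rng)));
    }
    return intensities;
}
```

For example, `ideal_flowgram("TTACGG", "TACG", 4)` yields the counts 2, 1, 1, 2, which `noisy_flowgram` then turns into one simulated intensity per flow. Keeping the noise parameters as plain arguments mirrors, in a simplified way, the idea of models that can be adjusted without recompiling.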
