SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding
Scaffolding is an important subproblem in "de novo" genome assembly in which
mate pair data are used to construct a linear sequence of contigs separated by
gaps. Here we present SLIQ, a set of simple linear inequalities derived from
the geometry of contigs on the line that can be used to predict the relative
positions and orientations of contigs from individual mate pair reads and thus
produce a contig digraph. The SLIQ inequalities can also filter out unreliable
mate pairs and can be used as a preprocessing step for any scaffolding
algorithm. We tested the SLIQ inequalities on five real data sets ranging in
complexity from simple bacterial genomes to complex mammalian genomes and
compared the results to the majority voting procedure used by many other
scaffolding algorithms. SLIQ predicted the relative positions and orientations
of the contigs with high accuracy in all cases and gave more accurate position
predictions than majority voting for complex genomes, in particular the human
genome. Finally, we present a simple scaffolding algorithm that produces linear
scaffolds given a contig digraph. We show that our algorithm is very efficient
compared to other scaffolding algorithms while maintaining high accuracy in
predicting both contig positions and orientations for real data sets.
Comment: 16 pages, 6 figures, 7 tables
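The abstract compares SLIQ against majority voting and describes turning a contig digraph into linear scaffolds. The sketch below illustrates only those two generic ideas and is not the SLIQ inequalities or the authors' scaffolding algorithm. It assumes each mate pair has already been reduced to a vote `(a, b)` meaning "contig `a` precedes contig `b`"; the function names, the `min_support` threshold, and the toy data are hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def majority_vote_edges(votes, min_support=2):
    """Keep a directed edge a -> b when more mate pairs vote for
    'a precedes b' than for 'b precedes a'.

    min_support is an assumed filter for weakly supported pairs.
    """
    counts = Counter(votes)
    edges = set()
    for (a, b), n in counts.items():
        if a == b:
            continue  # a mate pair inside one contig carries no order information
        reverse = counts.get((b, a), 0)
        if n > reverse and n >= min_support:
            edges.add((a, b))
    return edges


def linear_order(edges):
    """Return one linear order of contigs consistent with the digraph.

    Raises graphlib.CycleError if the edges contain a cycle.
    """
    sorter = TopologicalSorter()
    for a, b in edges:
        sorter.add(b, a)  # b depends on a, so a is emitted before b
    return list(sorter.static_order())


# Toy votes: contig names are placeholders, not real data.
votes = [
    ("c1", "c2"), ("c1", "c2"), ("c2", "c1"),  # 2 vs 1 in favour of c1 -> c2
    ("c2", "c3"), ("c2", "c3"),                # unanimous c2 -> c3
    ("c3", "c1"),                              # single vote, below min_support
]
edges = majority_vote_edges(votes)
order = linear_order(edges)
print(order)
```

A topological sort is used here only because it is the simplest way to linearize an acyclic digraph. Real contig digraphs can contain cycles and conflicting edges, which this sketch does not resolve.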