The Polygraph: A Data Structure for Genome Alignment and Variation Detection

Abstract

Comparing whole genomes and finding variation is an important and difficult bioinformatic task. We present the Polygraph, a data structure for reference-free, multiple whole genome alignment that can be used to identify genomic structural variation. This data structure is built from assembled genomes and preserves the genomic structure from the assembly. It avoids the "hairball" graph structure that can occur in other graph methods such as de Bruijn graphs. The Polygraph can easily be visualized and used to identify structural variants. We apply the Polygraph to Escherichia coli and Saccharomyces cerevisiae to find structural variants.
