Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.