gofasta:Command-line utilities for genomic epidemiology research

Abstract

SUMMARY: gofasta comprises a set of command-line utilities for handling alignments of short assembled genomes in a genomic epidemiology context. It was developed for processing large numbers of closely related SARS-CoV-2 viral genomes and should be useful with other densely sampled pathogen genomic datasets. It provides functions to convert sam-format pairwise alignments between assembled genomes to fasta format; to annotate mutations in multiple sequence alignments, and to extract sets of sequences by genetic distance measures for use in outbreak investigations. AVAILABILITY AND IMPLEMENTATION: gofasta is an open-source project distributed under the MIT license. Binaries are available at https://github.com/virus-evolution/gofasta, from Bioconda, and through the Go programming language’s package management system. Source code and further documentation, including walkthroughs for common use cases, are available on the GitHub repository

    Similar works