Bacteriocins are peptide-derived molecules produced by bacteria, whose
recently discovered functions include acting as virulence factors and signalling
molecules, in addition to their better-known role as antibiotics. To date, close to
five hundred bacteriocins have been identified and classified. Recent
discoveries have shown that bacteriocins are highly diverse and widely
distributed among bacterial species. Because of this vast sequence and
structural diversity, many tools struggle to identify novel bacteriocins.
Many bacteriocins undergo
post-translational processing or modifications necessary for the biosynthesis
of the final mature form. Enzymatic modification and export of bacteriocins
are carried out by proteins whose genes, referred to as \textit{context genes}
in this study, are often located in a discrete gene cluster proximal to the
bacteriocin precursor gene. Although bacteriocins themselves are
structurally diverse, context genes have been shown to be largely conserved
across unrelated species. Using this knowledge, we set out to identify new
candidates for context genes, which may clarify how bacteriocins are
synthesized, and to identify new candidates for bacteriocins that bear no sequence
similarity to known toxins. To achieve these goals, we have developed a
software tool, the Bacteriocin Operon and gene block Associator (BOA), which can
identify homologous bacteriocin-associated gene clusters and predict novel
ones. We discover that several phyla have a strong preference for bacteriocin
genes, suggesting distinct functions for this group of molecules.
Availability: https://github.com/idoerg/BOA
Comment: Accepted for publication in BMC Bioinformatics