ENVirT: inference of ecological characteristics of viruses from metagenomic data

Abstract

Background Estimating the parameters that describe the ecology of viruses,particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. Results In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by existing methods named PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in virto analyses. Conclusions These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations.This work was supported partially by Australia Research Council [grant numbers LP140100670 and DP150103512] and the Biodiversity Research Center, Academia Sinica, Taiwan. DJ, DH, DS and YS were funded by the MIFRS and MIRS scholarships of The University of Melbourne. Publication costs were funded by The Australian National University

    Similar works