AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

A Dereeper; A Loytynoja; A Wehe; AR Nabhan; B Rannala; BG Hall; C Chauve; C Notredame; C Zhou; Chan Zhou; CR Linder; DA Benson; DJ Zwickl; DM Hillis; DT Jones; F Jacobsen; F Plazzi; F Ronquist; Fenglou Mao; J Kim; J Pecon-Slattery; JA Eisen; Jinling Huang; Johann Peter Gogarten; JP Jenuth; JP Townsend; K Katoh; K Liu; K Tamura; KA Cranston; KB Li; KS Pick; L Liu; M Poptsova; MN Price; MS Poptsova; MS Rosenberg; N Lartillot; O Gascuel; Paul Jaak Janssen; PD Faith; RC Edgar; RD Page; RI Vane-Wright; S Guindon; S Nelesen; S Whelan; SF Altschul; T Frickey; Y Yin; Yanbin Yin; Ying Xu

research

Publication date: 1 January 2014
Publisher: Public Library of Science (PLoS)

Abstract

A challenge in the phylogenetic inference of gene trees is how to sample a large pool of homologous sequences so as to obtain a representative subset. The need arises in several situations: (1) accuracy-oriented phylogenetic reconstruction methods may be unable to handle a large pool of sequences because of their high demand for computing resources; (2) applications that analyze a collection of gene trees may prefer trees with fewer operational taxonomic units (OTUs), for instance when detecting horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, subsamples were often created by manual selection. Here we present AST, an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, which selects representative sequences that maximize the taxonomic diversity of the sample. To demonstrate its effectiveness, we applied AST to four problems: inferring the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, of 16S ribosomal RNAs, and of glycosyl-transferase gene family 8, and studying ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of the computational results is almost as good as that of manual inference by domain experts, which makes the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php
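
The abstract does not describe the sampling procedure itself, so the following is only a minimal sketch of the general idea of taxonomy-aware subsampling, not the AST algorithm. It assumes each sequence carries a lineage (a tuple of taxon names from broad to narrow) and picks sequences round-robin across the taxa at one chosen rank, so that heavily sequenced groups cannot crowd out sparsely sampled ones. The function name `sample_diverse`, the record layout, and the round-robin strategy are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import zip_longest


def sample_diverse(records, k, rank=0):
    """Pick up to k sequence IDs, spreading picks across taxa.

    records: list of (seq_id, lineage) pairs, where lineage is a tuple of
             taxon names ordered from broad to narrow (hypothetical layout).
    k:       maximum number of sequences to return.
    rank:    index into the lineage used to define the groups.
    """
    if k <= 0:
        return []

    # Group sequence IDs by their taxon at the chosen rank.
    groups = defaultdict(list)
    for seq_id, lineage in records:
        groups[lineage[rank]].append(seq_id)

    # Visit the groups in a fixed (alphabetical) order and take one
    # sequence from each group per round until k sequences are picked.
    picked = []
    ordered = [groups[name] for name in sorted(groups)]
    for layer in zip_longest(*ordered):
        for seq_id in layer:
            if seq_id is None:  # this group is already exhausted
                continue
            picked.append(seq_id)
            if len(picked) == k:
                return picked
    return picked


# Toy input: three sequences from one well-studied phylum, one each from two others.
records = [
    ("a1", ("Bacteria", "Proteobacteria")),
    ("a2", ("Bacteria", "Proteobacteria")),
    ("a3", ("Bacteria", "Proteobacteria")),
    ("b1", ("Bacteria", "Firmicutes")),
    ("c1", ("Archaea", "Euryarchaeota")),
]
subset = sample_diverse(records, 3, rank=1)
print(subset)
```

With this toy input and `k = 3`, every phylum contributes exactly one sequence, whereas taking the first three records would have returned only Proteobacteria. A real tool would also need a rule for choosing which sequence represents each group, which this sketch leaves to input order.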