Inference of Transposable Element Ancestry
<div><p>The most common methods for inferring transposable element (TE) evolutionary relationships divide TEs into subfamilies using shared diagnostic nucleotides. Although these methods were originally justified by the "master gene" model of TE evolution, computational and experimental work indicates that many of the subfamilies they generate contain multiple source elements. Subfamily-based methods therefore give an incomplete picture of TE relationships, which may affect studies of selection, functional exaptation, and predictions of horizontal transfer. Here, we develop a Bayesian method for inferring TE ancestry that gives the probability that each sequence was replicative, its frequency of replication, and the probability that each extant TE sequence came from each possible ancestral sequence. Applying our method to 986 members of the newly discovered LAVA family of TEs, we show that there were far more source elements in the history of LAVA expansion than subfamilies identified by the CoSeg subfamily-classification program. We also identify multiple replicative elements in the <i>Alu</i>Sc subfamily in humans. Our results strongly indicate that subfamily structures must be reassessed to obtain accurate estimates of mutation processes, phylogenetic relationships, and historical times of activity.</p></div>
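The three quantities named in the abstract can be estimated from posterior samples by simple Monte Carlo averaging. The Python sketch below assumes, hypothetically, that each MCMC sample is stored as a list giving, for every sequence, the index of the sequence it descended from (or None for a root). The function name `summarize_samples` and this encoding are illustrative and are not taken from the authors' software. "Frequency of replication" is approximated here as the expected number of direct descendants.

```python
from collections import Counter


def summarize_samples(samples, n):
    """Monte Carlo summaries of posterior ancestry samples (hypothetical encoding).

    samples: list of MCMC samples; sample[i] is the index of the sequence that
             sequence i descended from, or None if i is treated as a root.
    n: total number of sequences.
    Returns (p_rep, mean_copies, p_anc):
      p_rep[j]       fraction of samples in which j has at least one descendant
      mean_copies[j] average number of direct descendants of j per sample
      p_anc[i][j]    fraction of samples in which i descended from j
    """
    m = len(samples)
    p_rep = [0.0] * n
    mean_copies = [0.0] * n
    p_anc = [[0.0] * n for _ in range(n)]
    for sample in samples:
        # Count direct descendants of each parent in this sample.
        counts = Counter(a for a in sample if a is not None)
        for j, c in counts.items():
            p_rep[j] += 1 / m
            mean_copies[j] += c / m
        # Record which parent each sequence was assigned in this sample.
        for i, a in enumerate(sample):
            if a is not None:
                p_anc[i][a] += 1 / m
    return p_rep, mean_copies, p_anc
```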
Ancestry networks of <i>Alu</i>Sc sequences.
<p>The predicted network of <i>Alu</i>Sc ancestral relationships is shown. A) All sequences that replicated with probability >30% are represented as nodes in the network. An arrow is drawn between two sequences if there was at least a 5% probability that an ancestral relationship existed between them; the arrow's direction indicates the ancestor-descendant relationship. Sequences are colored by their CoSeg subfamily assignments. B) The network in A, with all extant TEs in the data added as nodes represented by small dots. An edge is drawn between an element and an ancestral sequence if there was at least a 5% probability that the element descended from that sequence. Nodes are colored by CoSeg subfamily assignment.</p>
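The thresholding rule in panel A can be sketched as follows, assuming the posterior has already been summarized into a list `p_rep` (probability that each sequence replicated) and a matrix `p_anc` (probability that sequence i descended from sequence j). The function name and data layout are hypothetical; only the >30% node cutoff and the at-least-5% edge cutoff come from the caption.

```python
def build_network(p_rep, p_anc, node_min=0.30, edge_min=0.05):
    """Thresholded ancestry network (hypothetical helper).

    Nodes: sequences whose replication probability exceeds node_min.
    Edges: directed (ancestor, descendant) pairs among the kept nodes whose
           ancestry probability is at least edge_min.
    """
    nodes = [j for j, p in enumerate(p_rep) if p > node_min]
    edges = [
        (j, i)                      # arrow from ancestor j to descendant i
        for i in nodes
        for j in nodes
        if i != j and p_anc[i][j] >= edge_min
    ]
    return nodes, edges
```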
Ancestral relationships among LAVA elements.
<p>The predicted network of LAVA ancestral relationships is shown. A) All sequences that replicated with probability >30% are represented as nodes in the network. An arrow is drawn between two sequences if there was at least a 5% probability that an ancestral relationship existed between them; the arrow's direction indicates the ancestor-descendant relationship. Sequences are colored by their CoSeg subfamily assignments (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen.1004482.s005" target="_blank">Table S2</a>). Sequences colored white do not exist in the data but are inferred to have existed ancestrally. B) The network in A, with all extant TEs in the data added as nodes represented by small dots. An edge is drawn between an element and an ancestral sequence if there was at least a 5% probability that the element descended from that sequence. Nodes are colored by CoSeg subfamily assignment.</p>
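Panel B and the white coloring of inferred ancestors can be sketched in the same way. In this hypothetical Python sketch, `in_data[j]` marks whether sequence j is an observed element, ancestral nodes absent from the data are colored white, and every observed element is linked to each kept node it descended from with probability of at least 5%. The function `add_extant_elements` and its arguments are illustrative names, not part of any published tool.

```python
def add_extant_elements(nodes, p_anc, in_data, subfamily, edge_min=0.05):
    """Panel-B style extension of an ancestral network (hypothetical helper).

    nodes: ancestral nodes kept in panel A.
    p_anc[i][j]: probability that sequence i descended from sequence j.
    in_data[j]: False for reconstructed ancestors absent from the data.
    subfamily[j]: subfamily label (e.g. a CoSeg assignment) or None.
    Returns (colors, leaf_edges).
    """
    # Observed nodes keep their subfamily color; inferred-only ancestors are white.
    colors = {j: (subfamily[j] if in_data[j] else "white") for j in nodes}
    # Link every observed element to each kept ancestral node above the cutoff.
    leaf_edges = [
        (j, i)
        for i in range(len(p_anc))
        if in_data[i]
        for j in nodes
        if j != i and p_anc[i][j] >= edge_min
    ]
    return colors, leaf_edges
```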
Number of replicative sequences identified for different prior penalties in LAVA and <i>Alu</i>Sc.
Posterior distribution of the number of replicative sequences.
<p>The posterior distribution of the number of replicative sequences in A) LAVA and B) <i>Alu</i>Sc is shown for MCMC runs with different penalties applied to each additional replicative sequence. Higher penalties correspond to a prior distribution favoring fewer replicative sequences. Each distribution is an average over 10 replicates.</p>
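One way to read the penalty and the averaging in this caption is sketched below. As an illustration only, it assumes a log-prior that falls linearly with the number k of replicative sequences, so each additional replicative sequence multiplies the prior by exp(-penalty); the prior actually used in the paper may differ. Averaging normalizes the histogram of k within each replicate run and then takes the mean across runs. Both function names are hypothetical.

```python
from collections import Counter


def log_prior_penalty(k, penalty):
    """Hypothetical log-prior: each additional replicative sequence costs `penalty`."""
    return -penalty * k


def averaged_posterior(traces):
    """Average the normalized histograms of k over replicate MCMC traces.

    traces: list of replicate runs; each run is a list of sampled k values.
    Returns {k: averaged posterior probability}, sorted by k.
    """
    total = Counter()
    for trace in traces:
        for k, c in Counter(trace).items():
            # Normalize within the run, then weight each run equally.
            total[k] += c / len(trace) / len(traces)
    return dict(sorted(total.items()))
```

In a Metropolis-Hastings step that changes k to k', the prior contributes `log_prior_penalty(k', penalty) - log_prior_penalty(k, penalty)` to the log acceptance ratio, so larger penalties make moves that add replicative sequences less likely to be accepted.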
New AnTE subfamily assignments for LAVA elements.
<p>The predicted network of LAVA TE ancestral relationships is shown, as in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen-1004482-g003" target="_blank">Figure 3</a>. A) All sequences that replicated with probability >30% are represented as nodes in the network, exactly as in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen-1004482-g003" target="_blank">Figure 3A</a>, except that nodes are colored by their new AnTE-based subfamily assignments. B) As in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen-1004482-g004" target="_blank">Figure 4A</a>, all TEs in the data are added to the network as nodes represented by small dots, colored according to the new AnTE-based subfamily assignments.</p>
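The caption does not state how the AnTE-based subfamilies were defined. As a purely illustrative assumption, the sketch below lets each replicative source node found its own subfamily and assigns every other element to the source node from which it most probably descended (a maximum-posterior rule). Ties go to the higher index because Python compares the (probability, index) tuples. The function `map_subfamilies` is hypothetical and is not presented as the paper's procedure.

```python
def map_subfamilies(p_anc, source_nodes):
    """Maximum-posterior subfamily labels (hypothetical rule).

    p_anc[i][j]: probability that sequence i descended from sequence j.
    source_nodes: indices of replicative sequences that define subfamilies.
    Returns a list whose entry i is the founding source node of i's subfamily,
    or None if i has zero probability of descending from any source node.
    """
    sources = set(source_nodes)
    labels = []
    for i, row in enumerate(p_anc):
        if i in sources:
            labels.append(i)  # each source node founds its own subfamily
            continue
        best_p, best_j = max(((row[j], j) for j in source_nodes),
                             default=(0.0, None))
        labels.append(best_j if best_p > 0 else None)
    return labels
```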