Evolutionary Inference in Transposable Elements
Transposable elements (TEs) are a large component of many eukaryotic genomes, and the evolution of TEs is closely connected to that of their hosts. Accurate inference of TE evolutionary relationships is essential to understanding the biology and evolution of TE families and the role they play in genome evolution. Additionally, the great quantity of TEs makes them a useful model system for understanding genomic processes such as mutation and recombination, and their utility as a research system also depends on accurate evolutionary inference.
In this dissertation, I describe novel computational methods for evolutionary inference in TEs, applying them primarily to the Alu family of primate retroelements. A major task in TE evolutionary study is the classification of elements of a family into subfamilies. I developed the AnTE algorithm, a Bayesian approach to subfamily classification that, in contrast to previous deterministic methods, allows for probabilistic subfamily classification, an important advance due to the high uncertainty involved. I use AnTE to provide a more complete picture of the evolutionary history of Alu elements than provided by previous analyses, especially regarding the role of gene conversion. This work suggests that current Alu subfamily classification found in widely-used databases such as RepeatMasker and RepBase provides a misleading account of Alu evolutionary relationships.
Building on the AnTE research, I developed a Bayesian phylogenetics approach to detecting and characterizing gene conversion events among TEs in a genome. I use this approach to identify a burst of interlocus gene conversion among Alu elements in the gorilla genome, occurring at much higher rates than on any other branch of the Great Ape phylogeny. The abnormally high Alu gene conversion rates in gorilla appear to be driven by the binding of PRDM9 to Alu elements; PRDM9 is a rapidly evolving protein that targets DNA sequence motifs for double-strand breaks in meiosis. These findings indicate one evolutionary pathway for rapid gene conversion in a TE family, and the conversion events identified provide a rich dataset for understanding the dynamics of gene conversion in primates.
Datafile
This file is a .zip compressed archive containing subdirectories, each holding the raw data and analysis scripts used to produce the 7 figures in the paper.
Inference of Transposable Element Ancestry
<div><p>Most common methods for inferring transposable element (TE) evolutionary relationships are based on dividing TEs into subfamilies using shared diagnostic nucleotides. Although originally justified based on the "master gene" model of TE evolution, computational and experimental work indicates that many of the subfamilies generated by these methods contain multiple source elements. This implies that subfamily-based methods give an incomplete picture of TE relationships. Studies on selection, functional exaptation, and predictions of horizontal transfer may all be affected. Here, we develop a Bayesian method for inferring TE ancestry that gives the probability that each sequence was replicative, its frequency of replication, and the probability that each extant TE sequence came from each possible ancestral sequence. Applying our method to 986 members of the newly-discovered LAVA family of TEs, we show that there were far more source elements in the history of LAVA expansion than subfamilies identified using the CoSeg subfamily-classification program. We also identify multiple replicative elements in the <i>Alu</i>Sc subfamily in humans. Our results strongly indicate that a reassessment of subfamily structures is necessary to obtain accurate estimates of mutation processes, phylogenetic relationships and historical times of activity.</p></div>
Ancestral relationships among LAVA elements.
<p>The predicted network of LAVA ancestral relationships is shown. A) All sequences that replicated with probability >30% are represented as nodes in the network. Arrows are drawn between sequences if there was at least 5% probability that an ancestral relationship existed between those sequences, with the arrows pointing in the direction of the ancestor-descendant relationship. Sequences are colored based on their CoSeg subfamily assignments (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen.1004482.s005" target="_blank">Table S2</a>). Sequences colored white do not exist in the data, but are inferred to have existed ancestrally. B) The network in A is extended by adding all extant TEs in the data as nodes, represented by small dots. Edges are drawn between an element and an ancestral sequence if there was at least 5% probability the element descended from the ancestral sequence. Nodes are colored based on CoSeg subfamily assignment.</p>
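The thresholding rule in this caption can be sketched in a few lines of Python. This is only an illustration of the stated 30% and 5% cutoffs, not the authors' code: the function name `build_ancestry_network`, the dictionary inputs, and the sequence labels are hypothetical, and the posterior probabilities are assumed to have been computed already.

```python
from typing import Dict, List, Tuple

REPLICATION_THRESHOLD = 0.30  # panel A: node shown if P(replicated) > 30%
ANCESTRY_THRESHOLD = 0.05     # edge drawn if P(ancestor -> descendant) >= 5%


def build_ancestry_network(
    replication_prob: Dict[str, float],
    ancestry_prob: Dict[Tuple[str, str], float],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (nodes, directed edges) for a panel-A style network.

    replication_prob maps a sequence label to its posterior probability
    of having replicated; ancestry_prob maps (ancestor, descendant)
    pairs to the posterior probability of that relationship.
    Both input layouts are assumptions made for this sketch.
    """
    nodes = sorted(
        seq for seq, p in replication_prob.items() if p > REPLICATION_THRESHOLD
    )
    node_set = set(nodes)
    # Keep only edges whose endpoints are both displayed nodes.
    edges = sorted(
        (anc, desc)
        for (anc, desc), p in ancestry_prob.items()
        if anc in node_set and desc in node_set and p >= ANCESTRY_THRESHOLD
    )
    return nodes, edges


# Hypothetical example values, not taken from the paper.
example_replication = {"s1": 0.9, "s2": 0.31, "s3": 0.30}
example_ancestry = {("s1", "s2"): 0.05, ("s1", "s3"): 0.5, ("s2", "s1"): 0.01}
example_nodes, example_edges = build_ancestry_network(
    example_replication, example_ancestry
)
```

In this toy input, `s3` is excluded because its replication probability does not exceed 30%, so the high-probability edge into it is dropped as well.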
Posterior distribution of the number of replicative sequences.
<p>The posterior distribution of the number of replicative sequences in A) LAVA and B) <i>Alu</i>Sc is given for MCMC runs with different penalties applied to each additional replicative sequence. Higher penalties indicate a prior distribution favoring fewer replicative sequences. Each distribution is an average over 10 replicates.</p>
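One plausible way to average a posterior distribution over replicate MCMC runs is sketched below. It assumes each replicate is stored as a list of sampled counts of replicative sequences, one count per retained MCMC sample; the function names and the example data are hypothetical, and the source does not say how the averaging was actually carried out.

```python
from collections import Counter
from typing import Dict, List


def empirical_distribution(samples: List[int]) -> Dict[int, float]:
    """Relative frequency of each sampled count within one MCMC run."""
    counts = Counter(samples)
    total = len(samples)
    return {k: c / total for k, c in counts.items()}


def average_over_replicates(replicates: List[List[int]]) -> Dict[int, float]:
    """Average the per-run empirical distributions, weighting runs equally."""
    dists = [empirical_distribution(run) for run in replicates]
    # Union of all counts observed in any replicate.
    support = sorted(set().union(*dists))
    return {
        k: sum(d.get(k, 0.0) for d in dists) / len(dists) for k in support
    }


# Two hypothetical replicates with four retained samples each.
example_runs = [[3, 3, 4, 4], [3, 4, 4, 4]]
example_average = average_over_replicates(example_runs)
```

Weighting each replicate equally, rather than pooling all samples, keeps a single long or poorly mixed run from dominating the averaged distribution.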
Number of replicative sequences identified for different prior penalties in LAVA and <i>Alu</i>Sc.
New AnTE subfamily assignments for LAVA elements.
<p>The predicted network of LAVA TE ancestral relationships is shown, as in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen-1004482-g003" target="_blank">Figure 3</a>. A) All sequences that replicated with probability >30% are represented as nodes in the network, exactly as in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen-1004482-g003" target="_blank">Figure 3A</a> except that nodes are colored based on their new AnTE-based subfamily assignments. B) As in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004482#pgen-1004482-g004" target="_blank">Figure 4A</a>, all TEs in the data are added to the network as nodes, represented by small dots, and using the coloring scheme of the new AnTE-based subfamily assignments.</p>
Ancestry networks of <i>Alu</i>Sc sequences.
<p>The predicted network of <i>Alu</i>Sc ancestral relationships is shown. A) All sequences that replicated with probability >30% are represented as nodes in the network. Arrows are drawn between sequences if there was at least 5% probability that an ancestral relationship existed between those sequences, with the arrows pointing in the direction of the ancestor-descendant relationship. Sequences are colored based on their CoSeg subfamily assignments. B) The network in A is extended by adding all extant TEs in the data as nodes, represented by small dots. Edges are drawn between an element and an ancestral sequence if there was at least 5% probability the element descended from the ancestral sequence. Nodes are colored based on CoSeg subfamily assignment.</p>