60 research outputs found

    Detection of lineage-specific evolutionary changes among primate species

    Background: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection.
    Results: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection.
    Conclusions: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.

    Coalescent Theory and Yule Trees in time and space

    Mathematically, Coalescent Theory describes genealogies within a population in the form of (binary) trees. The original Coalescent Model is based on population models that evolve neutrally. Up to graph isomorphism, the tree structures it provides can be equivalently described in a discrete setting by the Yule Process. As a population evolves in time, the genealogy of the population is subject to change, and so is the tree structure associated with it. A similar statement holds if the population is assumed to be recombining; then, in space, i.e. along the genome, the genealogy of a sample may change in a similar way. The two main focuses of this thesis are the description of the processes that shape the genealogy in time and in space, making use of the relation between the Coalescent and the Yule Process. As for the process in time, the presented approach differs from existing ones mainly in that the population considered is strictly finite. The results we obtain are mainly theoretical in nature. In the case of the process along the genome, we focus on mathematical properties of Linkage Disequilibrium, a quantity that is relevant in the analysis of population-genetic data. Similarities and differences between the two are discussed, and a possibility of performing similar analyses when the assumption of neutrality is abandoned is pointed out.
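
To make the tree-valued view of a genealogy concrete, here is a minimal sketch of the standard neutral (Kingman) coalescent for a sample of n lineages. It is not the finite-population approach of the thesis; the function name `simulate_coalescent` and the event format are illustrative assumptions. Read backwards in time, pairs of lineages merge until one ancestor remains; read forwards, the same merges appear as successive binary splits, which is the sense in which the topology resembles a Yule-type branching tree.

```python
import random


def simulate_coalescent(n, seed=None):
    """Simulate a neutral coalescent genealogy for n sampled lineages.

    Returns a list of merge events (time, left, right, parent).
    Leaves are labelled 0..n-1, internal nodes n..2n-2 (hypothetical
    labelling chosen for this sketch). Time is in coalescent units.
    """
    rng = random.Random(seed)
    active = list(range(n))      # lineages that have not yet merged
    next_label = n
    t = 0.0
    events = []
    while len(active) > 1:
        k = len(active)
        # With k lineages, the waiting time to the next merger is
        # exponential with rate k*(k-1)/2 under the neutral model.
        t += rng.expovariate(k * (k - 1) / 2)
        # Every pair of lineages is equally likely to merge.
        i, j = rng.sample(range(k), 2)
        left, right = active[i], active[j]
        for idx in sorted((i, j), reverse=True):
            active.pop(idx)      # remove the higher index first
        active.append(next_label)
        events.append((t, left, right, next_label))
        next_label += 1
    return events
```

A sample of n lineages always yields n - 1 merge events, and the last event creates the root of the binary tree.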

    Complex modular architecture around a simple toolkit of wing pattern genes

    Identifying the genomic changes that control morphological variation and understanding how they generate diversity is a major goal of evolutionary biology. In Heliconius butterflies, a small number of genes control the development of diverse wing colour patterns. Here, we used full-genome sequencing of individuals across the Heliconius erato radiation and closely related species to characterize genomic variation associated with wing pattern diversity. We show that variation around colour pattern genes is highly modular, with narrow genomic intervals associated with specific differences in colour and pattern. This modular architecture explains the diversity of colour patterns and provides a flexible mechanism for rapid morphological diversification.
    We acknowledge the University of Puerto Rico, the Puerto Rico INBRE grant P20 GM103475 from the National Institute for General Medical Sciences (NIGMS), a component of the National Institutes of Health (NIH); CNRS Nouraugues and CEBA awards (B.A.C.); National Science Foundation awards DEB-1257839 (B.A.C.), DEB-1257689 (W.O.M.), DEB-1027019 (W.O.M.); awards 1010094 and 1002410 from the Experimental Program to Stimulate Competitive Research (EPSCoR) program of the National Science Foundation (NSF) for computational resources; and the Smithsonian Institution. This research was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU is also supported in part by Lilly Endowment, Inc.

    Postprocessing phylogenies

    More and more phylogenetic trees are being generated, and the inferred relationships frequently contradict each other. In this case, tools are needed that evaluate the amount of difference between two trees, extract the congruencies of two trees, and combine multiple trees by minimizing the incongruencies. These tools are summarized by the term "phylogenetic postprocessing". In this thesis, two aspects of phylogenetic postprocessing are investigated in detail. First, tree distance computations evaluate the amount of difference between two trees. Most measures take only the topological information into account. The branch lengths of the trees also carry information, however, for example as an estimate of the amount of difference between two sequences. One measure that considers both topology and branch lengths is the length of the shortest path in the space of weighted trees, known as the geodesic distance. Here, an exact, but exponential-time, algorithm to compute the geodesic distance is presented. Comparisons with its approximations show that there is a particular path that approximates the geodesic distance well and that can be computed in linear time. Phylogenetic trees can also be tested for being statistically similar or different. A topological distance measure can then be used as a test statistic, with the associated p-value computed under a null distribution of trees. Discrete tests must ensure that the size of the test is conservative, i.e. the size must not exceed the significance level. We present one example where a test has to be modified to ensure this property. Second, gene trees on overlapping taxon sets can be combined into a so-called supertree. Another possibility is to combine the gene alignments directly, namely, to concatenate the gene alignments into a superalignment and to reconstruct a phylogeny from this long alignment. It is also possible to combine the data at a level between superalignment and supertree methods. Simulations of gene alignments along model gene trees allow for the comparison of methods from all three levels. We investigate different settings, e.g. complete or overlapping taxon sets, equal or different substitution parameters, or different gene topologies. The results show a good performance of matrix representation methods compared to other supertree and medium-level methods. Furthermore, superalignment works well when parameters differ between genes but is problematic when a high level of incongruence is present among the true gene trees. In addition to the practical evaluation of supertree methods, theoretical and algorithmic aspects are of interest. We therefore study different null models underlying supertree reconstruction. One such null model is the uniform distribution over splits, i.e. over every possible division of the taxa into two sets. We find that only this distribution behaves appropriately when little information is present. In contrast, the distribution of equally likely trees introduces a tree shape bias into split-based supertree methods. This bias can be traced back to the unequal split distribution in the null model. Finally, a supertree can also be defined by minimizing the total distance to the trees in the set, i.e. as a median tree. The majority-rule consensus is described as a median tree method for trees on the same taxon set. For trees on overlapping taxon sets, however, different specifications can be used, namely MR(-) supertrees and MR(+) supertrees. We present algorithms to compute the respective distances in the matrix representation framework. Applying their implementation to simulated data sets shows a clearly better performance of MR(-) compared to MR(+). This discrepancy is likely due to a tree shape bias in MR(+). To conclude, we see that the two aspects of phylogenetic postprocessing, tree distances and tree combination methods, are not independent. Instead, they are linked by the definition of the median tree. Thus our understanding of tree distances influences data combination methods and vice versa.
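
As an illustration of a purely topological tree distance, the sketch below counts the clades found in one rooted tree but not the other, in the spirit of the Robinson-Foulds distance. This is a swapped-in textbook measure, not the geodesic distance or the MR(-)/MR(+) distances of the thesis; the nested-tuple tree encoding and the names `clades` and `rf_distance` are assumptions made for the example. Branch lengths are ignored entirely, which is exactly the limitation the abstract points out.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree.

    A tree is a nested tuple of leaf names, e.g. (("A", "B"), "C").
    Each clade is returned as a frozenset of the leaves below a node.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])          # a single leaf
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)                      # record this internal node
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)                 # the root clade carries no signal
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two trees on the same taxa that disagree about one grouping.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

Here the trees share the clades {A, B, C} and {D, E} and differ only in {A, B} versus {A, C}, so the symmetric difference contains two clades.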

    Investigation of volume rendering performance through active learning and visual analysis

    Volume visualization has many real-world applications, such as medical imaging and scientific research. Volumes can be rendered directly, by shooting rays from the camera through the volume data, or indirectly, by extracting features such as iso-surfaces. Knowing the runtime performance of visualization techniques enables optimized infrastructure planning, and trained models could also be reused for interactive quality adaptation. Prediction models can use information about the renderer and the datasets to determine execution times before rendering. In this thesis, we present a model based on neural networks that predicts rendering times from volume properties and the rendering configuration. Moreover, our model actively intervenes in the sampling process to improve learning while decreasing the number of necessary measurements. To do so, it estimates how likely a drawn sample is to improve future predictions. Our model consists of multiple submodels and uses their disagreement about certain samples as a criterion for possible improvement. We evaluate our model using different sampling strategies, loss functions, and volume rendering techniques. This includes predictions based on measurement data of a volume raycaster, as well as a continuous setup with interleaved execution and prediction of an indirect volume renderer. Our indirect renderer uses marching cubes to extract iso-surfaces as a triangle mesh from a density field and organizes them in an octree. This enables the highly parallel sorting on the graphics card that is necessary for rendering transparent surfaces in the correct order.
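
The idea of using submodel disagreement to choose the next measurement can be sketched as follows. This is a generic query-by-committee illustration, not the thesis' neural network model: the submodels are plain callables, disagreement is taken to be the variance of their predictions, and the names `disagreement` and `pick_next_sample` are hypothetical.

```python
from statistics import pvariance


def disagreement(models, x):
    """Variance of the submodels' predictions for one candidate sample."""
    return pvariance([model(x) for model in models])


def pick_next_sample(models, candidates):
    """Choose the candidate the submodels disagree about the most.

    Measuring this sample is assumed to be the most informative,
    since the committee is least certain about its outcome.
    """
    return max(candidates, key=lambda x: disagreement(models, x))


# Toy committee: three stand-in predictors of render time that agree
# for small inputs and diverge for large ones (made-up functions).
committee = [
    lambda x: 2 * x,
    lambda x: 2 * x + 0.1 * x * x,
    lambda x: 2 * x - 0.1 * x * x,
]
print(pick_next_sample(committee, [0, 1, 5]))
```

In this toy setup the committee agrees exactly at 0 and diverges most at 5, so 5 is selected for the next measurement.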

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume


    An Algorithmic Walk from Static to Dynamic Graph Clustering


    LIPIcs, Volume 244, ESA 2022, Complete Volume


    15th Scandinavian Symposium and Workshops on Algorithm Theory: SWAT 2016, June 22-24, 2016, Reykjavik, Iceland


    LIPIcs, Volume 274, ESA 2023, Complete Volume
