60 research outputs found

Detection of lineage-specific evolutionary changes among primate species

Abstract. Background: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. Results: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. Conclusions: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA that is particularly well-suited to detecting lineage-specific selection in large genomes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland

Coalescent Theory and Yule Trees in time and space

Author: Wirtz Johannes M.
Publication date: 01/01/2019

Mathematically, Coalescent Theory describes genealogies within a population in the form of (binary) trees. The original Coalescent Model is based on population models that evolve neutrally. Up to graph isomorphism, the tree structures it provides can be equivalently described in a discrete setting by the Yule Process. As a population evolves in time, its genealogy is subject to change, and so is the associated tree structure. A similar statement holds if the population is assumed to be recombining: in space, i.e. along the genome, the genealogy of a sample may change in a similar way. The two main focuses of this thesis are the description of the processes that shape the genealogy in time and in space, making use of the relation between the Coalescent and the Yule Process. As for the process in time, the presented approach differs from existing ones mainly in that the population considered is strictly finite. The results we obtain are mainly of a theoretical nature. In the case of the process along the genome, we focus on mathematical properties of Linkage Disequilibrium, a quantity that is relevant in the analysis of population-genetic data. Similarities and differences between the two are discussed, and a possibility of performing similar analyses when the assumption of neutrality is abandoned is pointed out.
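The relation between the two tree-generating processes mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the nested-tuple and nested-list tree encodings, and the use of the standard Kingman coalescent rates are assumptions made for illustration. One function builds a genealogy backward in time by merging random pairs of lineages, the other grows a tree shape forward by splitting a random leaf.

```python
import random

def kingman_coalescent(n, rng):
    """Simulate a neutral coalescent genealogy for a sample of size n.

    Returns (tree, height): tree is a nested tuple of leaf labels and
    height is the time to the most recent common ancestor, measured
    in coalescent time units.
    """
    lineages = list(range(n))
    time = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, the waiting time to the next merger is
        # exponential with rate k*(k-1)/2.
        time += rng.expovariate(k * (k - 1) / 2)
        # Under neutrality every pair is equally likely to coalesce.
        i, j = sorted(rng.sample(range(k), 2))
        right = lineages.pop(j)  # pop the larger index first
        left = lineages.pop(i)
        lineages.append((left, right))
    return lineages[0], time

def yule_tree(n, rng):
    """Grow a binary tree shape forward: split a uniformly chosen leaf
    until n leaves exist. A leaf is an empty list, an internal node is
    a two-element list [left, right]."""
    root = []
    leaves = [root]
    while len(leaves) < n:
        leaf = leaves.pop(rng.randrange(len(leaves)))
        left, right = [], []
        leaf.extend([left, right])  # the chosen leaf becomes internal
        leaves.extend([left, right])
    return root

def count_leaves(node):
    """Count leaves in either encoding (nested tuples or nested lists)."""
    if isinstance(node, (tuple, list)) and len(node) == 2:
        return count_leaves(node[0]) + count_leaves(node[1])
    return 1

rng = random.Random(0)
tree, height = kingman_coalescent(6, rng)
shape = yule_tree(6, rng)
```

Both functions return a binary tree with the requested number of leaves; comparing the distribution of unlabeled shapes over many runs would be one way to probe the equivalence the abstract refers to.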

Kölner UniversitätsPublikationsServer

Complex modular architecture around a simple toolkit of wing pattern genes

Authors: Arias CF, Counterman BA, Hanly JJ, Hines HM, Jiggins CD, Lewis JJ, Linares M, Mallet J, Martin SH, McMillan WO, Moreira GRP, Papa R, Papanicolaou A, Rastas P, Ruiz M, Salazar C, Supple MA, Van Belleghem SM
Publication venue: Nature Ecology & Evolution
Publication date: 01/01/2017

Identifying the genomic changes that control morphological variation and understanding how they generate diversity is a major goal of evolutionary biology. In Heliconius butterflies, a small number of genes control the development of diverse wing colour patterns. Here, we used full-genome sequencing of individuals across the Heliconius erato radiation and closely related species to characterize genomic variation associated with wing pattern diversity. We show that variation around colour pattern genes is highly modular, with narrow genomic intervals associated with specific differences in colour and pattern. This modular architecture explains the diversity of colour patterns and provides a flexible mechanism for rapid morphological diversification.

Acknowledgements: We acknowledge the University of Puerto Rico, the Puerto Rico INBRE grant P20 GM103475 from the National Institute for General Medical Sciences (NIGMS), a component of the National Institutes of Health (NIH); CNRS Nouraugues and CEBA awards (B.A.C.); National Science Foundation awards DEB-1257839 (B.A.C.), DEB-1257689 (W.O.M.), DEB-1027019 (W.O.M.); awards 1010094 and 1002410 from the Experimental Program to Stimulate Competitive Research (EPSCoR) program of the National Science Foundation (NSF) for computational resources; and the Smithsonian Institution. This research was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU is also supported in part by Lilly Endowment, Inc.

Crossref

UCL Discovery

edocUR

Western Sydney ResearchDirect

Apollo (Cambridge)

Postprocessing phylogenies

Author: Kupczok Anne
Publication date: 01/01/2010

More and more phylogenetic trees are generated, and it frequently occurs that the inferred relationships contradict each other. In this case, tools are necessary which evaluate the amount of difference between two trees, extract the congruencies of two trees, and combine multiple trees by minimizing the incongruencies. These tools are summarized by the term "phylogenetic postprocessing". In this thesis, two aspects of phylogenetic postprocessing are investigated in detail.

First, tree distance computations evaluate the amount of difference between two trees. Most measures only take the topological information into account. There are a few measures that additionally focus on the branch lengths of the trees. One of these is the length of the shortest path in the space of weighted trees, also known as the geodesic distance. Here, an exact, but exponential-time, algorithm to compute the geodesic distance is presented. Comparisons with its approximations show that there is a particular path that approximates the geodesic distance well and that can be computed in linear time. Phylogenetic trees can also be tested for being statistically similar or different. Then a topological distance measure can be used as a test statistic where the associated p-value is computed under a null distribution of trees. Discrete tests must ensure that the size of the test is conservative, i.e. the size must not exceed the significance level. We present one example where a test has to be modified to ensure this property.

Second, gene trees on overlapping taxon sets can be combined into a so-called supertree. Another possibility is to combine the gene alignments directly, namely, to concatenate the gene alignments into a superalignment and to reconstruct a phylogeny from this long alignment. It is also possible to combine the data at a level between superalignment and supertree methods. Simulations of gene alignments along model gene trees allow for the comparison of methods from all three levels. We investigate different settings, e.g. complete or overlapping taxon sets, equal or different substitution parameters, or different gene topologies. The results show a good performance of matrix representation methods compared to other supertree and medium-level methods. Furthermore, superalignment is well applicable in the case of differing parameters between genes but is problematic when a high level of incongruence is present among the true gene trees.

In addition to the practical evaluation of supertree methods, theoretical and algorithmic aspects are of interest. Therefore we study different null models underlying supertree reconstruction. We find only the distribution of equally likely splits to behave in an appropriate way if little information is present. In contrast, the distribution of equally likely trees inserts a tree shape bias in split-based supertree methods. This bias can be traced back to the unequal split distribution in the null model. Finally, a supertree can also be defined by minimizing the total distance to the trees in the set, i.e. as a median tree. The majority-rule consensus is described as a median tree method for trees on the same taxon set. For trees on overlapping taxon sets, however, different specifications can be used, namely MR(-)supertrees and MR(+)supertrees. We present algorithms to compute the respective distances in the matrix representation framework. Applying their implementation to simulated data sets shows a clearly better performance of MR(-) compared to MR(+). This discrepancy is likely to trace back to a tree shape bias in MR(+).

To conclude, we see that the two aspects of phylogenetic postprocessing, tree distances and tree combination methods, are not independent. Instead, they are linked by the definition of the median tree. Thus our understanding of tree distances influences data combination methods and vice versa.
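The link between tree distances and consensus methods described in this abstract can be made concrete with a small sketch for trees on the same taxon set. It is an illustration under stated assumptions, not the thesis's MR(-) or MR(+) algorithms: each rooted tree is encoded as its set of non-trivial clusters, taxa are single-character labels, the distance is the size of the symmetric difference of cluster sets (a Robinson-Foulds-style topological distance), and all function names are hypothetical.

```python
from collections import Counter

def clusters(*groups):
    """Encode a rooted tree as a frozenset of its non-trivial clusters.
    Each group is a string of single-character taxon labels."""
    return frozenset(frozenset(g) for g in groups)

def rf_distance(t1, t2):
    """Number of clusters found in exactly one of the two trees."""
    return len(t1 ^ t2)

def majority_rule(trees):
    """Keep every cluster that occurs in more than half of the trees."""
    counts = Counter(c for t in trees for c in t)
    return frozenset(c for c, k in counts.items() if 2 * k > len(trees))

def total_distance(candidate, trees):
    """Sum of distances from a candidate tree to all input trees."""
    return sum(rf_distance(candidate, t) for t in trees)

# Three rooted trees on the taxa A, B, C, D:
t1 = clusters("AB", "CD")    # ((A,B),(C,D))
t2 = clusters("AB", "ABC")   # (((A,B),C),D)
t3 = clusters("AC", "ABC")   # (((A,C),B),D)
trees = [t1, t2, t3]

consensus = majority_rule(trees)
```

In this example the clusters {A,B} and {A,B,C} each occur in two of the three trees, so the consensus coincides with the second tree, and its total distance to the inputs (4) is no larger than that of any input tree (4, 6, and 6), which is the median-tree property in miniature.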

OTHES

Investigation of volume rendering performance through active learning and visual analysis

Author: Roth Stephan
Publication date: 01/01/2017

Volume visualization has many real-world applications, such as medical imaging and scientific research. Volumes can be rendered directly, by shooting rays from the camera through the volume data, or indirectly, by extracting features such as iso-surfaces. Knowing the runtime performance of visualization techniques enables optimized infrastructure planning, and trained models could also be reused for interactive quality adaptation. Prediction models can make use of information about the renderer and datasets to determine execution times before rendering. In this thesis, we present a model based on neural networks that predicts rendering times from volume properties and the rendering configuration. Moreover, our model actively intervenes in the sampling process to improve learning while decreasing the number of necessary measurements. For this, it estimates how likely a drawn sample is to improve future predictions. Our model consists of multiple submodels and uses their disagreement about certain samples as a criterion for possible improvement. We evaluate our model using different sampling strategies, loss functions, and volume rendering techniques. This includes predictions based on measurement data of a volume raycaster, as well as a continuous setup with interleaved execution and prediction of an indirect volume renderer. Our indirect renderer uses marching cubes to extract iso-surfaces as a triangle mesh from a density field and organizes them in an octree. This enables the highly parallel sorting on the graphics card that is necessary for rendering transparent surfaces in the correct order.
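The idea of using submodel disagreement to choose the next measurement can be sketched as follows. This is a simplified illustration, not the thesis's model: the neural networks are swapped for one-dimensional least-squares lines fitted on bootstrap resamples, disagreement is measured as the variance of the committee's predictions, and all names (`fit_line`, `train_committee`, `select_next`) as well as the single hypothetical feature `x` (for example, a sampling-rate setting) are assumptions.

```python
import random
import statistics

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a constant predictor.
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx

def train_committee(points, n_models, rng):
    """Train each submodel on its own bootstrap resample of the data."""
    return [
        fit_line([rng.choice(points) for _ in points])
        for _ in range(n_models)
    ]

def disagreement(committee, x):
    """Variance of the submodels' predictions at configuration x."""
    return statistics.pvariance([a * x + b for a, b in committee])

def select_next(committee, candidates):
    """Pick the candidate on which the submodels disagree most."""
    return max(candidates, key=lambda x: disagreement(committee, x))

# Hypothetical measurements: (configuration value, render time in ms).
measured = [(0.1, 2.3), (0.2, 4.1), (0.4, 8.4), (0.5, 9.8)]
committee = train_committee(measured, n_models=5, rng=random.Random(0))
next_x = select_next(committee, [0.3, 1.0, 4.0])
```

After measuring the selected configuration, the new pair would be appended to `measured` and the committee retrained, so that measurements concentrate where the current models are least certain.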

LIPIcs, Volume 248, ISAAC 2022, Complete Volume

Authors: Bae Sang Won, Park Heejin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022

LIPIcs, Volume 248, ISAAC 2022, Complete Volume

Dagstuhl Research Online Publication Server

An Algorithmic Walk from Static to Dynamic Graph Clustering

Author: Görke Robert
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2010

CiteSeerX

KITopen

LIPIcs, Volume 244, ESA 2022, Complete Volume

Authors: Chechik Shiri, Herman Grzegorz, Navarro Gonzalo, Rotenberg Eva
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022

LIPIcs, Volume 244, ESA 2022, Complete Volume

Dagstuhl Research Online Publication Server

15th Scandinavian Symposium and Workshops on Algorithm Theory: SWAT 2016, June 22-24, 2016, Reykjavik, Iceland

Publication venue: Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/06/2016

Digitale Bibliothek Thüringen

LIPIcs, Volume 274, ESA 2023, Complete Volume

Authors: Farach-Colton Martin, Herman Grzegorz, Puglisi Simon J.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

LIPIcs, Volume 274, ESA 2023, Complete Volume

Dagstuhl Research Online Publication Server