Inferring phylogenetic trees under the general Markov model via a minimum spanning tree backbone
Phylogenetic trees are models of the evolutionary relationships among species, with species typically placed at the leaves of trees. We address the following problems regarding the calculation of phylogenetic trees. (1) Leaf-labeled phylogenetic trees may not be appropriate models of evolutionary relationships among rapidly evolving pathogens, which may contain ancestor-descendant pairs. (2) Widely used models of gene evolution unrealistically assume that the base composition of DNA sequences does not evolve. Regarding problem (1), we present a method for inferring generally labeled phylogenetic trees that allow sampled species to be placed at non-leaf nodes of the tree. Regarding problem (2), we present a structural expectation maximization method (SEM-GM) for inferring leaf-labeled phylogenetic trees under the general Markov model (GM), which is the most complex model of DNA substitution that allows the evolution of base composition. To improve the scalability of SEM-GM, we present a minimum spanning tree (MST) framework called MST-backbone. MST-backbone scales linearly with the number of leaves. However, the unrealistic location of the root as inferred on empirical data suggests that the GM model may be overtrained. MST-backbone was inspired by the topological relationship between MSTs and phylogenetic trees that was introduced by Choi et al. (2011). We discovered that this topological relationship does not necessarily hold if the MST is not unique. We propose so-called vertex-order based MSTs (VMSTs) that guarantee a topological relationship with phylogenetic trees.
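The abstract notes that the MST/phylogeny relationship can break down when several MSTs exist, i.e., when edge weights tie. One general way to force a unique MST is to break weight ties with a fixed vertex order. The sketch below does this with Kruskal's algorithm, assuming the vertex order is simply the integer labels 0..n-1 and that ties are broken by the key (weight, smaller endpoint, larger endpoint). The function name and this particular key are illustrative assumptions; the exact VMST construction in the thesis may differ.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); vertices are labeled 0..n-1


def vertex_order_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with weight ties broken by a (hypothetical) vertex order.

    Sorting by (weight, smaller endpoint, larger endpoint) puts distinct edges in a
    strict total order (assuming no parallel edges), so the returned MST does not
    depend on the order in which the edges were supplied.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def order_key(e: Edge) -> Tuple[float, int, int]:
        u, v, w = e
        return (w, min(u, v), max(u, v))

    mst: List[Edge] = []
    for u, v, w in sorted(edges, key=order_key):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two different components: keep it
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```

With three equal-weight edges on a triangle, any of three spanning trees would be an MST; the vertex-order key always selects edges {0, 1} and {0, 2}, regardless of input order.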
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 244, ESA 2022, Complete Volume
Stochastic Network Design: Models and Scalable Algorithms
Many natural and social phenomena occur in networks. Examples include the spread of information, ideas, and opinions through a social network, the propagation of an infectious disease among people, and the spread of species within an interconnected habitat network. The ability to steer a phenomenon toward desired outcomes has widely recognized benefits for our society and the economy. The outcome of a phenomenon is largely determined by the topology or properties of its underlying network. A decision maker can take management actions to modify a network and thereby change the outcome of the phenomenon. A management action is an activity that changes the topology or other properties of a network. For example, species that live in a small area may expand their population and gradually spread into an interconnected habitat network. However, human development of structures such as highways and factories may destroy natural habitats or block paths connecting different habitat patches, which can cause population decline. To facilitate the dispersal of species and help the population recover, artificial corridors (e.g., a wildlife highway crossing) can be built to restore the connectivity of isolated habitats, and conservation areas can be established to restore historical habitats of species; both are examples of management actions. The set of management actions that can be taken is restricted by a budget, so we must find cost-effective allocations of limited funding resources.
In the thesis, the problem of finding the (nearly) optimal set of management actions is formulated as a discrete and stochastic optimization problem. Specifically, a general decision-making framework called stochastic network design is defined to model a broad range of similar real-world problems. The framework is defined upon a stochastic network, in which edges are either present or absent with certain probabilities. It defines several metrics to measure the outcome of the underlying phenomenon and a set of management actions that modify the network or its parameters in specific ways. The goal is to select a subset of management actions, subject to a budget constraint, to maximize a specified metric.
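To make the framework concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the thesis's: edges are directed and present independently with given probabilities, each management action has a cost and can raise the presence probability of some edges, and the metric is the expected number of nodes reachable from a source node, estimated from a fixed set of random samples. The names `reachable_count`, `best_actions`, and the action format are hypothetical. Exhaustive search over action subsets only works for a handful of actions, which illustrates why scalable algorithms are needed.

```python
import itertools
import random
from collections import deque
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]  # directed edge (u, v)
Action = Tuple[float, Dict[Edge, float]]  # (cost, edge -> probability after the action)


def reachable_count(n: int, source: int, present: Sequence[Edge]) -> int:
    """Number of nodes reachable from `source` using only the edges in `present` (BFS)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in present:
        adj[u].append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)


def best_actions(n: int, source: int, base_prob: Dict[Edge, float],
                 actions: List[Action], budget: float,
                 num_samples: int = 200, seed: int = 0) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search action subsets whose total cost fits the budget and
    return the one maximizing the sample-average number of nodes reachable
    from `source`. Every edge an action touches must be a key of `base_prob`.
    """
    rng = random.Random(seed)
    edges = list(base_prob)
    # One uniform draw per edge per scenario, shared by all candidate subsets,
    # so the sampled objective is a deterministic function of the subset.
    draws = [{e: rng.random() for e in edges} for _ in range(num_samples)]

    best_subset: Tuple[int, ...] = ()
    best_value = -1.0
    for size in range(len(actions) + 1):
        for subset in itertools.combinations(range(len(actions)), size):
            if sum(actions[i][0] for i in subset) > budget:
                continue  # over budget
            prob = dict(base_prob)
            for i in subset:
                for e, p in actions[i][1].items():
                    prob[e] = max(prob[e], p)  # in this model an action only improves an edge
            value = sum(
                reachable_count(n, source, [e for e in edges if d[e] < prob[e]])
                for d in draws
            ) / num_samples
            if value > best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value
```

For example, on the path 0 → 1 → 2 where edge (0, 1) is initially absent, an affordable action that makes it certain is selected and raises the expected reach from 1 to 3 nodes.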
The major contribution of the thesis is the development of scalable algorithms that find high-quality solutions for different problems within the framework. In general, these problems are NP-hard, and their objective functions are neither submodular nor supermodular. Existing algorithms, such as greedy algorithms and heuristic search algorithms, either lack theoretical guarantees or have limited scalability. In the thesis, fast approximation algorithms are developed under three progressively more general settings. The most restricted setting is when a network is tree-structured. For this case, fully polynomial-time approximation schemes (FPTAS) are developed using dynamic programming and rounding techniques. A more general setting is when networks are general directed graphs. We use a sampling technique to convert the original stochastic optimization problem into a deterministic optimization problem and develop a primal-dual algorithm to solve it efficiently. In the previous two settings, the goal is to maximize the connectivity of networks. In the most general setting, the goal is to maximize the number of nodes being connected and to minimize the distance between these connected nodes. For example, we want the species not only to reach a large number of habitat areas but also to get there within a reasonable amount of time. The scalable algorithms for this setting combine a fast primal-dual algorithm with a sampling procedure.
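The tree-structured case is easier because every node has a unique path from the root. Assuming the metric is the expected number of nodes connected to the root and that edges are present independently (our assumption for illustration), linearity of expectation turns the metric into a sum, over all nodes, of the product of edge probabilities along each root path. The sketch below only evaluates this objective exactly; it is not the FPTAS, which additionally uses dynamic programming over budgets and rounding. The name `expected_reachable` and the tree encoding are hypothetical.

```python
from typing import Dict, List, Tuple

# children[u] lists (child, probability that edge u -> child is present)
Tree = Dict[int, List[Tuple[int, float]]]


def expected_reachable(children: Tree, root: int) -> float:
    """Expected number of nodes connected to `root` when tree edges are present
    independently. By linearity of expectation this is the sum over nodes v of
    the product of edge probabilities on the unique root-to-v path."""
    total = 0.0
    stack = [(root, 1.0)]  # (node, probability that its root path is intact)
    while stack:
        node, path_prob = stack.pop()
        total += path_prob
        for child, p in children.get(node, []):
            stack.append((child, path_prob * p))
    return total
```

For a root 0 with children 1 (probability 0.5) and 2 (probability 1.0), and node 3 below node 1 (probability 0.5), the value is 1 + 0.5 + 1.0 + 0.25 = 2.75; an action that makes edge (0, 1) certain raises it to 3.5.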
Three real-world problems from the areas of computational sustainability and emergency response are used to evaluate these algorithms: the barrier removal problem, which determines which instream barriers to remove to help fish access their historical habitats in a river network; the spatial conservation planning problem, which determines which habitat units to set aside as conservation areas to encourage the dispersal of endangered species in a landscape; and the pre-disaster preparation problem, which aims to minimize the disruption of emergency medical services by natural disasters. In all three problems, the developed algorithms are much more scalable than existing state-of-the-art methods and produce high-quality solutions.
Algorithms for Unit-Disk Graphs and Related Problems
In this dissertation, we study algorithms for several problems on unit-disk graphs and for related problems. A unit-disk graph can be viewed as the intersection graph of a set of congruent disks. Unit-disk graphs have been studied extensively because of their many applications, e.g., modeling the topology of wireless sensor networks. Many problems on unit-disk graphs have been considered in the literature, such as shortest paths, clique, independent set, distance oracles, and diameter. Specifically, we study the following problems in this dissertation: L1 shortest paths in unit-disk graphs, reverse shortest paths in unit-disk graphs, minimum bottleneck moving spanning tree, unit-disk range reporting, and distance selection. We develop efficient algorithms for these problems; our results are either the first known solutions or improvements on previous work.
Given a set P of n points in the plane and a parameter r > 0, the unit-disk graph G(P) has P as its vertex set, and two points of P are connected by an edge if the distance between them is at most r. The weight of an edge is one in the unweighted case and equals the distance between its two endpoints in the weighted case. Note that the distance between two points can be measured in different metrics, e.g., the L1 or L2 metric.
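Following this definition directly, G(P) can be built by checking every pair of points. The sketch below does exactly that for the L1 or L2 metric, in the weighted or unweighted case; `unit_disk_graph` and its `metric` and `weighted` parameters are hypothetical names. It takes quadratic time, so it is only a reference baseline, not an efficient construction.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def unit_disk_graph(points: Sequence[Point], r: float, metric: str = "L2",
                    weighted: bool = False) -> Dict[int, List[Tuple[int, float]]]:
    """Adjacency lists of G(P): i and j are adjacent iff dist(p_i, p_j) <= r.
    Edge weight is 1 (unweighted) or the distance between the endpoints (weighted)."""
    if metric not in ("L1", "L2"):
        raise ValueError("metric must be 'L1' or 'L2'")

    def dist(a: Point, b: Point) -> float:
        if metric == "L1":
            return abs(a[0] - b[0]) + abs(a[1] - b[1])
        return math.hypot(a[0] - b[0], a[1] - b[1])

    adj: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(points))}
    for i in range(len(points)):          # check every unordered pair once: O(n^2)
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d <= r:
                w = d if weighted else 1.0
                adj[i].append((j, w))
                adj[j].append((i, w))
    return adj
```

The choice of metric changes the graph: for the points (0, 0), (1, 1), (3, 0) and r = 2.3, the pair (1, 1)–(3, 0) is adjacent under L2 (distance about 2.24) but not under L1 (distance 3).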
In the first problem, L1 shortest paths in unit-disk graphs, we are given a point set P and a source point s ∈ P, and the goal is to compute shortest paths from s to all other vertices in the L1-weighted unit-disk graph defined on P. We present an O(n log n)-time algorithm, which matches the Ω(n log n) lower bound.

In the second problem, reverse shortest paths in unit-disk graphs, we are given a set P of n points, a parameter λ > 0, and two points s and t of P; the goal is to compute the smallest r such that the shortest path length between s and t in the unit-disk graph defined by P and r is at most λ. This problem can be defined in both the unweighted and the weighted case. For the unweighted case, we propose an algorithm of O(⌊λ⌋ · n log n) time and another algorithm of O(n^{5/4} log^{7/4} n) time. We also give an O(n^{5/4} log^{5/2} n)-time algorithm for the weighted case.

In the third problem, minimum bottleneck moving spanning tree, we are given a set P of n points moving in the plane, and the goal is to compute a spanning tree for these moving points that does not change its combinatorial structure during the movement, such that the bottleneck weight of the spanning tree (i.e., the largest Euclidean length of its edges) over the whole movement is minimized. We present an algorithm that runs in O(n^{4/3} log^3 n) time.

The fourth problem is unit-disk range reporting: given a set P of n points in the plane and a value r, we construct a data structure so that, for any query disk of radius r, all points of P inside the disk can be reported efficiently. We build a data structure of O(n) space in O(n log n) time that answers each query in O(k + log n) time, where k is the output size. This matches the time complexity of the previous result, but our approach is much simpler.
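Simple brute-force baselines help pin down what the first two problems compute, and are handy for testing faster implementations. The sketch below runs Dijkstra's algorithm on the L1-weighted unit-disk graph by scanning all points at each step, and solves the unweighted reverse shortest path problem (assuming s ≠ t and the Euclidean metric) by binary search over the sorted interpoint distances with a BFS feasibility check. The binary search is valid because adding edges can only shorten the hop distance from s to t, so feasibility is monotone in r, and the graph only changes when r crosses an interpoint distance. All function names are hypothetical, and these quadratic-or-worse baselines are not the algorithms whose bounds are stated above.

```python
import heapq
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def l1_shortest_paths(points: Sequence[Point], r: float, s: int) -> List[float]:
    """Dijkstra from s in the L1-weighted unit-disk graph: u and v are adjacent
    iff their L1 distance is at most r, and that distance is the edge weight.
    Neighbors are found by scanning all points, so this is a slow baseline."""
    def l1(a: Point, b: Point) -> float:
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    dist = [math.inf] * len(points)
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(len(points)):
            w = l1(points[u], points[v])
            if v != u and w <= r and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def hop_distance(points: Sequence[Point], r: float, s: int, t: int) -> float:
    """BFS hop count from s to t in the unweighted Euclidean unit-disk graph."""
    hops = [-1] * len(points)
    hops[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return hops[u]
        for v in range(len(points)):
            if hops[v] < 0 and math.dist(points[u], points[v]) <= r:
                hops[v] = hops[u] + 1
                queue.append(v)
    return math.inf  # t is unreachable for this r


def reverse_shortest_path(points: Sequence[Point], s: int, t: int,
                          lam: float) -> Optional[float]:
    """Smallest r such that hop_distance(points, r, s, t) <= lam, or None.
    Feasibility is monotone in r and the graph only changes at interpoint
    distances, so binary search over the sorted distinct distances suffices."""
    candidates = sorted({math.dist(p, q)
                         for i, p in enumerate(points) for q in points[i + 1:]})
    lo, hi, answer = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if hop_distance(points, candidates[mid], s, t) <= lam:
            answer, hi = candidates[mid], mid - 1  # feasible: try a smaller r
        else:
            lo = mid + 1
    return answer
```

For collinear points at x = 0, 1, 2, 4 with s = 0 and t = 3, allowing two hops gives r = 2 (path 0 → 2 → 4), while allowing only one hop forces r = 4.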
Finally, in the distance selection problem, we are given a set P of n points in the plane and an integer k with 1 ≤ k ≤ n(n−1)/2, and the goal is to find the k-th smallest interpoint distance among all pairs of points of P. We propose an algorithm that runs in O(n^{4/3} log n) time. Our techniques yield two algorithmic frameworks for solving geometric optimization problems.
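A brute-force baseline makes the problem statement concrete and can serve as a test oracle for faster methods: enumerate all n(n−1)/2 Euclidean distances and keep the k smallest with `heapq.nsmallest`, for O(n^2 log k) time. The function name `kth_smallest_distance` is hypothetical, and this is not the O(n^{4/3} log n) algorithm described above.

```python
import heapq
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def kth_smallest_distance(points: Sequence[Point], k: int) -> float:
    """k-th smallest Euclidean distance over all pairs of points (k is 1-indexed).
    Enumerates every pair and keeps the k smallest: O(n^2 log k) time."""
    num_pairs = len(points) * (len(points) - 1) // 2
    if not 1 <= k <= num_pairs:
        raise ValueError("k must satisfy 1 <= k <= n(n-1)/2")
    distances = (math.dist(p, q) for p, q in combinations(points, 2))
    return heapq.nsmallest(k, distances)[-1]
```

For the right triangle (0, 0), (3, 0), (0, 4), the three interpoint distances are 3, 4, and 5, so k = 1 gives 3 and k = 3 gives 5.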
Many algorithms and techniques developed in this dissertation are quite general and fundamental, and we believe they will find other applications in the future.
Social informatics
5th International Conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013, Proceedings
Programming Languages and Systems
This open access book constitutes the proceedings of the 29th European Symposium on Programming, ESOP 2020, which was planned to take place in Dublin, Ireland, in April 2020, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The actual ETAPS 2020 meeting was postponed due to the Corona pandemic. The papers deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.