36 research outputs found

    An evolution strategy approach for the balanced minimum evolution problem

    Motivation: Balanced Minimum Evolution (BME) is a powerful distance-based phylogenetic estimation model introduced by Desper and Gascuel and now implemented in popular tools for phylogenetic analysis. It has been proven to be computationally less demanding than more sophisticated estimation methods, e.g. maximum likelihood or Bayesian inference, while preserving statistical consistency and the ability to run with almost any kind of data for which a dissimilarity measure is available. BME can be stated as a nonlinear non-convex combinatorial optimization problem, usually referred to as the Balanced Minimum Evolution Problem (BMEP). Currently, the state of the art among approximate methods for the BMEP is FastME (version 2.0), software that implements several deterministic phylogenetic construction heuristics combined with a local search on specific neighbourhoods derived from classical topological tree rearrangements. These combinations, however, may not guarantee convergence to close-to-optimal solutions because they explore too little of the solution space, a phenomenon that is exacerbated when tackling molecular datasets with a large number of taxa. Results: To overcome such convergence issues, in this article we propose a novel metaheuristic, named PhyloES, which combines an exploration phase based on Evolution Strategies, a special type of evolutionary algorithm, with a refinement phase based on two local search algorithms. Extensive computational experiments show that PhyloES consistently outperforms FastME, especially on larger datasets, providing solutions that have a shorter tree length and also differ significantly in topology.

    On the optimality of the neighbor-joining algorithm

    The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps $\mathbb{R}_{+}^{\binom{n}{2}}$ to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for $n \leq 8$. A key requirement is the measurement of volumes of spherical polytopes in high dimension, which we obtain using a combination of Monte Carlo methods and polyhedral algorithms. We show that highly unrelated trees can be co-optimal in BME reconstruction, and that NJ regions are not convex. We obtain the $\ell_2$ radius for neighbor-joining for $n=5$ and we conjecture that the ability of the neighbor-joining algorithm to recover the BME tree depends on the diameter of the BME tree.
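
The balanced minimum evolution criterion that this abstract refers to is usually written as Pauplin's formula: the length of a binary tree topology is the sum over leaf pairs of $d_{ij} \cdot 2^{1-\tau_{ij}}$, where $\tau_{ij}$ is the number of edges on the path between leaves $i$ and $j$. The abstract does not state the formula, so the following is only a minimal sketch of it; the function name `bme_length`, the adjacency-dictionary tree encoding and the example quartet are all illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def bme_length(adjacency, leaves, dist):
    """Balanced tree length: sum over leaf pairs of d_ij * 2^(1 - tau_ij),
    where tau_ij is the number of edges on the path between leaves i and j."""

    def edge_counts(source):
        # Breadth-first search gives the edge count from `source` to every node.
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        return seen

    total = 0.0
    for i, j in combinations(leaves, 2):
        tau = edge_counts(i)[j]
        total += dist[frozenset((i, j))] * 2.0 ** (1 - tau)
    return total


# Hypothetical quartet ab|cd with internal nodes u and v.
quartet = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
# Additive distances: pendant edges of length 1, internal edge of length 2.
d = {frozenset(p): w for p, w in [
    (("a", "b"), 2), (("c", "d"), 2),
    (("a", "c"), 4), (("a", "d"), 4), (("b", "c"), 4), (("b", "d"), 4),
]}
```

Because the example distances are additive on the quartet ab|cd, `bme_length(quartet, ["a", "b", "c", "d"], d)` recovers the true total edge length of that tree, while the competing topology ac|bd scores higher.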

    UPGMA and the normalized equidistant minimum evolution problem

    UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a widely used clustering method. Here we show that UPGMA is a greedy heuristic for the normalized equidistant minimum evolution (NEME) problem, that is, finding a rooted tree that minimizes the minimum evolution score relative to the dissimilarity matrix among all rooted trees with the same leaf-set in which all leaves have the same distance to the root. We prove that the NEME problem is NP-hard. In addition, we present some heuristic and approximation algorithms for solving the NEME problem, including a polynomial time algorithm that yields a binary, rooted tree whose NEME score is within $O(\log^2 n)$ of the optimum.
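
For readers unfamiliar with the greedy step, the following is a minimal sketch of textbook UPGMA, not of the approximation algorithms proposed in the paper. The function name `upgma`, the nested-dictionary input and the nested-tuple output are illustrative assumptions.

```python
from itertools import combinations


def upgma(labels, dist):
    """Greedy average-linkage clustering.

    `dist[x][y]` is the dissimilarity between labels x and y.
    Returns (tree, height): the tree as nested tuples and the root height."""
    # Each cluster id maps to (subtree, number of leaves, height of its root).
    clusters = {i: (label, 1, 0.0) for i, label in enumerate(labels)}
    d = {frozenset((i, j)): dist[labels[i]][labels[j]]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Greedy step: merge the two clusters with the smallest average distance.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (ta, na, _), (tb, nb, _) = clusters.pop(a), clusters.pop(b)
        height = d[frozenset((a, b))] / 2.0
        for c in clusters:
            # Size-weighted average keeps d equal to the mean leaf-to-leaf distance.
            d[frozenset((next_id, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[next_id] = ((ta, tb), na + nb, height)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height
```

Every merge places the new root at half the average distance between the two clusters, which is why all leaves end up equidistant from the root, the property that the NEME problem requires of its trees.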

    Compact mixed integer linear programming models to the Minimum Weighted Tree Reconstruction problem

    The Minimum Weighted Tree Reconstruction (MWTR) problem consists of finding a minimum length weighted tree connecting a set of terminal nodes in such a way that the length of the path between each pair of terminal nodes is greater than or equal to a given distance between the considered pair of terminal nodes. This problem has applications in several areas, namely, the inference of phylogenetic trees, the modeling of traffic networks and the analysis of internet infrastructures. In this paper, we investigate the MWTR problem and we present two compact mixed-integer linear programming models to solve the problem. Computational results using two different sets of instances, one from the phylogenetic area and another from the telecommunications area, show that the best of the two models is able to solve instances of the problem having up to 15 terminal nodes
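
The constraint described in this abstract can be checked directly for a candidate tree. The sketch below is not one of the paper's MILP models; it only verifies feasibility and reports the objective value. The names `path_lengths` and `check_mwtr`, the `(u, v, w)` edge-list encoding and the small tolerance are illustrative assumptions.

```python
def path_lengths(edges, source):
    """Weighted distance from `source` to every node of a tree given as (u, v, w) edges."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    lengths = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for nxt, w in adjacency[node]:
            if nxt not in lengths:  # a tree has exactly one path to each node
                lengths[nxt] = lengths[node] + w
                stack.append(nxt)
    return lengths


def check_mwtr(edges, required):
    """Return (feasible, total_length) for a candidate weighted tree.

    `required` maps a pair of terminals (i, j) to the given distance that the
    tree path between them must meet or exceed."""
    cache = {}
    feasible = True
    for (i, j), d_ij in required.items():
        if i not in cache:
            cache[i] = path_lengths(edges, i)
        if cache[i][j] < d_ij - 1e-9:  # tolerance for floating-point sums
            feasible = False
    return feasible, sum(w for _, _, w in edges)
```

For example, a star with centre `s` and edges `("s", "a", 1.0)`, `("s", "b", 1.0)`, `("s", "c", 2.0)` satisfies the requirements `{("a", "b"): 2, ("a", "c"): 3, ("b", "c"): 3}` with total length 4.0.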

    Computing Phylogenetic Trees Using Topologically Related Minimum Spanning Trees

    No full text

    From trees to networks and back

    The evolutionary history of a set of species is commonly represented by a phylogenetic tree. Often, however, the data contain conflicting signals, which can be better represented by a more general structure, namely a phylogenetic network. Such networks allow the display of several alternative evolutionary scenarios simultaneously but this can come at the price of complex visual representations. Using so-called circular split networks reduces this complexity, because this type of network can always be visualized in the plane without any crossing edges. These circular split networks form the core of this thesis. We construct them, use them as a search space for minimum evolution trees and explore their properties. More specifically, we present a new method, called SuperQ, to construct a circular split network summarising a collection of phylogenetic trees that have overlapping leaf sets. Then, we explore the set of phylogenetic trees associated with a fixed circular split network, in particular using it as a search space for optimal trees. This set represents just a tiny fraction of the space of all phylogenetic trees, but we still find trees within it that compare quite favourably with those obtained by a leading heuristic, which uses tree edit operations for searching the whole tree space. In the last part, we advance our understanding of the set of phylogenetic trees associated with a circular split network. Specifically, we investigate the size of the so-called circular tree neighbourhood for the three tree edit operations, tree bisection and reconnection (tbr), subtree prune and regraft (spr) and nearest neighbour interchange (nni).

    Inferenza filogenetica con il metodo della minima evoluzione bilanciata (Phylogenetic inference with the balanced minimum evolution method)

    Molecular phylogenetics is a branch of biology concerned with hierarchically determining the evolutionary relationships among organisms (taxa), with applications in epidemiology and population dynamics. In particular, the Phylogenetic Inference problem concerns reconstructing, from percentages of dissimilar DNA, the evolutionary process that starts from a common ancestor of the species under analysis. Chapter 2 presents a model of this process; Chapter 3, several possible versions of the problem, including Minimum Evolution; Chapter 4, the variant of the Minimum Evolution criterion known as Balanced, with its biological, computational and statistical advantages; Chapter 5, an exact solution algorithm and an ILP formulation for the balanced variant.

    Descoberta da topologia de rede (Network topology discovery)

    Doctorate in Mathematics. Monitoring and evaluating the performance of a network is essential to detect and resolve network failures. To carry out this monitoring, it is essential to know the topology of the network, which is often unknown. Many of the techniques used to discover the topology require the cooperation of all network devices, which is almost impossible to obtain because of security issues and policies. It is therefore necessary to use techniques that collect, passively and without the cooperation of intermediate devices, the information needed to infer the network topology. This can be done using tomography techniques, which use end-to-end measurements such as packet delays. In this thesis, we use integer linear programming methods to solve the problem of inferring a network topology using only end-to-end measurements. We present two compact mixed integer linear programming (MILP) formulations to solve the problem. Computational results show that, as the number of end devices grows, the time needed by the two compact MILP formulations to solve the problem also grows rapidly. We therefore develop two heuristics based on the Feasibility Pump and Local Branching methods. Since the packet delay measurements have associated errors, we develop two robust approaches, one to control the maximum number of deviations and the other to reduce the risk of high cost. We also create a system that measures the packet delays between computers on a network and displays the topology of that network.