An evolution strategy approach for the balanced minimum evolution problem
Motivation: The Balanced Minimum Evolution (BME) is a powerful distance based phylogenetic estimation model introduced by Desper and Gascuel and nowadays implemented in popular tools for phylogenetic analyses. It was proven to be computationally less demanding than more sophisticated estimation methods, e.g. maximum likelihood or Bayesian inference while preserving the statistical consistency and the ability to run with almost any kind of data for which a dissimilarity measure is available. BME can be stated in terms of a nonlinear non-convex combinatorial optimization problem, usually referred to as the Balanced Minimum Evolution Problem (BMEP). Currently, the state-of-the-art among approximate methods for the BMEP is represented by FastME (version 2.0), a software which implements several deterministic phylogenetic construction heuristics combined with a local search on specific neighbourhoods derived by classical topological tree rearrangements. These combinations, however, may not guarantee convergence to close-to-optimal solutions to the problem due to the lack of solution space exploration, a phenomenon which is exacerbated when tackling molecular datasets characterized by a large number of taxa. Results: To overcome such convergence issues, in this article, we propose a novel metaheuristic, named PhyloES, which exploits the combination of an exploration phase based on Evolution Strategies, a special type of evolutionary algorithm, with a refinement phase based on two local search algorithms. Extensive computational experiments show that PhyloES consistently outperforms FastME, especially when tackling larger datasets, providing solutions characterized by a shorter tree length but also significantly different from the topological perspective
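As a rough illustration of the objective behind the BMEP, the sketch below evaluates the balanced tree length of a given topology with the standard formula, the sum over taxon pairs of d_ij * 2^(1 - tau_ij), where tau_ij is the number of edges on the path between taxa i and j. The adjacency-list representation and the names `topological_distances` and `bme_length` are assumptions made for this example; this is not the implementation used by FastME or PhyloES.

```python
from collections import deque
from itertools import combinations

def topological_distances(adj, leaves):
    """Number of edges on the path between every pair of leaves (one BFS per leaf)."""
    tau = {}
    for src in leaves:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst in leaves:
            tau[(src, dst)] = dist[dst]
    return tau

def bme_length(adj, leaves, d):
    """Balanced tree length: sum over leaf pairs of d[i][j] * 2**(1 - tau_ij)."""
    tau = topological_distances(adj, leaves)
    return sum(d[i][j] * 2.0 ** (1 - tau[(i, j)]) for i, j in combinations(leaves, 2))

# Hypothetical data: distances generated by the quartet ((0,1),(2,3)) with
# pendant edges of length 1 and an internal edge of length 2.
d = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
true_quartet = {0: ["u"], 1: ["u"], 2: ["v"], 3: ["v"],
                "u": [0, 1, "v"], "v": [2, 3, "u"]}
other_quartet = {0: ["u"], 2: ["u"], 1: ["v"], 3: ["v"],
                 "u": [0, 2, "v"], "v": [1, 3, "u"]}
```

On this toy input the generating topology scores lower than the alternative quartet, which is the behaviour a BME search relies on when it compares candidate topologies.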
On the optimality of the neighbor-joining algorithm
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy
algorithm for finding the balanced minimum evolution (BME) tree associated to a
dissimilarity map. From this point of view, NJ is ``optimal'' when the
algorithm outputs the tree which minimizes the balanced minimum evolution
criterion. We use the fact that the NJ tree topology and the BME tree topology
are determined by polyhedral subdivisions of the spaces of dissimilarity maps
to study the optimality of the neighbor-joining
algorithm. In particular, we investigate and compare the polyhedral
subdivisions for . A key requirement is the measurement of volumes of
spherical polytopes in high dimension, which we obtain using a combination of
Monte Carlo methods and polyhedral algorithms. We show that highly unrelated
trees can be co-optimal in BME reconstruction, and that NJ regions are not
convex. We obtain the radius for neighbor-joining for and we
conjecture that the ability of the neighbor-joining algorithm to recover the
BME tree depends on the diameter of the BME tree
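To make the greedy step concrete, here is a minimal sketch of how neighbor-joining selects the pair of taxa to merge, using the usual Q-criterion Q(i, j) = (n - 2) d_ij - sum_k d_ik - sum_k d_jk. The function name `nj_pick_pair` is hypothetical, and the sketch covers only one selection step, not the full algorithm or the polyhedral analysis described in the abstract.

```python
def nj_pick_pair(d):
    """Return the pair (i, j), i < j, that minimizes the NJ Q-criterion."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    best, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:  # strict comparison: the first minimum wins on ties
                best, best_q = (i, j), q
    return best
```

A full NJ run would merge the selected pair into a new node, recompute distances to it, and repeat until three nodes remain.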
UPGMA and the normalized equidistant minimum evolution problem
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a widely used clustering method. Here we show that UPGMA is a greedy heuristic for the normalized equidistant minimum evolution (NEME) problem, that is, finding a rooted tree that minimizes the minimum evolution score relative to the dissimilarity matrix among all rooted trees with the same leaf-set in which all leaves have the same distance to the root. We prove that the NEME problem is NP-hard. In addition, we present some heuristic and approximation algorithms for solving the NEME problem, including a polynomial time algorithm that yields a binary, rooted tree whose NEME score is within O(log^2 n) of the optimum
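For readers unfamiliar with the method, the following is a minimal sketch of textbook UPGMA: repeatedly merge the two closest clusters and update distances by a size-weighted average. The nested-tuple output and the name `upgma` are choices made for this example; it does not compute NEME scores and is not the approximation algorithm mentioned in the abstract.

```python
def upgma(d, labels):
    """Merge clusters by smallest average distance; return a nested-tuple tree."""
    n = len(labels)
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # greedy step: closest pair of clusters
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted mean equals the average over all leaf pairs.
            dist[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del dist[(a, b)]
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

Ties between equally close pairs are broken by insertion order here; other implementations may break them differently and return a different tree.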
Compact mixed integer linear programming models to the Minimum Weighted Tree Reconstruction problem
The Minimum Weighted Tree Reconstruction (MWTR) problem consists of finding a minimum length weighted tree connecting a set of terminal nodes in such a way that the length of the path between each pair of terminal nodes is greater than or equal to a given distance between the considered pair of terminal nodes. This problem has applications in several areas, namely, the inference of phylogenetic trees, the modeling of traffic networks and the analysis of internet infrastructures. In this paper, we investigate the MWTR problem and we present two compact mixed-integer linear programming models to solve the problem. Computational results using two different sets of instances, one from the phylogenetic area and another from the telecommunications area, show that the best of the two models is able to solve instances of the problem having up to 15 terminal nodes
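The MWTR definition can be illustrated with a small checker that, for a candidate weighted tree, returns its total length and whether every terminal pair satisfies the path-length constraint. The edge-dictionary representation, the tolerance, and the name `mwtr_check` are assumptions for this sketch; it only verifies a given solution and is unrelated to the MILP models of the paper.

```python
def mwtr_check(edges, terminals, d):
    """Return (total_length, feasible) for a weighted tree given as {(u, v): weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def path_lengths(src):
        # In a tree every node is reached by exactly one path from src.
        length, stack = {src: 0.0}, [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in length:
                    length[v] = length[u] + w
                    stack.append(v)
        return length

    feasible = True
    for i in terminals:
        length = path_lengths(i)
        for j in terminals:
            # Constraint: path length must be at least the given distance.
            if i != j and length[j] < d[i][j] - 1e-9:
                feasible = False
    return sum(edges.values()), feasible
```

An optimization model would search over both the topology and the edge weights; this function only evaluates one candidate.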
From trees to networks and back
The evolutionary history of a set of species is commonly represented by a phylogenetic tree. Often, however, the data contain conflicting signals, which can be better represented by a more general structure, namely a phylogenetic network. Such networks allow the display of
several alternative evolutionary scenarios simultaneously but this can come at the price of complex visual representations. Using so-called circular split networks reduces this complexity, because this type of network can always be visualized in the plane without any crossing
edges. These circular split networks form the core of this thesis. We construct them, use them as a search space for minimum evolution trees and explore their properties.
More specifically, we present a new method, called SuperQ, to construct a circular split network summarising a collection of phylogenetic trees that have overlapping leaf sets. Then, we explore the set of phylogenetic trees associated with a fixed circular split network, in particular using it as a search space for optimal trees. This set
represents just a tiny fraction of the space of all phylogenetic trees, but we still find trees within it that compare quite favourably with those obtained by a leading heuristic, which uses tree edit operations for searching the whole tree space. In the last part, we advance our
understanding of the set of phylogenetic trees associated with a circular split network. Specifically, we investigate the size of the so-called circular tree neighbourhood for the three tree edit operations, tree bisection and reconnection (tbr), subtree prune and regraft (spr) and nearest neighbour interchange (nni)
Phylogenetic inference with the balanced minimum evolution method
Molecular phylogenetics is a branch of biology concerned with
hierarchically determining the evolutionary relationships among organisms (taxa), with applications in epidemiology and population dynamics. In particular, the Phylogenetic Inference problem concerns reconstructing, from percentages of dissimilar DNA, the evolutionary process that led from a common ancestor to the species under analysis. Chapter 2 presents a model of this process; Chapter 3, several possible instances of the problem, including Minimum Evolution; Chapter 4, the variant of the Minimum Evolution criterion known as Balanced, with its biological, computational, and statistical advantages; Chapter 5, an exact solution algorithm and an ILP formulation for the balanced variant
Network topology discovery
Doctorate in Mathematics
Monitoring and evaluating the performance of a network is essential
to detect and resolve network failures. In order to achieve this monitoring
level, it is essential to know the topology of the network which
is often unknown. Many of the techniques used to discover the topology
require the cooperation of all network devices, which is almost
impossible due to security and policy issues. It is therefore necessary
to use techniques that collect, passively and without the cooperation
of intermediate devices, the necessary information to allow the inference
of the network topology. This can be done using tomography
techniques, which use end-to-end measurements, such as the packet
delays.
In this thesis, we used some integer linear programming theory and
methods to solve the problem of inferring a network topology using
only end-to-end measurements. We present two compact mixed integer
linear programming (MILP) formulations to solve the problem. Computational
results showed that as the number of end-devices grows, the
time needed by the two compact MILP formulations to solve the problem
also grows rapidly. Therefore, we developed two heuristics based on the
Feasibility Pump and Local Branching methods. Since the packet delay
measurements have associated errors, we developed two robust
approaches, one to control the maximum number of deviations and
the other to reduce the risk of high cost. We also created a system
that measures the packet delays between computers on a network and
displays the topology of that network