An ILP solution for the gene duplication problem
Background: The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore most analyses use heuristics that lack any performance guarantee.
Results: We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6,084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics.
Conclusions: Although the GD problem is NP-hard, our novel ILP solution can solve instances with data sets consisting of as many as 14 taxa and 1,000 genes in a few hours. These are the largest instances that have been solved to optimality to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.
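The GD criterion being optimized counts, for each gene tree, the duplications implied when that gene tree is reconciled against a candidate species tree. The standard way to count these is LCA mapping. Below is a minimal Python sketch of that count (not the authors' ILP formulation), assuming binary trees encoded as nested tuples of species names:

```python
# Trees as nested tuples of leaf labels, e.g. (("A", "B"), "C").
def leaves(t):
    """Set of species names appearing at a subtree's leaves."""
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def lca_map(species_tree, taxa):
    """Smallest species-tree clade containing every taxon in `taxa`."""
    node = species_tree
    while not isinstance(node, str):
        left, right = leaves(node[0]), leaves(node[1])
        if taxa <= left:
            node = node[0]
        elif taxa <= right:
            node = node[1]
        else:
            break
    return node

def duplications(gene_tree, species_tree):
    """Count gene-tree nodes whose LCA mapping equals a child's mapping."""
    if isinstance(gene_tree, str):
        return 0
    m = lca_map(species_tree, leaves(gene_tree))
    is_dup = any(lca_map(species_tree, leaves(c)) == m for c in gene_tree)
    return (int(is_dup)
            + duplications(gene_tree[0], species_tree)
            + duplications(gene_tree[1], species_tree))
```

A gene tree congruent with the species tree implies zero duplications, while a gene tree that places the same species (or conflicting clades) on both sides of a node implies one duplication at that node. The ILP of the paper searches over species trees to minimize the total of this count across all gene trees.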
Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets
Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding whether there exists a binary level-1 phylogenetic network displaying a given set T of binary binets or trinets over a taxon set X, and of constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem remains polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an O(3^{|X|} poly(|X|)) time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted level-1 phylogenetic networks.
Empirical Evaluation of Real World Tournaments
Computational Social Choice (ComSoc) is a rapidly developing field at the
intersection of computer science, economics, social choice, and political
science. The study of tournaments is fundamental to ComSoc and many results
have been published about tournament solution sets and reasoning in
tournaments. Theoretical results in ComSoc tend to be worst case and tell us
little about performance in practice. To this end we detail some experiments on
tournaments using real-world data from soccer and tennis. We make three main
contributions to the understanding of tournaments using real-world data from
the English Premier League, the German Bundesliga, and the ATP World Tour: (1) we
find that the NP-hard question of finding a seeding for which a given team can
win a tournament is easily solvable in real-world instances; (2) using detailed
and principled methodology from statistical physics, we show that our real-world
data obeys a log-normal distribution; and (3) leveraging our log-normal
distribution result and using robust statistical methods, we show that the
popular Condorcet Random (CR) tournament model does not generate realistic
tournament data.
Comment: 2 figures
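Contribution (1) concerns the tournament fixing question: given complete pairwise outcomes, is there a single-elimination seeding under which a chosen team wins? For small fields this can be checked by direct enumeration. The following Python sketch is a hypothetical illustration of that question, not the paper's method, and the 4-team `beats` relation is made up:

```python
from itertools import permutations

def knockout_winner(seeding, beats):
    """Play a single-elimination bracket; `beats` holds (winner, loser) pairs."""
    rnd = list(seeding)
    while len(rnd) > 1:
        rnd = [a if (a, b) in beats else b for a, b in zip(rnd[::2], rnd[1::2])]
    return rnd[0]

def winning_seeding(team, players, beats):
    """Return some seeding under which `team` wins, or None (exponential search)."""
    for order in permutations(players):
        if knockout_winner(order, beats) == team:
            return order
    return None

# Toy 4-team field: A beats B, B beats C, C beats A, and everyone beats D.
beats = {("A", "B"), ("B", "C"), ("C", "A"),
         ("A", "D"), ("B", "D"), ("C", "D")}
```

In this toy cycle any of A, B, C can be made champion by a suitable bracket, while D, who loses every match, cannot; the paper's observation is that on real sports data such instances are typically easy despite the worst-case NP-hardness.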
Efficient routing of snow removal vehicles
This research addresses the problem of finding a minimum-cost set of routes for vehicles in a road network subject to some constraints. Extensions, such as multiple service requirements and mixed networks, have been considered. Variations of this problem arise in many practical applications, such as snow removal, refuse collection, and mail delivery. An exact algorithm was developed using integer programming to solve small problems. Since the problem is NP-hard, a heuristic algorithm was developed as well, based on the Greedy Randomized Adaptive Search Procedure (GRASP), in which each replication applies a construction heuristic to find feasible, good-quality solutions, followed by a local search heuristic. A simulated annealing heuristic was developed to improve the solutions obtained from the construction heuristic. The best overall solution was selected from the results of several replications. The heuristic was tested on four sets of problem instances (115 instances in total) obtained from the literature. The simulated annealing heuristic achieved average improvements of up to 26.36% over the construction results on these instances. Compared to recent heuristics by other authors, the developed heuristic improved the best-known solution on 18 of the 115 instances and matched it on 89, and it performed especially well on larger problems. The average deviations from known lower bounds across the four data sets ranged between 0.21% and 2.61%.
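The simulated-annealing improvement step described above follows the classic loop of always accepting improving moves and accepting worse moves with probability exp(-delta/T) under a decaying temperature. The sketch below is a generic, stdlib-only Python illustration on a made-up one-dimensional routing toy, not the author's arc-routing implementation; the parameter values (t0, cooling, steps) are arbitrary:

```python
import math
import random

def simulated_annealing(cost, neighbor, start, t0=10.0, cooling=0.995,
                        steps=5000, seed=0):
    """Accept improving moves always; worse moves with prob exp(-delta/T)."""
    rng = random.Random(seed)
    cur, cur_cost = start, cost(start)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cand_cost = cost(cand)
        if cand_cost <= cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t *= cooling  # geometric cooling schedule
    return best, best_cost

# Toy objective: visit depots on a line; cost is total travel distance.
points = [0, 7, 3, 9, 1, 5]

def route_cost(order):
    return sum(abs(points[a] - points[b]) for a, b in zip(order, order[1:]))

def swap_two(order, rng):
    """Neighborhood move: swap two positions in the visiting order."""
    i, j = rng.sample(range(len(order)), 2)
    out = list(order)
    out[i], out[j] = out[j], out[i]
    return out

best, best_cost = simulated_annealing(route_cost, swap_two, list(range(len(points))))
```

In a GRASP-style scheme, this loop would be run once per replication, starting from each randomized construction, with the best result kept across replications.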
An Atypical Survey of Typical-Case Heuristic Algorithms
Heuristic approaches often do so well that they seem to pretty much always
give the right answer. How close can heuristic algorithms get to always giving
the right answer, without inducing seismic complexity-theoretic consequences?
This article first discusses how a series of results by Berman, Buhrman,
Hartmanis, Homer, Longpr\'{e}, Ogiwara, Sch\"{o}ning, and Watanabe, from the
early 1970s through the early 1990s, explicitly or implicitly limited how well
heuristic algorithms can do on NP-hard problems. In particular, many desirable
levels of heuristic success cannot be obtained unless severe, highly unlikely
complexity class collapses occur. Second, we survey work initiated by Goldreich
and Wigderson, who showed how under plausible assumptions deterministic
heuristics for randomized computation can achieve a very high frequency of
correctness. Finally, we consider formal ways in which theory can help explain
the effectiveness of heuristics that solve NP-hard problems in practice.
Comment: This article is currently scheduled to appear in the December 2012 issue of SIGACT News
Stable Invitations
We consider the situation in which an organizer is trying to convene an
event, and needs to choose a subset of agents to be invited. Agents have
preferences over how many attendees should be at the event and possibly also
who the attendees should be. This induces a stability requirement: All invited
agents should prefer attending to not attending, and all the other agents
should not regret being not invited. The organizer's objective is to find the
invitation of maximum size subject to the stability requirement. We investigate
the computational complexity of finding the maximum stable invitation when all
agents are truthful, as well as the mechanism design problem when agents may
strategically misreport their preferences.
Comment: To appear in COMSOC 201
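For intuition, the stability condition can be made concrete in the special case of anonymous preferences, where each agent cares only about how many people attend (the paper also allows preferences over who attends; that is omitted here). The following brute-force Python sketch is a hypothetical illustration, not the paper's algorithm:

```python
from itertools import combinations

def is_stable(invited, acceptable):
    """Invited agents want to attend at this size; outsiders would not join."""
    k = len(invited)
    if any(k not in acceptable[i] for i in invited):
        return False
    return all(k + 1 not in acceptable[i]
               for i in acceptable if i not in invited)

def max_stable_invitation(acceptable):
    """Largest stable guest list, by exhaustive search (exponential in n)."""
    agents = sorted(acceptable)
    for k in range(len(agents), -1, -1):
        for s in combinations(agents, k):
            if is_stable(set(s), acceptable):
                return set(s)
    return set()

# Each agent lists the attendance counts at which they would want to come.
prefs = {0: {2, 3}, 1: {2, 3}, 2: {3}, 3: {1}}
```

Here agents 0, 1, and 2 are all happy at size 3, and agent 3, who only wants a 1-person event, has no regret about a 4-person one, so {0, 1, 2} is the maximum stable invitation. The mechanism-design question in the abstract then asks what happens when agents can misreport such lists.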
Ising formulations of many NP problems
We provide Ising formulations for many NP-complete and NP-hard problems,
including all of Karp's 21 NP-complete problems. This collects and extends
mappings to the Ising model from partitioning, covering and satisfiability. In
each case, the required number of spins is at most cubic in the size of the
problem. This work may be useful in designing adiabatic quantum optimization
algorithms.
Comment: 27 pages; v2: substantial revision to intro/conclusion, many more references; v3: substantial revision and extension, to-be-published version
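As a concrete instance of such a mapping, number partitioning (one of Karp's 21 problems) becomes the Ising Hamiltonian H = (sum_i a_i * s_i)^2 over spins s_i in {-1, +1}, whose ground-state energy is zero exactly when the numbers split into two equal-sum halves. A small stdlib-only Python check, with brute force standing in for the adiabatic optimizer:

```python
from itertools import product

def ising_energy(spins, a):
    """H = (sum_i a_i * s_i)^2: zero exactly when the +/- groups balance."""
    return sum(ai * si for ai, si in zip(a, spins)) ** 2

def ground_state(a):
    """Exhaustive search over all 2^n spin assignments (fine for small n)."""
    return min(product((-1, 1), repeat=len(a)),
               key=lambda s: ising_energy(s, a))

a = [3, 1, 1, 2, 2, 1]  # total is 10, so a perfect 5 + 5 split exists
best = ground_state(a)
```

The spins assigned +1 form one side of the partition and the -1 spins the other; note the number of spins here is linear in the problem size, within the at-most-cubic bound the abstract states for the general mappings.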