11 research outputs found

    Towards explaining the speed of k-means

    The k-means method is a popular algorithm for clustering, known for its speed in practice. This stands in contrast to its exponential worst-case running time. To explain the speed of the k-means method, a smoothed analysis has been conducted. We sketch this smoothed analysis and a generalization to Bregman divergences.
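    For readers who want the algorithm under discussion in concrete form, the following is a minimal sketch of the k-means (Lloyd) iteration together with a smoothed-analysis style input (adversarial points plus small Gaussian noise); the data, the choice of k, and the noise level are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def kmeans(points, k, rng, max_iter=1000):
    """Minimal k-means (Lloyd's) method: alternate assignment and centroid steps."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for it in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # clustering no longer changes
            return labels, it + 1
        centers = new_centers
    return labels, max_iter

# Smoothed-analysis style input: adversarial points plus small Gaussian perturbation.
rng = np.random.default_rng(0)
adversarial = rng.random((200, 2))              # stands in for a worst-case instance
perturbed = adversarial + rng.normal(scale=0.01, size=adversarial.shape)
_, iterations = kmeans(perturbed, k=5, rng=rng)
print("Lloyd iterations:", iterations)
```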

    Lower Bounds for the Average and Smoothed Number of Pareto Optima

    Smoothed analysis of multiobjective 0-1 linear optimization has drawn considerable attention recently. The number of Pareto-optimal solutions (i.e., solutions with the property that no other solution is at least as good in all the coordinates and better in at least one) for multiobjective optimization problems is the central object of study. In this paper, we prove several lower bounds for the expected number of Pareto optima. Our basic result is a lower bound of $\Omega_d(n^{d-1})$ for optimization problems with d objectives and n variables under fairly general conditions on the distributions of the linear objectives. Our proof relates the problem of lower bounding the number of Pareto optima to results in geometry connected to arrangements of hyperplanes. We use our basic result to derive (1) the first lower bounds, to our knowledge, for natural multiobjective optimization problems. We illustrate this for the maximum spanning tree problem with randomly chosen edge weights. Our technique is sufficiently flexible to yield such lower bounds for other standard objective functions studied in this setting (such as multiobjective shortest path, TSP tour, matching). (2) A smoothed lower bound of $\min\{\Omega_d(n^{d-1.5}\phi^{(d-\log d)(1-\Theta(1/\phi))}),\ 2^{\Theta(n)}\}$ for a version of the 0-1 knapsack problem with d profits under $\phi$-semirandom distributions. This improves the recent lower bound of Brunsch and Roeglin.
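    To make the central object of study concrete, the following sketch computes the Pareto optima of a set of objective vectors by brute force, using exactly the dominance definition quoted above; the random instance (number of solutions, number of objectives, distributions) is an illustrative assumption, not the setting analyzed in the paper.

```python
import numpy as np

def pareto_optima(values):
    """Return indices of Pareto-optimal rows (larger is better in every objective).

    A row is Pareto optimal if no other row is at least as good in all
    coordinates and strictly better in at least one.
    """
    optima = []
    for i, v in enumerate(values):
        dominated = any(
            np.all(w >= v) and np.any(w > v)
            for j, w in enumerate(values) if j != i
        )
        if not dominated:
            optima.append(i)
    return optima

# Illustrative instance: random 0-1 solution vectors with d random linear objectives.
rng = np.random.default_rng(1)
n, d = 10, 3
solutions = rng.integers(0, 2, size=(500, n))   # a random sample of 0-1 solutions
objectives = solutions @ rng.random((n, d))     # value of each solution in each objective
print(len(pareto_optima(objectives)), "Pareto optima among", len(solutions), "solutions")
```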

    On smoothed analysis of quicksort and Hoare's find

    We provide a smoothed analysis of Hoare's find algorithm, and we revisit the smoothed analysis of quicksort. Hoare's find algorithm - often called quickselect or one-sided quicksort - is an easy-to-implement algorithm for finding the k-th smallest element of a sequence. While the worst-case number of comparisons that Hoare's find needs is Theta(n^2), the average-case number is Theta(n). We analyze what happens between these two extremes by providing a smoothed analysis. In the first perturbation model, an adversary specifies a sequence of n numbers from [0,1], and then, to each number of the sequence, we add a random number drawn independently from the interval [0,d]. We prove that Hoare's find needs Theta((n/(d+1)) sqrt(n/d) + n) comparisons in expectation if the adversary may also specify the target element (even after seeing the perturbed sequence), and slightly fewer comparisons for finding the median. In the second perturbation model, each element is marked with probability p, and then a random permutation is applied to the marked elements. We prove that the expected number of comparisons to find the median is Omega((1-p) (n/p) log n). Finally, we provide lower bounds for the smoothed number of comparisons of quicksort and Hoare's find for the median-of-three pivot rule, which usually yields faster algorithms than always selecting the first element: the pivot is the median of the first, middle, and last element of the sequence. We show that median-of-three does not yield a significant improvement over the classic rule.
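    The following sketch shows Hoare's find with the median-of-three pivot rule, run on an input drawn from the first perturbation model described above (adversarial numbers in [0,1] plus independent uniform noise from [0,d]); the concrete adversarial sequence, the parameter values, and the way comparisons are counted are illustrative assumptions.

```python
import random

comparisons = 0

def hoares_find(seq, k):
    """Return the k-th smallest element (1-indexed) of seq, median-of-three pivot rule."""
    global comparisons
    seq = list(seq)
    while True:
        if len(seq) == 1:
            return seq[0]
        # Median-of-three: pivot is the median of first, middle, and last element.
        pivot = sorted([seq[0], seq[len(seq) // 2], seq[-1]])[1]
        comparisons += len(seq)            # rough proxy: one comparison per element vs. pivot
        smaller = [x for x in seq if x < pivot]
        equal = [x for x in seq if x == pivot]
        if k <= len(smaller):
            seq = smaller
        elif k <= len(smaller) + len(equal):
            return pivot
        else:
            k -= len(smaller) + len(equal)
            seq = [x for x in seq if x > pivot]

# First perturbation model: adversarial numbers in [0, 1] plus uniform noise from [0, d].
n, d = 10_000, 0.5
adversarial = [1.0 - i / n for i in range(n)]   # stands in for a worst-case sequence
perturbed = [x + random.uniform(0, d) for x in adversarial]
median = hoares_find(perturbed, (n + 1) // 2)
print("median =", median, "comparisons =", comparisons)
```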

    Analysis of FPTASes for the Multi-Objective Shortest Path Problem

    We propose a new FPTAS for the multi-objective shortest path problem. The algorithm uses elements from both an exact labeling algorithm and an FPTAS proposed by Tsaggouris and Zaroliagis (2009). We analyze the running times of these three algorithms both from a theoretical and a computational point of view. Theoretically, we show that there are instances for which the new FPTAS runs arbitrarily many times faster than the other two algorithms. Furthermore, for the bi-objective case, the number of approximate solutions generated by the proposed FPTAS is at most the number of Pareto-optimal solutions multiplied by the number of nodes. By performing a set of computational tests, we show that the new FPTAS performs best in terms of running time.
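    As background for what "labeling algorithm" means here, the following is a minimal sketch of exact bi-objective label correcting with Pareto-dominance pruning; it is not the proposed FPTAS nor the Tsaggouris-Zaroliagis algorithm (an FPTAS would additionally round labels into a polynomial number of buckets), and the example graph is made up.

```python
from collections import defaultdict, deque

def biobjective_labels(graph, source):
    """Exact label-correcting sketch: keep all Pareto-optimal (cost1, cost2) labels per node.

    graph: dict mapping node -> list of (neighbor, cost1, cost2) edges.
    Returns a dict node -> set of nondominated cost pairs reachable from source.
    """
    labels = defaultdict(set)
    labels[source] = {(0, 0)}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, c1, c2 in graph.get(u, []):
            for a, b in list(labels[u]):
                cand = (a + c1, b + c2)
                # Skip cand if some existing label at v already dominates it.
                if any(x <= cand[0] and y <= cand[1] for (x, y) in labels[v]):
                    continue
                # Otherwise remove labels dominated by cand and insert cand.
                labels[v] = {(x, y) for (x, y) in labels[v]
                             if not (cand[0] <= x and cand[1] <= y)} | {cand}
                queue.append(v)
    return labels

# Tiny illustrative graph with two incomparable s-t paths.
graph = {
    "s": [("a", 1, 4), ("b", 3, 1)],
    "a": [("t", 1, 4)],
    "b": [("t", 3, 1)],
}
print(biobjective_labels(graph, "s")["t"])   # both (2, 8) and (6, 2) are Pareto optimal
```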

    The smoothed number of Pareto optimal solutions in bicriteria integer optimization

    A well-established heuristic approach for solving various bicriteria optimization problems is to enumerate the set of Pareto optimal solutions, typically using some kind of dynamic programming approach. The heuristics following this principle are often successful in practice. Their running time, however, depends on the number of enumerated solutions, which can be exponential in the worst case. In this paper, we prove an almost tight bound on the expected number of Pareto optimal solutions for general bicriteria integer optimization problems in the framework of smoothed analysis. Our analysis is based on a semi-random input model in which an adversary can specify an input which is subsequently slightly perturbed at random, e.g., using a Gaussian or uniform distribution. Our results directly imply tight polynomial bounds on the expected running time of the Nemhauser/Ullmann heuristic for the 0/1 knapsack problem. Furthermore, we can significantly improve the known results on the running time of heuristics for the bounded knapsack problem and for the bicriteria shortest path problem. Finally, our results also enable us to improve and simplify the previously known analysis of the smoothed complexity of integer programming.
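    The Nemhauser/Ullmann heuristic mentioned above can be sketched in a few lines: it processes the items one by one and maintains the set of Pareto-optimal (weight, profit) pairs, so its running time is governed by the number of enumerated Pareto-optimal solutions that the smoothed bound controls. The instance below is made up and the dominance handling is the simplest possible; this is a sketch, not the authors' implementation.

```python
import random

def nemhauser_ullmann(items):
    """Enumerate the Pareto-optimal (weight, profit) pairs of a 0/1 knapsack instance.

    items: list of (weight, profit) pairs. A pair dominates another if it weighs
    at most as much and has at least as much profit (and differs in at least one).
    """
    pareto = [(0.0, 0.0)]                                  # the empty solution
    for w, p in items:
        shifted = [(wt + w, pr + p) for wt, pr in pareto]  # solutions that add the item
        merged = sorted(set(pareto + shifted), key=lambda t: (t[0], -t[1]))
        pareto, best_profit = [], float("-inf")
        for wt, pr in merged:              # sweep by weight, keep only profit improvements
            if pr > best_profit:
                pareto.append((wt, pr))
                best_profit = pr
    return pareto

# Made-up instance; in the smoothed setting the profits would be randomly perturbed.
random.seed(0)
items = [(random.randint(1, 100), random.random()) for _ in range(15)]
front = nemhauser_ullmann(items)
print(len(front), "Pareto-optimal solutions; first three:", front[:3])
```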

    Stochastic Tree Models for Macroevolution: Development, Validation and Application

    Phylogenetic trees capture the relationships between species and can be investigated using morphological and/or molecular data. When focusing on macroevolution, one considers the large-scale history of life, with evolutionary changes affecting a single species or the entire clade, leading to the enormous diversity of species observed today. One major problem of biology is the explanation of this biodiversity. Therefore, one may ask which kind of macroevolutionary processes have given rise to observable tree shapes or patterns of species distribution, which refers to the appearance of branching orders and time periods. Thus, with an increasing number of known species in the context of phylogenetic studies, testing hypotheses about evolution by analyzing the shape of the resulting phylogenetic trees became a matter of particular interest. Interest in using such reconstructed phylogenies for studying evolutionary processes has increased during the last decades. Many paleontologists (Raup et al., 1973; Gould et al., 1977; Gilinsky and Good, 1989; Nee, 2004) tried to describe such patterns of macroevolution by using models for growing trees. Those models describe stochastic processes to generate phylogenetic trees. Yule (1925) was the first to introduce such a model, the Equal Rate Markov (ERM) model, in the context of biological branching, based on a continuous-time, uneven branching process. In the last decades, further dynamical models were proposed (Yule, 1925; Aldous, 1996; Nee, 2006; Rosen, 1978; Ford, 2005; Hernández-García et al., 2010) to address the investigation of tree shapes and hence capture the rules of macroevolutionary forces. A common model is the Aldous' Branching (AB) model, which is known for generating trees with a structure similar to that of "real" trees. To infer those macroevolutionary forces, estimated trees are analyzed and compared to simulated trees generated by the models. Recent models have a few drawbacks, such as a missing biological motivation or a generated tree shape that does not fit well to the shapes observed in empirical trees. The central aim of this thesis is the development and study of new, biologically motivated approaches which might help to better understand or even discover biological forces which lead to the huge diversity of organisms. The first approach, called the age model, can be defined as a stochastic procedure which describes the growth of binary trees by an iterative stochastic attachment of leaves, similar to the ERM model. In contrast to the latter, the branching rate at each clade is no longer constant but decreases in time, i.e., with the age. Thus, species involved in recent speciation events have a tendency to speciate again. The second model is a branching process which mimics the evolution of species driven by innovations. The process involves a separation of time scales: rare innovation events trigger rapid cascades of diversification, in which a feature combines with previously existing features. This model is called the innovation model. Three data sets of estimated phylogenetic trees are used to analyze and compare the tree shapes produced by the new growth models. A tree shape analysis considering a variety of imbalance measurements is performed. Results show that simulated trees of both growth models fit well to the tree shape observed in real trees. In a further study, a likelihood analysis is performed in order to rank models with respect to their ability to explain observed tree shapes.
Results show that the likelihoods of the age model and the AB model are clearly correlated across the trees in the databases when considering small and medium-sized trees with up to 19 leaves. For one data set, representing phylogenetic trees of protein families, the age model outperforms the AB model; for another data set, representing phylogenetic trees of species, the AB model performs slightly better. To support this observation, a further analysis using larger trees is necessary, but an exact computation of likelihoods for large trees implies a huge computational effort. Therefore, an efficient method for likelihood estimation is proposed and compared to estimation using a naive sampling strategy. Nevertheless, both models describe the tree generation process in a way which is easy to interpret biologically. Another interesting field of research in biology is the coevolution between species, i.e., the interaction of species across groups such that the evolution of a species from one group can be triggered by a species from another group. The most prominent examples are systems of host species and their associated parasites. One problem is the reconciliation of the common history of both groups of species and the prediction of the associations between ancestral hosts and their parasites. To solve this problem, some algorithmic methods have been developed in recent years, but only a few host-parasite systems have been analyzed in sufficient detail, which makes an evaluation of these methods complex. Within the scope of coevolution, the proposed age model is applied to the generation of cophylogenies in order to evaluate such host-parasite reconciliation methods. The presented age model as well as the innovation model produce tree shapes similar to those of estimated trees. Both models describe an evolutionary dynamics and might provide a further opportunity to infer the macroevolutionary processes which lead to the biodiversity observed today. Furthermore, the application of the age model in the context of coevolution, by generating a useful benchmark set of cophylogenies, is a first step towards systematic studies on evaluating reconciliation methods.
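    The two growth processes can be sketched as leaf-attachment rules. In the sketch below, the ERM/Yule model picks an extant leaf uniformly at random to speciate, while the age-model-like variant weights each leaf by the reciprocal of its age; the specific 1/age weighting and the discrete-time setting are illustrative assumptions, not the exact rate function of the thesis.

```python
import random

def grow_tree(n_leaves, age_dependent=False, seed=0):
    """Grow a binary tree by iterative leaf attachment.

    ERM/Yule model: every extant leaf is equally likely to speciate next.
    Age-model-like variant: a leaf's speciation weight decays with its age
    (here 1 / age, an illustrative choice).
    Returns the tree as a dict child -> parent.
    """
    random.seed(seed)
    parent = {}
    leaves = {0: 0}                 # leaf id -> birth time; start with a single species
    next_id, time = 1, 0
    while len(leaves) < n_leaves:
        time += 1
        if age_dependent:
            weights = [1.0 / (time - birth + 1) for birth in leaves.values()]
            chosen = random.choices(list(leaves), weights=weights)[0]
        else:
            chosen = random.choice(list(leaves))
        # The chosen leaf speciates: it becomes an internal node with two new leaves.
        for _ in range(2):
            parent[next_id] = chosen
            leaves[next_id] = time
            next_id += 1
        del leaves[chosen]
    return parent

erm_tree = grow_tree(20, age_dependent=False)
age_tree = grow_tree(20, age_dependent=True)
print(len(erm_tree), "edges in each tree")   # 2 * (n_leaves - 1) edges for 20 leaves
```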

    Smoothed Analysis of Selected Optimization Problems and Algorithms

    Optimization problems arise in almost every field of economics, engineering, and science. Many of these problems are well understood in theory, and sophisticated algorithms exist to solve them efficiently in practice. Unfortunately, in many cases the theoretically most efficient algorithms perform poorly in practice. On the other hand, some algorithms are much faster than theory predicts. This discrepancy is a consequence of the pessimism inherent in the framework of worst-case analysis, the predominant analysis concept in theoretical computer science. We study selected optimization problems and algorithms in the framework of smoothed analysis in order to narrow the gap between theory and practice. In smoothed analysis, an adversary specifies the input, which is subsequently slightly perturbed at random. As one example, we consider the successive shortest path algorithm for the minimum-cost flow problem. While in the worst case the successive shortest path algorithm takes exponentially many steps to compute a minimum-cost flow, we show that its running time is polynomial in the smoothed setting. Another problem studied in this thesis is makespan minimization for scheduling with related machines. It seems unlikely that there exist fast algorithms to solve this problem exactly. This is why we consider three approximation algorithms: the jump algorithm, the lex-jump algorithm, and the list scheduling algorithm. In the worst case, the approximation guarantees of these algorithms depend on the number of machines. We show that there is no such dependence in smoothed analysis. We also apply smoothed analysis to multicriteria optimization problems. In particular, we consider integer optimization problems with several linear objectives that have to be minimized simultaneously. We derive a polynomial upper bound for the size of the set of Pareto-optimal solutions, contrasting with the exponential worst-case lower bound. As the icing on the cake, we find that the insights gained from our smoothed analysis of the running time of the successive shortest path algorithm lead to the design of a randomized algorithm for finding short paths between two given vertices of a polyhedron. We see this result as an indication that, in the future, smoothed analysis might also result in the development of fast algorithms.
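    The input model used throughout the thesis is easy to state in code: an adversary fixes the numbers, and each is independently perturbed at random. The sketch below applies such a perturbation to a scheduling instance and runs a simple list scheduling rule on related machines (each job is placed where it would finish earliest); the adversarial instance, the noise level, and this particular formulation of list scheduling are illustrative assumptions.

```python
import random

def perturb(values, sigma=0.05):
    """Smoothed-analysis input model: adversarial values plus independent Gaussian noise."""
    return [v + random.gauss(0.0, sigma) for v in values]

def list_scheduling(jobs, speeds):
    """List scheduling on related machines: each job goes where it would finish earliest."""
    finish = [0.0] * len(speeds)
    for p in jobs:
        m = min(range(len(speeds)), key=lambda i: finish[i] + p / speeds[i])
        finish[m] += p / speeds[m]
    return max(finish)                               # makespan of the schedule

random.seed(0)
adversarial_jobs = [1.0] * 50 + [10.0] * 5           # stands in for a worst-case instance
speeds = [1.0, 2.0, 4.0]
makespan = list_scheduling(perturb(adversarial_jobs), speeds)
print("makespan on perturbed instance:", round(makespan, 2))
```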