
    On the tradeoff between stability and fit

    In computing, as in many aspects of life, changes incur cost. Many optimization problems are formulated as a one-time instance starting from scratch. However, a common case that arises is when we already have a set of prior assignments and must decide how to respond to a new set of constraints, given that each change from the current assignment comes at a price. That is, we would like to maximize the fitness or efficiency of our system, but we need to balance it with the changeout cost from the previous state. We provide a precise formulation for this tradeoff and analyze the resulting stable extensions of some fundamental problems in measurement and analytics. Our main technical contribution is a stable extension of Probability Proportional to Size (PPS) weighted random sampling, with applications to monitoring and anomaly detection problems. We also provide a general framework that applies to top-k, minimum spanning tree, and assignment. In both cases, we are able to provide exact solutions and discuss efficient incremental algorithms that can find new solutions as the input changes.
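
    For context, the classic (non-stable) PPS building block that the abstract extends can be sketched in a few lines of Python. This is a minimal sketch under assumptions of our own: positive weights, Poisson-style inclusion with expected sample size k, and an illustrative function name; it is not the paper's stable extension.

```python
import random

def pps_sample(weights, k):
    """Probability-Proportional-to-Size sample of expected size k (a sketch).

    Item i is included independently with probability min(1, weights[i] / tau),
    where tau is chosen so that the inclusion probabilities sum to k.
    Assumes positive weights and k < len(weights); illustrative only, not the
    paper's stable scheme.
    """
    n = len(weights)
    if k >= n:
        return list(range(n))
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    # suffix[t] = total weight of items order[t:]
    suffix = [0.0] * (n + 1)
    for t in range(n - 1, -1, -1):
        suffix[t] = suffix[t + 1] + weights[order[t]]
    # Items heavy enough to deserve inclusion probability exactly 1.
    taken = 0
    while weights[order[taken]] * (k - taken) >= suffix[taken]:
        taken += 1
    tau = suffix[taken] / (k - taken)
    sample = order[:taken]
    sample += [i for i in order[taken:] if random.random() * tau < weights[i]]
    return sample
```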

    Scalable Parameterised Algorithms for two Steiner Problems

    In the Steiner Problem, we are given as input (i) a connected graph with nonnegative integer weights associated with the edges; and (ii) a subset of vertices called terminals. The task is to find a minimum-weight subgraph connecting all the terminals. In the Group Steiner Problem, we are given as input (i) a connected graph with nonnegative integer weights associated with the edges; and (ii) a collection of subsets of vertices called groups. The task is to find a minimum-weight subgraph that contains at least one vertex from each group. Even though the Steiner Problem and the Group Steiner Problem are NP-complete, they are known to admit parameterised algorithms that run in time linear in the size of the input graph, with the exponential part restricted to the number of terminals and the number of groups, respectively. In this thesis, we discuss two parameterised algorithms for solving the Steiner Problem, and by reduction, the Group Steiner Problem: (a) a dynamic programming algorithm presented by Dreyfus and Wagner in 1971; and (b) an improvement of the Dreyfus-Wagner algorithm presented by Erickson, Monma and Veinott in 1987 that runs in time linear in the size of the input graph. We develop a parallel implementation of the Erickson-Monma-Veinott algorithm and carry out extensive experiments to study the scalability of our implementation with respect to its runtime, memory bandwidth, and memory usage. Our experimental results demonstrate that the implementation can scale up to a billion edges on a single modern compute node, provided that the number of terminals is small. For example, using our parallel implementation a Steiner tree for a graph with one hundred million edges and ten terminals can be found in approximately twenty minutes. For an input graph with one hundred million edges and ten terminals, our parallel implementation is at least fifteen times faster than its serial counterpart on a Haswell compute node with two processors and twelve cores in each processor. Our implementation of the Erickson-Monma-Veinott algorithm is available as open source.
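
    As a point of reference for the Dreyfus-Wagner algorithm mentioned in (a), a compact Python sketch of the 1971 recurrence is given below. The interface, the Floyd-Warshall preprocessing, and the vertex labelling are illustrative assumptions; the parallel Erickson-Monma-Veinott implementation developed in the thesis is not reproduced here.

```python
def dreyfus_wagner(n, edges, terminals):
    """Weight of a minimum Steiner tree via the Dreyfus-Wagner dynamic program.

    n         : number of vertices, labelled 0..n-1
    edges     : iterable of undirected edges (u, v, w) with nonnegative weights
    terminals : list of terminal vertices

    A plain O(3^k n + 2^k n^2 + n^3) sketch for k terminals, not the parallel
    Erickson-Monma-Veinott variant studied in the thesis.
    """
    INF = float("inf")
    # All-pairs shortest paths (Floyd-Warshall is enough for a small sketch).
    dist = [[INF] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    k = len(terminals)
    if k <= 1:
        return 0
    # dp[S][v] = weight of an optimal tree spanning terminal subset S plus v.
    dp = [[INF] * n for _ in range(1 << k)]
    for i, t in enumerate(terminals):
        for v in range(n):
            dp[1 << i][v] = dist[t][v]
    for S in range(1, 1 << k):
        if S & (S - 1) == 0:
            continue  # singleton sets are already initialised
        # 1) Merge two subtrees that meet at the same vertex u.
        for u in range(n):
            sub = (S - 1) & S  # enumerate non-empty proper subsets of S
            while sub:
                cand = dp[sub][u] + dp[S ^ sub][u]
                if cand < dp[S][u]:
                    dp[S][u] = cand
                sub = (sub - 1) & S
        # 2) Attach the merged subtree to every vertex along a shortest path.
        for v in range(n):
            for u in range(n):
                if dp[S][u] + dist[u][v] < dp[S][v]:
                    dp[S][v] = dp[S][u] + dist[u][v]
    return min(dp[(1 << k) - 1])
```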

    Structure-Aware Sampling: Flexible and Accurate Summarization

    In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. The particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analyses of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible, being targeted at range-sum queries alone, and are often quite costly to build and use. In this paper we propose and evaluate variance-optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexibility, and high accuracy on both range queries and arbitrary subset queries.
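
    The "subset-sum queries with unbiased estimators" that sample-based summaries support refers to Horvitz-Thompson style estimation: each sampled key's weight is scaled up by the inverse of its inclusion probability. A minimal sketch, assuming a sample stored as key -> (weight, inclusion probability), is shown below; the structure-aware sample-selection step proposed in the paper is not reproduced.

```python
def subset_sum_estimate(sample, query_keys):
    """Unbiased (Horvitz-Thompson) estimate of a subset-sum from a weighted sample.

    sample     : dict mapping key -> (weight, inclusion_probability)
    query_keys : set of keys defining the subset (e.g. all keys in a range)

    Illustrative names and layout; any sampling scheme that records inclusion
    probabilities supports this estimator.
    """
    return sum(w / p for key, (w, p) in sample.items() if key in query_keys)
```

    A range query is then just a subset query over the keys in the range, which is why accuracy on ranges hinges on how the sample itself is chosen.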

    A Sidetrack-Based Algorithm for Finding the k Shortest Simple Paths in a Directed Graph

    We present an algorithm for the k shortest simple path problem on weighted directed graphs (kSSP) that is based on Eppstein's algorithm for a similar problem in which paths are allowed to contain cycles. In contrast to most other algorithms for kSSP, ours is not based on Yen's algorithm and does not solve replacement path problems. Its worst-case running time is on par with state-of-the-art algorithms for kSSP. Using our algorithm, one may find O(m) simple paths with a single shortest path tree computation and O(n + m) additional time per path in well-behaved cases, where n is the number of nodes and m is the number of edges. Our computational results show that on random graphs and large road networks, these well-behaved cases are quite common and our algorithm is faster than existing algorithms by an order of magnitude. Further, the running time is far more predictable, with very small dispersion.
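
    The deviation ("sidetrack") cost that Eppstein-style algorithms rank detours by can be computed from a single shortest-path tree towards the target: for an edge (u, v) with weight w it is w + dist(v) - dist(u), the amount a path loses by leaving the tree through that edge. The sketch below shows only this preprocessing step, under assumed input conventions; it is not the paper's full kSSP algorithm.

```python
import heapq

def sidetrack_costs(n, edges, target):
    """Shortest distances to `target` and Eppstein-style sidetrack costs.

    n      : number of nodes, labelled 0..n-1
    edges  : list of directed edges (u, v, w) with nonnegative weights
    target : destination node

    Returns (dist, cost) where cost[(u, v)] = w + dist[v] - dist[u] >= 0,
    and equals 0 exactly on edges that lie on some shortest path.
    """
    INF = float("inf")
    # Dijkstra on the reversed graph gives distances *to* the target.
    rev = [[] for _ in range(n)]
    for u, v, w in edges:
        rev[v].append((u, w))
    dist = [INF] * n
    dist[target] = 0
    heap = [(0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in rev[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (dist[u], u))
    cost = {}
    for u, v, w in edges:
        if dist[u] < INF and dist[v] < INF:
            cost[(u, v)] = w + dist[v] - dist[u]
    return dist, cost
```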

    Two Combinatorial Optimization Problems at the Interface of Computer Science and Operations Research

    Solving large combinatorial optimization problems is a ubiquitous task across multiple disciplines. Developing efficient procedures for solving these problems has been of great interest to both researchers and practitioners. Over the last half century, vast amounts of research have been devoted to studying various methods for tackling these problems. These methods can be divided into two categories: heuristic methods and exact algorithms. Heuristic methods can often lead to near-optimal solutions in a relatively time-efficient manner, but provide no guarantees on optimality. Exact algorithms guarantee optimality, but are often very time consuming. This dissertation focuses on designing efficient exact algorithms that can solve larger problem instances with faster computational time. A general framework for an exact algorithm, called the Branch, Bound, and Remember algorithm, is proposed in this dissertation. Three variations of single machine scheduling problems are presented and used to evaluate the efficiency of the Branch, Bound, and Remember algorithm. The computational results show that the Branch, Bound, and Remember algorithm outperforms the best-known algorithms in the literature. While the Branch, Bound, and Remember algorithm can be used for solving combinatorial optimization problems, it does not address the subject of post-optimality selection after the combinatorial optimization problem is solved. Post-optimality selection is a common problem in multi-objective combinatorial optimization, where there exists a set of optimal solutions called Pareto optimal (non-dominated) solutions. Post-optimality selection is the process of selecting the best solutions within the Pareto optimal solution set. In many real-world applications, a Pareto solution set (either optimal or near-optimal) can be extremely large, making it very challenging for a decision maker to evaluate the solutions and select the best one. To address the post-optimality selection problem, this dissertation also proposes a new discrete optimization problem to help the decision maker obtain an optimal preferred subset of Pareto optimal solutions. This discrete optimization problem is proven to be NP-hard. To solve this problem, exact algorithms and heuristic methods are presented. Different multi-objective problems with various numbers of objectives and constraints are used to compare the performance of the proposed algorithms and heuristics.
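
    As a toy illustration of the Branch, Bound, and Remember idea, the sketch below applies it to single-machine total tardiness (an assumed example objective, not necessarily one of the three variants in the dissertation): branch on the next job to schedule, bound against the best complete schedule found so far, and remember the best partial cost reached for each set of already-scheduled jobs so that dominated branches are cut off.

```python
def bbr_total_tardiness(jobs):
    """Branch, Bound, and Remember sketch for single-machine total tardiness.

    jobs : list of (processing_time, due_date) pairs.

    The 'remember' step memoises, per set of already-scheduled jobs, the best
    partial cost seen; any branch reaching that set with a worse cost is
    pruned, since the remaining cost depends only on the set itself.
    """
    n = len(jobs)
    best = float("inf")   # incumbent: best complete-schedule objective so far
    memo = {}             # frozenset of scheduled jobs -> best partial cost

    def branch(scheduled, elapsed, cost):
        nonlocal best
        if cost >= best:
            return  # bound: this branch cannot improve the incumbent
        state = frozenset(scheduled)
        if memo.get(state, float("inf")) <= cost:
            return  # remember: an equal-or-better path already reached this state
        memo[state] = cost
        if len(scheduled) == n:
            best = cost
            return
        for j in range(n):
            if j not in scheduled:
                p, d = jobs[j]
                finish = elapsed + p
                branch(scheduled | {j}, finish, cost + max(0, finish - d))

    branch(set(), 0, 0)
    return best
```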