
    Computing Maximum Agreement Forests without Cluster Partitioning is Folly

    Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, but they perform remarkably well in practice. We shed light on why this is the case. Our results show that, probably unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. However, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximal cluster size at all. The choice of the algorithm applied to solve individual clusters also significantly impacts performance, although our limited experiment to evaluate this produced no clear winner: depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
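    As a minimal illustration of the cluster-partitioning idea in this abstract: a "common cluster" is a set of leaves that forms a clade in both input trees, so each such cluster can be solved as an independent agreement-forest subproblem. The sketch below, with trees encoded as nested tuples of leaf labels, only finds common clusters; it is illustrative and not the paper's implementation.

```python
def clades(tree):
    """Return the set of leaf-sets (as frozensets) of all subtrees."""
    if not isinstance(tree, tuple):          # leaf
        return {frozenset([tree])}
    result = set()
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result |= sub
        leaves |= max(sub, key=len)          # child's full leaf set
    result.add(leaves)
    return result

def common_clusters(t1, t2):
    """Clades shared by both trees -- candidates for independent subproblems."""
    shared = clades(t1) & clades(t2)
    # Drop trivial clusters: single leaves and the full leaf set.
    full = max(clades(t1), key=len)
    return {c for c in shared if 1 < len(c) < len(full)}

t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("b", "a"), "d"), ("c", "e"))
print(common_clusters(t1, t2))               # {frozenset({'a', 'b'})}
```

Each returned cluster can then be replaced by a single placeholder leaf in both trees and solved separately, which is what makes the technique pay off on clusterable inputs.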

    QuPARA: Query-Driven Large-Scale Portfolio Aggregate Risk Analysis on MapReduce

    Stochastic simulation techniques are used for portfolio risk analysis. Risk portfolios may consist of thousands of reinsurance contracts covering millions of insured locations. To quantify risk, each portfolio must be evaluated in up to a million simulation trials, each capturing a different possible sequence of catastrophic events over the course of a contractual year. In this paper, we explore the design of a flexible framework for portfolio risk analysis that facilitates answering a rich variety of catastrophic risk queries. Rather than aggregating simulation data in order to produce a small set of high-level risk metrics efficiently (as is often done in production risk management systems), the focus here is on allowing the user to pose queries on unaggregated or partially aggregated data. The goal is to provide a flexible framework that can be used by analysts to answer a wide variety of unanticipated but natural ad hoc queries. Such detailed queries can help actuaries or underwriters to better understand the multiple dimensions (e.g., spatial correlation, seasonality, peril features, construction features, and financial terms) that can impact portfolio risk. We implemented a prototype system, called QuPARA (Query-Driven Large-Scale Portfolio Aggregate Risk Analysis), using Hadoop, which is Apache's implementation of the MapReduce paradigm. This allows the user to take advantage of large parallel compute servers in order to answer ad hoc risk analysis queries efficiently even on very large data sets typically encountered in practice. We describe the design and implementation of QuPARA and present experimental results that demonstrate its feasibility. A full portfolio risk analysis run consisting of a 1,000,000 trial simulation, with 1,000 events per trial, and 3,200 risk transfer contracts can be completed on a 16-node Hadoop cluster in just over 20 minutes. Comment: 9 pages, IEEE International Conference on Big Data (BigData), Santa Clara, USA, 201
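    The map/reduce flow the abstract describes can be sketched in miniature: the map step applies a user query's filter to each trial's event losses, and the reduce step aggregates losses per trial year. This toy, in-memory sketch stands in for the Hadoop jobs; the field names and filter are invented for illustration.

```python
from collections import defaultdict

def map_trial(trial_id, events, event_filter):
    """Emit (trial_id, loss) pairs for events matching an ad hoc query."""
    for ev in events:
        if event_filter(ev):
            yield trial_id, ev["loss"]

def reduce_losses(pairs):
    """Aggregate matched losses per trial, e.g. for a loss distribution."""
    totals = defaultdict(float)
    for trial_id, loss in pairs:
        totals[trial_id] += loss
    return dict(totals)

trials = {
    1: [{"peril": "quake", "loss": 5.0}, {"peril": "flood", "loss": 2.0}],
    2: [{"peril": "quake", "loss": 1.5}],
}
pairs = [p for tid, evs in trials.items()
         for p in map_trial(tid, evs, lambda e: e["peril"] == "quake")]
print(reduce_losses(pairs))                  # {1: 5.0, 2: 1.5}
```

Keeping the data unaggregated until the reduce step is exactly what lets the framework answer unanticipated queries such as the peril filter above.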

    Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths

    We present improved cache-oblivious data structures and algorithms for breadth-first search (BFS) on undirected graphs and the single-source shortest path (SSSP) problem on undirected graphs with non-negative edge weights. For the SSSP problem, our result closes the performance gap between the currently best cache-aware algorithm and the cache-oblivious counterpart. Our cache-oblivious SSSP algorithm takes nearly full advantage of block transfers for dense graphs. The algorithm relies on a new data structure, called the bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation. For the BFS problem, we reduce the number of I/Os for sparse graphs by a factor of nearly √B, where B is the cache-block size, nearly closing the performance gap between the currently best cache-aware and cache-oblivious algorithms.
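    The bucket heap itself is intricate; the sketch below only illustrates the "weak" DecreaseKey semantics such a priority queue supports: a DecreaseKey may simply insert a new, smaller key for an element, and DeleteMin lazily discards stale entries. This lazy-heap idiom is standard and is not the paper's data structure.

```python
import heapq

class LazyPQ:
    def __init__(self):
        self.heap = []            # (key, element) pairs, possibly stale
        self.best = {}            # element -> smallest key seen so far

    def insert(self, elem, key):
        if key < self.best.get(elem, float("inf")):
            self.best[elem] = key
            heapq.heappush(self.heap, (key, elem))

    decrease_key = insert         # a weak DecreaseKey: just re-insert

    def delete_min(self):
        while self.heap:
            key, elem = heapq.heappop(self.heap)
            if self.best.get(elem) == key:    # skip stale entries
                del self.best[elem]
                return elem, key
        raise IndexError("empty priority queue")

pq = LazyPQ()
pq.insert("v", 10)
pq.decrease_key("v", 3)
pq.insert("w", 5)
print(pq.delete_min())            # ('v', 3)
```

The weakness is that the structure never locates the old entry; it only guarantees that DeleteMin eventually returns each element with its smallest inserted key, which suffices for Dijkstra-style SSSP.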

    Another virtue of wavelet forests?

    A wavelet forest for a text T[1..n] over an alphabet of size σ takes n H₀(T) + o(n log σ) bits of space and supports access and rank on T in O(log σ) time. Kärkkäinen and Puglisi (2011) implicitly introduced wavelet forests and showed that when T is the Burrows-Wheeler Transform (BWT) of a string S, then a wavelet forest for T occupies space bounded in terms of higher-order empirical entropies of S even when the forest is implemented with uncompressed bitvectors. In this paper we show experimentally that wavelet forests also have better access locality than wavelet trees and are thus interesting even when higher-order compression is not effective on S, or when T is not a BWT at all.
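    A minimal, uncompressed, pointer-based wavelet tree illustrates the O(log σ) access/rank recursion the abstract refers to: each node stores one bit per symbol, and a query descends one alphabet half per level. A wavelet forest stores several such trees; this sketch builds a single tree and is not the paper's implementation.

```python
class WaveletTree:
    def __init__(self, text, alphabet=None):
        self.alpha = sorted(set(text)) if alphabet is None else alphabet
        if len(self.alpha) > 1:
            mid = len(self.alpha) // 2
            left_set = set(self.alpha[:mid])
            self.bits = [0 if c in left_set else 1 for c in text]
            self.left = WaveletTree([c for c in text if c in left_set],
                                    self.alpha[:mid])
            self.right = WaveletTree([c for c in text if c not in left_set],
                                     self.alpha[mid:])

    def access(self, i):
        """Return text[i], descending one alphabet half per level."""
        if len(self.alpha) == 1:
            return self.alpha[0]
        b = self.bits[i]
        r = self.bits[:i].count(b)       # rank of bit b before position i
        return (self.left if b == 0 else self.right).access(r)

    def rank(self, c, i):
        """Occurrences of symbol c in text[0:i]."""
        if len(self.alpha) == 1:
            return i
        b = 0 if c in self.alpha[:len(self.alpha) // 2] else 1
        r = self.bits[:i].count(b)
        return (self.left if b == 0 else self.right).rank(c, r)

wt = WaveletTree("abracadabra")
print(wt.access(3), wt.rank("a", 8))     # a 4
```

A production structure would replace the `list.count` calls with o(n)-space rank dictionaries over bitvectors, which is where the space bounds quoted above come from.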

    Geometric spanners with small chromatic number

    Given an integer k⩾2, we consider the problem of computing the smallest real number t(k) such that for each set P of points in the plane, there exists a t(k)-spanner for P that has chromatic number at most k. We prove that t(2)=3, t(3)=2, t(4)=√2, and give upper and lower bounds on t(k) for k>4. We also show that for any ϵ>0, there exists a (1+ϵ)t(k)-spanner for P that has O(|P|) edges and chromatic number at most k. Finally, we consider an on-line variant of the problem where the points of P are given one after another, and the color of a point must be assigned at the moment the point is given. In this setting, we prove that t(2)=3, t(3)=1+√3, t(4)=1+√2, and give upper and lower bounds on t(k) for k>4.
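    As a small companion to the definition above, the helper below checks the spanner property itself: a graph on a point set P is a t-spanner if every graph distance is within a factor t of the corresponding Euclidean distance. The checker uses Floyd-Warshall for brevity and is illustrative, not from the paper.

```python
import math
from itertools import combinations

def is_t_spanner(points, edges, t):
    """True if graph distances are at most t times Euclidean distances."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # All-pairs shortest paths over the spanner edges (Floyd-Warshall).
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for i, j in edges:
        d[i][j] = d[j][i] = dist(i, j)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return all(d[i][j] <= t * dist(i, j) + 1e-9
               for i, j in combinations(range(n), 2))

pts = [(0, 0), (1, 0), (1, 1)]
path = [(0, 1), (1, 2)]                     # a path through the three points
print(is_t_spanner(pts, path, 2.0))         # True
```

The chromatic constraint studied in the abstract then asks, on top of this property, that the vertices admit a proper k-coloring of some auxiliary conflict structure, which is what drives the t(k) trade-off.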

    I/O-Efficient Planar Separators and Applications

    We present a new algorithm to compute a subset S of vertices of a planar graph G whose removal partitions G into O(N/h) subgraphs of size O(h) and with boundary size O(√h) each. The size of S is O(N/√h). Computing S takes O(sort(N)) I/Os and linear space, provided that M ≥ 56h log²B. Together with recent reducibility results, this leads to O(sort(N)) I/O algorithms for breadth-first search (BFS), depth-first search (DFS), and single-source shortest paths (SSSP) on undirected embedded planar graphs. Our separator algorithm does not need a BFS tree or an embedding of G to be given as part of the input. Instead we argue that "local embeddings" of subgraphs of G are enough.
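    The separator guarantee in the abstract (removing S leaves only small components) can be illustrated with a checker that verifies a claimed separator on an adjacency-list graph. This helper is for illustration only; it says nothing about how the paper's algorithm finds S I/O-efficiently.

```python
def components_after_removal(adj, S):
    """Sizes of connected components after deleting vertex set S."""
    removed, seen, sizes = set(S), set(), []
    for v in adj:
        if v in removed or v in seen:
            continue
        stack, size = [v], 0
        seen.add(v)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes

# A 6-cycle: removing {0, 3} splits it into two components of size 2.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(components_after_removal(cycle, {0, 3}))   # [2, 2]
```

For a separator as in the abstract, every reported size would be O(h) and the number of components O(N/h).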