Computing Maximum Agreement Forests without Cluster Partitioning is Folly
Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, yet they perform remarkably well in practice. We shed light on why this is the case. Our results show that, perhaps unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. In practice, however, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximum cluster size at all. The choice of algorithm used to solve individual clusters also significantly impacts performance, although our limited experiment to evaluate this produced no clear winner: depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
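To illustrate the cluster-partitioning idea discussed above, the sketch below finds "common clusters" of two trees: leaf sets that form a clade (the complete leaf set below one node) in both inputs, along which an agreement-forest instance can be split into independent subproblems. The nested-tuple tree encoding and helper names are illustrative assumptions, not the paper's data structures.

```python
# Toy illustration of cluster partitioning: a common cluster is a set of
# leaves that is a clade in BOTH input trees. Trees are nested tuples.

def clades(tree):
    """Return the set of leaf sets of all subtrees of `tree`."""
    if not isinstance(tree, tuple):          # a leaf
        return {frozenset([tree])}
    result, leaves = set(), frozenset()
    for child in tree:
        sub = clades(child)
        result |= sub
        leaves |= max(sub, key=len)          # leaf set of this child
    result.add(leaves)
    return result

def common_clusters(t1, t2):
    """Leaf sets that are clades in both trees (candidate clusters)."""
    shared = clades(t1) & clades(t2)
    # Drop singletons: they never shrink the instance.
    return sorted((s for s in shared if len(s) > 1), key=len)

t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("b", "a"), "c"), ("e", "d"))
print(common_clusters(t1, t2))
```

Each reported cluster can then be handed to any M(A)AF solver in isolation, which is exactly what makes the choice of per-cluster algorithm matter.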
QuPARA: Query-Driven Large-Scale Portfolio Aggregate Risk Analysis on MapReduce
Stochastic simulation techniques are used for portfolio risk analysis. Risk
portfolios may consist of thousands of reinsurance contracts covering millions
of insured locations. To quantify risk, each portfolio must be evaluated in up
to a million simulation trials, each capturing a different possible sequence of
catastrophic events over the course of a contractual year. In this paper, we
explore the design of a flexible framework for portfolio risk analysis that
facilitates answering a rich variety of catastrophic risk queries. Rather than
aggregating simulation data in order to produce a small set of high-level risk
metrics efficiently (as is often done in production risk management systems),
the focus here is on allowing the user to pose queries on unaggregated or
partially aggregated data. The goal is to provide a flexible framework that can
be used by analysts to answer a wide variety of unanticipated but natural ad
hoc queries. Such detailed queries can help actuaries or underwriters to better
understand the multiple dimensions (e.g., spatial correlation, seasonality,
peril features, construction features, and financial terms) that can impact
portfolio risk. We implemented a prototype system, called QuPARA (Query-Driven
Large-Scale Portfolio Aggregate Risk Analysis), using Hadoop, which is Apache's
implementation of the MapReduce paradigm. This allows the user to take
advantage of large parallel compute servers in order to answer ad hoc risk
analysis queries efficiently even on very large data sets typically encountered
in practice. We describe the design and implementation of QuPARA and present
experimental results that demonstrate its feasibility. A full portfolio risk
analysis run consisting of a 1,000,000 trial simulation, with 1,000 events per
trial, and 3,200 risk transfer contracts can be completed on a 16-node Hadoop
cluster in just over 20 minutes.
Comment: 9 pages, IEEE International Conference on Big Data (BigData), Santa Clara, USA, 201
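The aggregation pattern described above can be sketched as a toy, single-machine map/reduce pipeline: map applies a contract's financial terms to raw per-event losses, and reduce sums payouts per simulation trial into a year-loss table. The record layout, deductible, and limit values here are illustrative assumptions, not QuPARA's actual schema.

```python
# Toy map/reduce sketch of portfolio risk aggregation: map applies simple
# financial terms per event, reduce builds a per-trial year-loss table.
from collections import defaultdict

def mapper(record, deductible=50.0, limit=500.0):
    """Apply per-event financial terms; emit (trial_id, payout)."""
    trial_id, event_id, ground_up_loss = record
    payout = min(max(ground_up_loss - deductible, 0.0), limit)
    yield trial_id, payout

def reducer(pairs):
    """Sum payouts per trial -> year-loss table (YLT)."""
    ylt = defaultdict(float)
    for trial_id, payout in pairs:
        ylt[trial_id] += payout
    return dict(ylt)

records = [(1, "EQ1", 120.0), (1, "HU7", 40.0), (2, "EQ1", 900.0)]
ylt = reducer(p for r in records for p in mapper(r))
print(ylt)   # trial 1 pays 70.0 (the 40.0 loss is below deductible), trial 2 is capped at 500.0
```

In the Hadoop setting, the mapper and reducer run in parallel over millions of trial records, and ad hoc queries correspond to swapping in different map and reduce functions over the unaggregated data.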
Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths
We present improved cache-oblivious data structures and algorithms for breadth-first search (BFS) on undirected graphs and the single-source shortest path (SSSP) problem on undirected graphs with non-negative edge weights. For the SSSP problem, our result closes the performance gap between the currently best cache-aware algorithm and the cache-oblivious counterpart. Our cache-oblivious SSSP algorithm takes nearly full advantage of block transfers for dense graphs. The algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DECREASEKEY operation. For the BFS problem, we reduce the number of I/Os for sparse graphs by a factor of nearly √B, where B is the cache-block size, nearly closing the performance gap between the currently best cache-aware and cache-oblivious algorithms.
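The level-by-level structure that I/O-efficient BFS algorithms exploit can be sketched as follows: the next BFS level is the set of neighbours of the current level, deduplicated (a sorting pass in the external-memory setting) and purged against the two preceding levels, which suffices on undirected graphs. This is an in-memory toy of the general Munagala–Ranade-style idea, not the paper's cache-oblivious implementation.

```python
# Level-by-level BFS: build the next level from the neighbours of the
# current level, removing vertices seen in the two preceding levels.
# In external memory, the sort/dedup step replaces random accesses.

def bfs_levels(adj, source):
    prev, cur = set(), {source}
    levels = [sorted(cur)]
    while cur:
        # "Scan" adjacency lists of the current level, collect neighbours.
        candidates = sorted(v for u in cur for v in adj[u])   # sort ~ dedup pass
        nxt = set(candidates) - cur - prev                    # purge two levels back
        prev, cur = cur, nxt
        if nxt:
            levels.append(sorted(nxt))
    return levels

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
print(bfs_levels(adj, 0))   # [[0], [1, 2], [3], [4]]
```

On an undirected graph, a neighbour of level t can only lie in levels t-1, t, or t+1, which is why subtracting just two earlier levels is enough.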
Another virtue of wavelet forests?
A wavelet forest for a text T of length n over an alphabet of size σ takes O(n log σ) bits of space and supports access and rank on T in O(log σ) time. Kärkkäinen and Puglisi (2011) implicitly introduced wavelet forests and showed that when T is the Burrows-Wheeler Transform (BWT) of a string S, then a wavelet forest for T occupies space bounded in terms of higher-order empirical entropies of S, even when the forest is implemented with uncompressed bitvectors. In this paper we show experimentally that wavelet forests also have better access locality than wavelet trees and are thus interesting even when higher-order compression is not effective on S, or when T is not a BWT at all.
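For contrast with the forests discussed above, here is a minimal pointer-based wavelet tree supporting access and rank. Bitvector rank is done by naive scanning where a real implementation would attach o(n)-bit rank structures; this is a didactic sketch, not the paper's engineered code.

```python
# Minimal wavelet tree: split the alphabet in half at each node, store one
# bit per character (left/right half), and recurse on the two halves.

class WaveletTree:
    def __init__(self, text, alphabet=None):
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        if len(self.alphabet) == 1:
            self.bits = None                  # leaf: single symbol
            return
        mid = len(self.alphabet) // 2
        left_set = set(self.alphabet[:mid])
        self.bits = [0 if c in left_set else 1 for c in text]
        self.left = WaveletTree([c for c in text if c in left_set], self.alphabet[:mid])
        self.right = WaveletTree([c for c in text if c not in left_set], self.alphabet[mid:])

    def access(self, i):
        """Return text[i]."""
        if self.bits is None:
            return self.alphabet[0]
        b = self.bits[i]
        r = self.bits[:i].count(b)            # rank of bit b before position i
        return (self.right if b else self.left).access(r)

    def rank(self, c, i):
        """Number of occurrences of c in text[:i]."""
        if self.bits is None:
            return i
        b = 0 if c in self.alphabet[:len(self.alphabet) // 2] else 1
        r = self.bits[:i].count(b)
        return (self.right if b else self.left).rank(c, r)

wt = WaveletTree("abracadabra")
print(wt.access(5), wt.rank("a", 11))   # a 5
```

A wavelet forest replaces the single balanced tree with several smaller trees, which is the source of the access-locality advantage the abstract reports.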
Geometric spanners with small chromatic number
Given an integer k ⩾ 2, we consider the problem of computing the smallest real number t(k) such that for each set P of points in the plane, there exists a t(k)-spanner for P that has chromatic number at most k. We prove that t(2) = 3, t(3) = 2, t(4) = √2, and give upper and lower bounds on t(k) for k > 4. We also show that for any ϵ > 0, there exists a (1+ϵ)t(k)-spanner for P that has O(|P|) edges and chromatic number at most k. Finally, we consider an on-line variant of the problem where the points of P are given one after another, and the color of a point must be assigned at the moment the point is given. In this setting, we prove that t(2) = 3, t(3) = 1+√3, t(4) = 1+√2, and give upper and lower bounds on t(k) for k > 4.
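As background on what a t-spanner is, the sketch below runs the classic greedy construction (not the paper's bounded-chromatic-number one): scan point pairs by increasing distance and add an edge only if the graph built so far does not already connect the pair within t times its Euclidean distance.

```python
# Classic greedy t-spanner: after processing, every point pair is connected
# by a path of length at most t times its Euclidean distance.
import heapq
from itertools import combinations
from math import dist, inf

def greedy_spanner(points, t):
    n = len(points)
    adj = {i: [] for i in range(n)}

    def graph_dist(s, g):
        # Dijkstra over the spanner edges added so far.
        d = {s: 0.0}
        pq = [(0.0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if u == g:
                return du
            if du > d.get(u, inf):
                continue
            for v, w in adj[u]:
                nd = du + w
                if nd < d.get(v, inf):
                    d[v] = nd
                    heapq.heappush(pq, (nd, v))
        return inf

    edges = []
    for u, v in sorted(combinations(range(n), 2),
                       key=lambda e: dist(points[e[0]], points[e[1]])):
        w = dist(points[u], points[v])
        if graph_dist(u, v) > t * w:          # pair not yet t-spanned: add edge
            adj[u].append((v, w))
            adj[v].append((u, w))
            edges.append((u, v))
    return edges

pts = [(0, 0), (1, 0), (2, 0), (1, 1)]
print(greedy_spanner(pts, 1.5))   # [(0, 1), (1, 2), (1, 3)]
```

The paper's question adds a twist to this picture: the spanner's vertices must additionally admit a proper k-coloring, and t(k) measures the best stretch achievable under that constraint.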
I/O-Efficient Planar Separators and Applications
We present a new algorithm to compute a subset S of vertices of a planar graph G whose removal partitions G into O(N/h) subgraphs of size O(h) and with boundary size O(√h) each. The size of S is O(N/√h). Computing S takes O(sort(N)) I/Os and linear space, provided that M ≥ 56h log² B. Together with recent reducibility results, this leads to O(sort(N)) I/O algorithms for breadth-first search (BFS), depth-first search (DFS), and single-source shortest paths (SSSP) on undirected embedded planar graphs. Our separator algorithm does not need a BFS tree or an embedding of G to be given as part of the input. Instead, we argue that "local embeddings" of subgraphs of G are enough.
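As background on separators, the toy below sketches the classic BFS-level idea for planar graphs (Lipton–Tarjan flavour), which I/O-efficient separator algorithms refine: in a BFS layering, removing all vertices of one level disconnects everything above it from everything below it, so a small intermediate level is a separator. This in-memory toy on a grid graph is illustrative only, not the paper's algorithm.

```python
# BFS-level separator sketch: layer the graph by BFS distance from a source
# and return the smallest intermediate level as a separator.
from collections import deque

def level_separator(adj, source):
    # BFS layering.
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    depth = max(level.values())
    buckets = {}
    for v, l in level.items():
        buckets.setdefault(l, []).append(v)
    # Pick the smallest intermediate level as the separator.
    sep_level = min(range(1, depth), key=lambda l: len(buckets[l]))
    return set(buckets[sep_level])

# 3x3 grid graph with vertices (row, col).
adj = {(r, c): [(r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < 3 and 0 <= c + dc < 3]
       for r in range(3) for c in range(3)}
sep = level_separator(adj, (0, 0))
print(sep)   # the two vertices adjacent to the corner separate it from the rest
```

The abstract's contribution is making this kind of partitioning work in O(sort(N)) I/Os without a precomputed BFS tree or global embedding.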