Computing Maximum Agreement Forests without Cluster Partitioning is Folly
Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, yet they perform remarkably well in practice. We shed light on why this is the case. Our results show that, perhaps unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. In practice, however, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximum cluster size at all. The choice of algorithm used to solve individual clusters also significantly impacts performance, although our limited experiment to evaluate this produced no clear winner: depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
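To illustrate the cluster-partitioning idea discussed above, the sketch below finds "common clusters" of two trees: leaf sets that form a clade (the complete leaf set below one node) in both inputs, along which an agreement-forest instance can be split into independent subproblems. The nested-tuple tree encoding and helper names are illustrative assumptions, not the paper's data structures.

```python
# Toy illustration of cluster partitioning: a common cluster is a set of
# leaves that is a clade in BOTH input trees. Trees are nested tuples.

def clades(tree):
    """Return the set of leaf sets of all subtrees of `tree`."""
    if not isinstance(tree, tuple):          # a leaf
        return {frozenset([tree])}
    result, leaves = set(), frozenset()
    for child in tree:
        sub = clades(child)
        result |= sub
        leaves |= max(sub, key=len)          # leaf set of this child
    result.add(leaves)
    return result

def common_clusters(t1, t2):
    """Leaf sets that are clades in both trees (candidate clusters)."""
    shared = clades(t1) & clades(t2)
    # Drop singletons: they never shrink the instance.
    return sorted((s for s in shared if len(s) > 1), key=len)

t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("b", "a"), "c"), ("e", "d"))
print(common_clusters(t1, t2))
```

Each reported cluster can then be handed to any M(A)AF solver in isolation, which is exactly what makes the choice of per-cluster algorithm matter.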
QuPARA: Query-Driven Large-Scale Portfolio Aggregate Risk Analysis on MapReduce
Stochastic simulation techniques are used for portfolio risk analysis. Risk
portfolios may consist of thousands of reinsurance contracts covering millions
of insured locations. To quantify risk, each portfolio must be evaluated in up
to a million simulation trials, each capturing a different possible sequence of
catastrophic events over the course of a contractual year. In this paper, we
explore the design of a flexible framework for portfolio risk analysis that
facilitates answering a rich variety of catastrophic risk queries. Rather than
aggregating simulation data in order to produce a small set of high-level risk
metrics efficiently (as is often done in production risk management systems),
the focus here is on allowing the user to pose queries on unaggregated or
partially aggregated data. The goal is to provide a flexible framework that can
be used by analysts to answer a wide variety of unanticipated but natural ad
hoc queries. Such detailed queries can help actuaries or underwriters to better
understand the multiple dimensions (e.g., spatial correlation, seasonality,
peril features, construction features, and financial terms) that can impact
portfolio risk. We implemented a prototype system, called QuPARA (Query-Driven
Large-Scale Portfolio Aggregate Risk Analysis), using Hadoop, which is Apache's
implementation of the MapReduce paradigm. This allows the user to take
advantage of large parallel compute servers in order to answer ad hoc risk
analysis queries efficiently even on very large data sets typically encountered
in practice. We describe the design and implementation of QuPARA and present
experimental results that demonstrate its feasibility. A full portfolio risk
analysis run consisting of a 1,000,000 trial simulation, with 1,000 events per
trial, and 3,200 risk transfer contracts can be completed on a 16-node Hadoop
cluster in just over 20 minutes.
Comment: 9 pages, IEEE International Conference on Big Data (BigData), Santa Clara, USA, 201
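The aggregation pattern described above can be sketched as a toy, single-machine map/reduce pipeline: map applies a contract's financial terms to raw per-event losses, and reduce sums payouts per simulation trial into a year-loss table. The record layout, deductible, and limit values here are illustrative assumptions, not QuPARA's actual schema.

```python
# Toy map/reduce sketch of portfolio risk aggregation: map applies simple
# financial terms per event, reduce builds a per-trial year-loss table.
from collections import defaultdict

def mapper(record, deductible=50.0, limit=500.0):
    """Apply per-event financial terms; emit (trial_id, payout)."""
    trial_id, event_id, ground_up_loss = record
    payout = min(max(ground_up_loss - deductible, 0.0), limit)
    yield trial_id, payout

def reducer(pairs):
    """Sum payouts per trial -> year-loss table (YLT)."""
    ylt = defaultdict(float)
    for trial_id, payout in pairs:
        ylt[trial_id] += payout
    return dict(ylt)

records = [(1, "EQ1", 120.0), (1, "HU7", 40.0), (2, "EQ1", 900.0)]
ylt = reducer(p for r in records for p in mapper(r))
print(ylt)   # trial 1 pays 70.0 (the 40.0 loss is below deductible), trial 2 is capped at 500.0
```

In the Hadoop setting, the mapper and reducer run in parallel over millions of trial records, and ad hoc queries correspond to swapping in different map and reduce functions over the unaggregated data.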
Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths
We present improved cache-oblivious data structures and algorithms for breadth-first search (BFS) on undirected graphs and the single-source shortest path (SSSP) problem on undirected graphs with non-negative edge weights. For the SSSP problem, our result closes the performance gap between the currently best cache-aware algorithm and the cache-oblivious counterpart. Our cache-oblivious SSSP algorithm takes nearly full advantage of block transfers for dense graphs. The algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DECREASEKEY operation. For the BFS problem, we reduce the number of I/Os for sparse graphs by a factor of nearly √B, where B is the cache-block size, nearly closing the performance gap between the currently best cache-aware and cache-oblivious algorithms.
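The level-by-level structure that I/O-efficient BFS algorithms exploit can be sketched as follows: the next BFS level is the set of neighbours of the current level, deduplicated (a sorting pass in the external-memory setting) and purged against the two preceding levels, which suffices on undirected graphs. This is an in-memory toy of the general Munagala–Ranade-style idea, not the paper's cache-oblivious implementation.

```python
# Level-by-level BFS: build the next level from the neighbours of the
# current level, removing vertices seen in the two preceding levels.
# In external memory, the sort/dedup step replaces random accesses.

def bfs_levels(adj, source):
    prev, cur = set(), {source}
    levels = [sorted(cur)]
    while cur:
        # "Scan" adjacency lists of the current level, collect neighbours.
        candidates = sorted(v for u in cur for v in adj[u])   # sort ~ dedup pass
        nxt = set(candidates) - cur - prev                    # purge two levels back
        prev, cur = cur, nxt
        if nxt:
            levels.append(sorted(nxt))
    return levels

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
print(bfs_levels(adj, 0))   # [[0], [1, 2], [3], [4]]
```

On an undirected graph, a neighbour of level t can only lie in levels t-1, t, or t+1, which is why subtracting just two earlier levels is enough.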
Another virtue of wavelet forests?
A wavelet forest for a text T of length n over an alphabet of size σ takes O(n log σ) bits of space and supports access and rank on T in O(log σ) time. Kärkkäinen and Puglisi (2011) implicitly introduced wavelet forests and showed that when T is the Burrows-Wheeler Transform (BWT) of a string S, then a wavelet forest for T occupies space bounded in terms of higher-order empirical entropies of S, even when the forest is implemented with uncompressed bitvectors. In this paper we show experimentally that wavelet forests also have better access locality than wavelet trees and are thus interesting even when higher-order compression is not effective on S, or when T is not a BWT at all.
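For contrast with the forests discussed above, here is a minimal pointer-based wavelet tree supporting access and rank. Bitvector rank is done by naive scanning where a real implementation would attach o(n)-bit rank structures; this is a didactic sketch, not the paper's engineered code.

```python
# Minimal wavelet tree: split the alphabet in half at each node, store one
# bit per character (left/right half), and recurse on the two halves.

class WaveletTree:
    def __init__(self, text, alphabet=None):
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        if len(self.alphabet) == 1:
            self.bits = None                  # leaf: single symbol
            return
        mid = len(self.alphabet) // 2
        left_set = set(self.alphabet[:mid])
        self.bits = [0 if c in left_set else 1 for c in text]
        self.left = WaveletTree([c for c in text if c in left_set], self.alphabet[:mid])
        self.right = WaveletTree([c for c in text if c not in left_set], self.alphabet[mid:])

    def access(self, i):
        """Return text[i]."""
        if self.bits is None:
            return self.alphabet[0]
        b = self.bits[i]
        r = self.bits[:i].count(b)            # rank of bit b before position i
        return (self.right if b else self.left).access(r)

    def rank(self, c, i):
        """Number of occurrences of c in text[:i]."""
        if self.bits is None:
            return i
        b = 0 if c in self.alphabet[:len(self.alphabet) // 2] else 1
        r = self.bits[:i].count(b)
        return (self.right if b else self.left).rank(c, r)

wt = WaveletTree("abracadabra")
print(wt.access(5), wt.rank("a", 11))   # a 5
```

A wavelet forest replaces the single balanced tree with several smaller trees, which is the source of the access-locality advantage the abstract reports.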
Geometric spanners with small chromatic number
Given an integer k ⩾ 2, we consider the problem of computing the smallest real number t(k) such that for each set P of points in the plane, there exists a t(k)-spanner for P that has chromatic number at most k. We prove that t(2) = 3, t(3) = 2, t(4) = √2, and give upper and lower bounds on t(k) for k > 4. We also show that for any ϵ > 0, there exists a (1+ϵ)t(k)-spanner for P that has O(|P|) edges and chromatic number at most k. Finally, we consider an on-line variant of the problem where the points of P are given one after another, and the color of a point must be assigned at the moment the point is given. In this setting, we prove that t(2) = 3, t(3) = 1+√3, t(4) = 1+√2, and give upper and lower bounds on t(k) for k > 4.
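As background on what a t-spanner is, the sketch below runs the classic greedy construction (not the paper's bounded-chromatic-number one): scan point pairs by increasing distance and add an edge only if the graph built so far does not already connect the pair within t times its Euclidean distance.

```python
# Classic greedy t-spanner: after processing, every point pair is connected
# by a path of length at most t times its Euclidean distance.
import heapq
from itertools import combinations
from math import dist, inf

def greedy_spanner(points, t):
    n = len(points)
    adj = {i: [] for i in range(n)}

    def graph_dist(s, g):
        # Dijkstra over the spanner edges added so far.
        d = {s: 0.0}
        pq = [(0.0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if u == g:
                return du
            if du > d.get(u, inf):
                continue
            for v, w in adj[u]:
                nd = du + w
                if nd < d.get(v, inf):
                    d[v] = nd
                    heapq.heappush(pq, (nd, v))
        return inf

    edges = []
    for u, v in sorted(combinations(range(n), 2),
                       key=lambda e: dist(points[e[0]], points[e[1]])):
        w = dist(points[u], points[v])
        if graph_dist(u, v) > t * w:          # pair not yet t-spanned: add edge
            adj[u].append((v, w))
            adj[v].append((u, w))
            edges.append((u, v))
    return edges

pts = [(0, 0), (1, 0), (2, 0), (1, 1)]
print(greedy_spanner(pts, 1.5))   # [(0, 1), (1, 2), (1, 3)]
```

The paper's question adds a twist to this picture: the spanner's vertices must additionally admit a proper k-coloring, and t(k) measures the best stretch achievable under that constraint.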
I/O-Efficient Planar Separators and Applications
We present a new algorithm to compute a subset S of vertices of a planar graph G whose removal partitions G into O(N/h) subgraphs of size O(h) and with boundary size O(√h) each. The size of S is O(N/√h). Computing S takes O(sort(N)) I/Os and linear space, provided that M ≥ 56h log² B. Together with recent reducibility results, this leads to O(sort(N)) I/O algorithms for breadth-first search (BFS), depth-first search (DFS), and single-source shortest paths (SSSP) on undirected embedded planar graphs. Our separator algorithm does not need a BFS tree or an embedding of G to be given as part of the input. Instead, we argue that "local embeddings" of subgraphs of G are enough.
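As background on separators, the toy below sketches the classic BFS-level idea for planar graphs (Lipton–Tarjan flavour), which I/O-efficient separator algorithms refine: in a BFS layering, removing all vertices of one level disconnects everything above it from everything below it, so a small intermediate level is a separator. This in-memory toy on a grid graph is illustrative only, not the paper's algorithm.

```python
# BFS-level separator sketch: layer the graph by BFS distance from a source
# and return the smallest intermediate level as a separator.
from collections import deque

def level_separator(adj, source):
    # BFS layering.
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    depth = max(level.values())
    buckets = {}
    for v, l in level.items():
        buckets.setdefault(l, []).append(v)
    # Pick the smallest intermediate level as the separator.
    sep_level = min(range(1, depth), key=lambda l: len(buckets[l]))
    return set(buckets[sep_level])

# 3x3 grid graph with vertices (row, col).
adj = {(r, c): [(r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < 3 and 0 <= c + dc < 3]
       for r in range(3) for c in range(3)}
sep = level_separator(adj, (0, 0))
print(sep)   # the two vertices adjacent to the corner separate it from the rest
```

The abstract's contribution is making this kind of partitioning work in O(sort(N)) I/Os without a precomputed BFS tree or global embedding.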