Lessons from the Congested Clique Applied to MapReduce

Author: A. Berns
A. Ene
B. Patt-Shamir
D. Dolev
H. Karloff
J. Dean
J. Gehweiler
R. Kumar
S. Lattanzi
Z. Lotker
Ö. Johansson
Publication date: 01/01/2014

The main results of this paper are (I) a simulation algorithm which, under quite general constraints, transforms algorithms running on the Congested Clique into algorithms running in the MapReduce model, and (II) a distributed $O(\Delta)$-coloring algorithm running on the Congested Clique which has an expected running time of (i) $O(1)$ rounds, if $\Delta \geq \Theta(\log^4 n)$; and (ii) $O(\log \log n)$ rounds otherwise. Applying the simulation theorem to the Congested-Clique $O(\Delta)$-coloring algorithm yields an $O(1)$-round $O(\Delta)$-coloring algorithm in the MapReduce model. Our simulation algorithm illustrates a natural correspondence between per-node bandwidth in the Congested Clique model and memory per machine in the MapReduce model. In the Congested Clique (and more generally, any network in the $\mathcal{CONGEST}$ model), the major impediment to constructing fast algorithms is the $O(\log n)$ restriction on message sizes. Similarly, in the MapReduce model, the combined restrictions on memory per machine and total system memory have a dominant effect on algorithm design. In showing a fairly general simulation algorithm, we highlight the similarities and differences between these models.

Comment: 15 pages
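The abstract does not spell out the coloring algorithm. As a rough illustration of the kind of round-based randomized coloring that runs naturally on the Congested Clique, the sketch below simulates a generic color-trial procedure with a palette of $2\Delta$ colors (an $O(\Delta)$ palette): in each round every uncolored node proposes a random color not already used by a colored neighbor, and keeps it only if no uncolored neighbor proposed the same color. This is a textbook-style procedure assumed for illustration, not the authors' algorithm; the name `color_trial` and the palette size are hypothetical choices.

```python
import random


def color_trial(adj, palette_size, seed=0):
    """Simulate synchronous random color-trial rounds on an undirected graph.

    adj maps each node to the set of its neighbors. Returns (coloring, rounds),
    where coloring maps every node to a color in range(palette_size).
    """
    max_degree = max((len(nbrs) for nbrs in adj.values()), default=0)
    if palette_size <= max_degree:
        raise ValueError("palette must have more colors than the maximum degree")
    rng = random.Random(seed)
    color = {}
    uncolored = set(adj)
    rounds = 0
    while uncolored:
        rounds += 1
        proposals = {}
        for v in uncolored:
            # Colors already fixed by neighbors are unavailable to v.
            taken = {color[u] for u in adj[v] if u in color}
            proposals[v] = rng.choice([c for c in range(palette_size) if c not in taken])
        # v keeps its proposal only if no uncolored neighbor proposed the same color.
        for v, c in proposals.items():
            if all(proposals.get(u) != c for u in adj[v]):
                color[v] = c
        uncolored -= set(color)
    return color, rounds


# Usage: a 6-cycle has maximum degree 2, so a palette of 2 * 2 = 4 colors suffices.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
coloring, rounds = color_trial(cycle, palette_size=4)
```

Because a node's palette always exceeds its degree, it has at least one more free color than it has uncolored neighbors, so every round makes progress with positive probability.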

arXiv.org e-Print Archive

CiteSeerX

Crossref

EVOLUTIONARY ALGORITHMS FOR OVERLAPPING CORRELATION CLUSTERING

Author: C. E. Andrade
F. K. Miyasawa
H. J. Karloff
M. G. C. Resende
Publication date: 01/01/2014

In Overlapping Correlation Clustering (OCC), a number of objects are assigned to clusters. Two objects in the same cluster have correlated characteristics. As opposed to traditional clustering, where objects are assigned to a single cluster, in OCC objects may be assigned to one or more clusters, since an object can have characteristics that are correlated with objects in more than one cluster. In this paper, we present Biased Random-Key Genetic Algorithms for OCC. Computational experiments are presented.
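The abstract names Biased Random-Key Genetic Algorithms (BRKGA) without giving details. The sketch below shows the generic BRKGA loop (an elite set kept unchanged, freshly drawn mutants, and a biased uniform crossover that favors the elite parent) together with a hypothetical OCC decoder: each object gets one key per cluster and joins every cluster whose key exceeds 0.5, or its largest-key cluster if none does. The cost compares the Jaccard similarity of two objects' cluster sets with a given target similarity, in the style of Jaccard-based OCC objectives. The decoder, the cost, and all parameter values (`pop_size`, `elite_frac`, `mutant_frac`, `rho`) are assumptions for illustration, not the authors' choices.

```python
import random
from itertools import combinations


def decode(keys, n_objects, n_clusters):
    """Hypothetical decoder: object i joins cluster j when its key for j exceeds 0.5."""
    labels = []
    for i in range(n_objects):
        row = keys[i * n_clusters:(i + 1) * n_clusters]
        chosen = {j for j, key in enumerate(row) if key > 0.5}
        if not chosen:  # every object must belong to at least one cluster
            chosen = {max(range(n_clusters), key=lambda j: row[j])}
        labels.append(chosen)
    return labels


def occ_cost(labels, sim):
    """Sum over object pairs of |Jaccard(labels[u], labels[v]) - sim[u][v]|."""
    total = 0.0
    for u, v in combinations(range(len(labels)), 2):
        a, b = labels[u], labels[v]
        total += abs(len(a & b) / len(a | b) - sim[u][v])
    return total


def brkga(sim, n_clusters, pop_size=30, elite_frac=0.2, mutant_frac=0.1,
          rho=0.7, generations=50, seed=1):
    """Generic BRKGA loop; returns the best labeling and the best cost per generation."""
    rng = random.Random(seed)
    n = len(sim)
    length = n * n_clusters

    def fitness(keys):
        return occ_cost(decode(keys, n, n_clusters), sim)

    def random_keys():
        return [rng.random() for _ in range(length)]

    n_elite = max(1, int(elite_frac * pop_size))
    n_mutant = int(mutant_frac * pop_size)
    pop = [random_keys() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite, non_elite = pop[:n_elite], pop[n_elite:]
        mutants = [random_keys() for _ in range(n_mutant)]
        children = []
        for _ in range(pop_size - n_elite - n_mutant):
            e, o = rng.choice(elite), rng.choice(non_elite)
            # Biased crossover: each key is inherited from the elite parent with probability rho.
            children.append([ek if rng.random() < rho else ok for ek, ok in zip(e, o)])
        pop = elite + mutants + children
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return decode(best, n, n_clusters), history


# Usage: objects 0-1 and 2-3 are fully similar pairs; cross pairs are dissimilar.
sim = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
labels, history = brkga(sim, n_clusters=2)
```

Keeping the elite set unchanged between generations means the best cost in `history` never increases; the random-key encoding lets the same crossover work for any decoder.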

CiteSeerX

Crossref

On the integrality ratio for tree augmentation

Author: Goemans
H. Karloff
Held
J. Cheriyan
J. Könemann
Jain
Monma
R. Khandekar
Publication venue: 'Elsevier BV'

Crossref

Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

Author: A. Siegel
H. Karloff
J.L. Carter
J.P. Schmidt
M. Dietzfelbinger
M. Pǎtraşcu
R. Motwani
T. Christiani
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its independence: a sequence of random variables is said to be $k$-independent if every variable is uniform and every size-$k$ subset is independent. In this paper we consider three classic algorithms under limited independence. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.

- Randomized quicksort. When pivot elements are computed using a $5$-independent hash function, Karloff and Raghavan, J.ACM'93 showed $O(n \log n)$ expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only $4$-independence.
- Min-wise hashing. For a set $A$, consider the probability of a particular element being mapped to the smallest hash value. It is known that $5$-independence implies the optimal probability $O(1/n)$. Broder et al., STOC'98 showed that $2$-independence implies it is $O(1/\sqrt{|A|})$. We show a matching lower bound as well as new tight bounds for $3$- and $4$-independent hash functions.
- Largest bucket. We consider the case where $n$ balls are distributed to $n$ buckets using a $k$-independent hash function and analyze the largest bucket size. Alon et al., STOC'97 showed that there exists a $2$-independent hash function implying a bucket of size $\Omega(n^{1/2})$. We generalize the bound, providing a $k$-independent family of functions that imply size $\Omega(n^{1/k})$.

Comment: Submitted to ICALP 201
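A standard way to obtain a $k$-independent family, due to Wegman and Carter, is a random polynomial of degree $k-1$ over a prime field, $h(x) = \left(\sum_{i=0}^{k-1} a_i x^i\right) \bmod p$, with coefficients drawn uniformly from $\{0, \dots, p-1\}$. The sketch below uses it to throw $n$ balls into $n$ buckets and to pick the min-wise element of a set. The prime $p = 2^{61}-1$ and the helper names are assumptions chosen for illustration; the final reduction `% n` makes bucket indices only approximately uniform, and the paper's lower-bound families are specially constructed rather than random polynomials.

```python
import random

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1 (assumed field size)


def make_k_independent_hash(k, rng):
    """Return h(x) = (a_{k-1} x^{k-1} + ... + a_0) mod P with random coefficients.

    For keys 0 <= x < P, the values h(x) are k-independent and uniform on [0, P).
    """
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule, highest-degree coefficient first
            acc = (acc * x + a) % P
        return acc

    return h


def largest_bucket(n, k, seed=0):
    """Hash keys 0..n-1 into n buckets with a k-independent hash; return the max load."""
    h = make_k_independent_hash(k, random.Random(seed))
    loads = [0] * n
    for x in range(n):
        loads[h(x) % n] += 1
    return max(loads)


def min_wise_element(A, k, seed=0):
    """Return the element of A that receives the smallest hash value."""
    h = make_k_independent_hash(k, random.Random(seed))
    return min(A, key=h)


# Usage: compare max loads for 2- and 5-independent hashing of 1000 keys.
load_2 = largest_bucket(1000, k=2)
load_5 = largest_bucket(1000, k=5)
winner = min_wise_element(range(100), k=4)
```

A single random instance says little about the worst-case bounds in the abstract, which concern the largest bucket or min-wise probability over a whole family; repeating the experiment over many seeds gives an empirical feel for them.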

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

The IT University of Copenhagen's Repository

Pattern Matching in Multiple Streams

Author: A. Amir
D. Breslauer
F. Ergun
G.M. Landau
H. Karloff
K. Abrahamson
M. Ružić
R. Clifford
T.S. Jayram
Z. Galil
Publication date: 01/01/2012

We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.

Comment: 13 pages, 1 figure
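The abstract gives no algorithm. A simple way to see why exact matching fits in O(m+s) words is to share one Knuth-Morris-Pratt failure table for the pattern (O(m) words) and keep a single integer of matching state per stream (O(s) words). Unlike the paper's bound, plain KMP takes only amortized constant time per arriving symbol, since one symbol can trigger several failure-link jumps, so this sketches the space layout rather than the worst-case-time result. The class name `MultiStreamMatcher` is hypothetical.

```python
class MultiStreamMatcher:
    """Exact matching of one pattern against many interleaved streams."""

    def __init__(self, pattern, num_streams):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.p = pattern
        m = len(pattern)
        # Shared table: fail[i] = length of the longest proper border of pattern[:i + 1].
        self.fail = [0] * m
        j = 0
        for i in range(1, m):
            while j and pattern[i] != pattern[j]:
                j = self.fail[j - 1]
            if pattern[i] == pattern[j]:
                j += 1
            self.fail[i] = j
        # Per-stream state: length of the pattern prefix currently matched.
        self.state = [0] * num_streams

    def push(self, stream, symbol):
        """Feed one symbol to a stream; return True if a full match ends here."""
        p, fail = self.p, self.fail
        j = self.state[stream]
        if j == len(p):  # after a match, continue from the longest border
            j = fail[j - 1]
        while j and symbol != p[j]:
            j = fail[j - 1]
        if symbol == p[j]:
            j += 1
        self.state[stream] = j
        return j == len(p)


# Usage: arrivals are (stream, symbol) pairs interleaved across two streams.
matcher = MultiStreamMatcher("aba", num_streams=2)
arrivals = [(0, "a"), (1, "a"), (0, "b"), (1, "b"), (0, "a"), (0, "b"), (0, "a"), (1, "x")]
hits = [(t, s) for t, (s, c) in enumerate(arrivals) if matcher.push(s, c)]
```

Here stream 0 receives "ababa" and reports overlapping matches at arrivals 4 and 6, while stream 1 receives "abx" and never matches; the failure table is stored once no matter how many streams there are.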

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Fault Tolerance in Protein Interaction Networks: Stable Bipartite Subgraphs and Redundant Pathways

Author: A Wagner
AR Brady
Arthur Brady
B Papp
C Stark
E Nabieva
FP Roth
G Giaever
GF Berriz
H Karloff
I Ulitsky
JM Cherry
Joel S. Bader
JR Mullen
KH Berger
KS Dimmer
Kyle Maxwell
L Lovasz
LD Hurst
Lenore J. Cowen
LM Blank
M Ashburner
M Wagner
MR Garey
NA Ellis
NM Hollingsworth
Noah Daniels
PM Watt
R Harrison
R Karp
R Kelley
S Bandyopadhyay
SD Oh
T Beissbarth
X Ma
X Zeng
Z Gu
Publication venue: Public Library of Science
Publication date: 01/01/2009

As increasing amounts of high-throughput data for the yeast interactome become available, more system-wide properties are uncovered. One interesting question concerns the fault tolerance of protein interaction networks: whether there exist alternative pathways that can perform some required function if a gene essential to the main mechanism is defective, absent or suppressed. A signature pattern for redundant pathways is the BPM (between-pathway model) motif, introduced by Kelley and Ideker. Past methods proposed to search the yeast interactome for BPM motifs have had several important limitations. First, they have been driven heuristically by local greedy searches, which can lead to the inclusion of extra genes that may not belong in the motif; second, they have been validated solely by functional coherence of the putative pathways using GO enrichment, making it difficult to evaluate putative BPMs in the absence of already known biological annotation. We introduce stable bipartite subgraphs, and show they form a clean and efficient way of generating meaningful BPMs which naturally discard extra genes included by local greedy methods. We show by GO enrichment measures that our BPM set outperforms previous work, covering more known complexes and functional pathways. Perhaps most importantly, since our BPMs are initially generated by examining the genetic-interaction network only, the location of edges in the protein-protein physical interaction network can then be used to statistically validate each candidate BPM, even with sparse GO annotation (or none at all). We uncover some interesting biological examples of previously unknown putative redundant pathways in such areas as vesicle-mediated transport and DNA repair.
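The abstract does not define stability. One natural notion, assumed here purely for illustration, is a locally maximal cut: a split of a gene set into two sides such that no single gene can switch sides and increase the number of genetic-interaction edges crossing between them. In BPM terms, each gene then has at least as many interactions with the opposite pathway as with its own. The sketch below finds such a split by single-gene flips; the function name `stable_cut` and the edge-list input are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
def stable_cut(nodes, edges):
    """Local search for a cut where no single-vertex flip increases the cut size."""
    side = {v: 0 for v in nodes}  # start with every gene on the same side
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    improved = True
    while improved:
        improved = False
        for v in nodes:
            same = sum(1 for u in adj[v] if side[u] == side[v])
            other = len(adj[v]) - same
            if same > other:  # flipping v adds (same - other) > 0 crossing edges
                side[v] = 1 - side[v]
                improved = True
    return side


# Usage: two hypothetical "pathways" {a1, a2} and {b1, b2} with all cross interactions.
genes = ["a1", "a2", "b1", "b2"]
interactions = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
side = stable_cut(genes, interactions)
```

Each flip strictly increases the cut size, which is bounded by the number of edges, so the loop terminates; the result is only locally optimal, which is exactly the stability property rather than a maximum cut.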

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A lower bound on the size of universal sets for planar graphs

Author: Fary I
H. Karloff
M. Chrobak
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

A Better Approximation Algorithm for Finding Planar Subgraphs

Author: Calinescu G.
Fernandes C.
Finkler U.
Karloff H.
Publication date: 01/01/1998

MPG.PuRe

New Results on Server Problems

Author: H. Karloff
M. Chrobak
S. Vishwanathan
T. Payne

In the k-server problem, we must choose how k mobile servers will serve each of a sequence of requests, making our decisions in an online manner. We exhibit an optimal deterministic online strategy when the requests fall on the real line. For the weighted-cache problem, in which the cost of moving to x from any other point is w(x), the weight of x, we also provide an optimal deterministic algorithm. We prove the nonexistence of competitive algorithms for the asymmetric two-server problem, and of memoryless algorithms for the weighted-cache problem. We give a fast algorithm for offline computing of an optimal schedule, and show that finding an optimal offline schedule is at least as hard as the assignment problem.

1 Introduction

The k-server problem can be stated as follows. We are given a metric space M, and k servers which move among the points of M, each occupying one point of M. Repeatedly, a request (a point x ∈ M) appears. To serve x, each server moves some distance, possibly...
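The optimal deterministic strategy for requests on the real line is commonly identified with the Double Coverage rule: if a request lies outside the interval spanned by the servers, the nearest (extreme) server moves to it; otherwise the two servers immediately to its left and right move toward it at equal speed until one of them arrives. The sketch below simulates that rule and reports total movement; the function name and the list-of-positions representation are illustrative choices.

```python
from bisect import bisect_left


def double_coverage(servers, requests):
    """Serve requests on the real line with the Double Coverage rule.

    Returns the final sorted server positions and the total distance moved.
    """
    pos = sorted(servers)
    cost = 0.0
    for r in requests:
        if r <= pos[0]:  # request left of every server: leftmost server moves
            cost += pos[0] - r
            pos[0] = r
        elif r >= pos[-1]:  # request right of every server: rightmost server moves
            cost += r - pos[-1]
            pos[-1] = r
        elif r not in pos:
            # r lies strictly between adjacent servers pos[i] and pos[i + 1];
            # both move toward r by the same distance d until one reaches it.
            i = bisect_left(pos, r) - 1
            d = min(r - pos[i], pos[i + 1] - r)
            pos[i] += d
            pos[i + 1] -= d
            cost += 2 * d
    return pos, cost


# Usage: two servers at 0 and 10, then the point 4 is requested twice.
final_positions, total_cost = double_coverage([0, 10], [4, 4])
```

The first request pulls both servers 4 units inward (to 4 and 6, cost 8); the second request is already covered, so it costs nothing. Moving the far server as well is what protects the strategy against adversarial request sequences.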

CiteSeerX