Lessons from the Congested Clique Applied to MapReduce
The main results of this paper are (I) a simulation algorithm which, under
quite general constraints, transforms algorithms running on the Congested
Clique into algorithms running in the MapReduce model, and (II) a distributed
-coloring algorithm running on the Congested Clique which has an
expected running time of (i) rounds, if ;
and (ii) rounds otherwise. Applying the simulation theorem to
the Congested-Clique -coloring algorithm yields an -round
-coloring algorithm in the MapReduce model.
Our simulation algorithm illustrates a natural correspondence between
per-node bandwidth in the Congested Clique model and memory per machine in the
MapReduce model. In the Congested Clique (and more generally, any network in
the model), the major impediment to constructing fast
algorithms is the restriction on message sizes. Similarly, in the
MapReduce model, the combined restrictions on memory per machine and total
system memory have a dominant effect on algorithm design. In showing a fairly
general simulation algorithm, we highlight the similarities and differences
between these models.
Comment: 15 pages
EVOLUTIONARY ALGORITHMS FOR OVERLAPPING CORRELATION CLUSTERING
Abstract. In Overlapping Correlation Clustering (OCC), a number of objects are assigned to clusters. Two objects in the same cluster have correlated characteristics. As opposed to traditional clustering, where objects are assigned to a single cluster, in OCC objects may be assigned to one or more clusters, since an object can have characteristics that are correlated with objects in more than one cluster. In this paper, we present Biased Random-Key Genetic Algorithms for OCC. Computational experiments are presented.
Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence
Randomized algorithms and data structures are often analyzed under the
assumption of access to a perfect source of randomness. The most fundamental
metric used to measure how "random" a hash function or a random number
generator is, is its independence: a sequence of random variables is said to be
-independent if every variable is uniform and every size subset is
independent. In this paper we consider three classic algorithms under limited
independence. We provide new bounds for randomized quicksort, min-wise hashing
and largest bucket size under limited independence. Our results can be
summarized as follows.
-Randomized quicksort. When pivot elements are computed using a
-independent hash function, Karloff and Raghavan, J.ACM'93 showed expected worst-case running time for a special version of quicksort.
We improve upon this, showing that the same running time is achieved with only
-independence.
-Min-wise hashing. For a set , consider the probability of a particular
element being mapped to the smallest hash value. It is known that
-independence implies the optimal probability . Broder et al.,
STOC'98 showed that -independence implies it is . We show
a matching lower bound as well as new tight bounds for - and -independent
hash functions.
-Largest bucket. We consider the case where balls are distributed to
buckets using a -independent hash function and analyze the largest bucket
size. Alon et al., STOC'97 showed that there exists a -independent hash
function implying a bucket of size . We generalize the
bound, providing a -independent family of functions that imply size .
Comment: Submitted to ICALP 201
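The notion of limited independence in the abstract above can be illustrated with the standard polynomial construction: a polynomial of degree below k with uniformly random coefficients over a prime field gives k-independent values on distinct keys. The sketch below is an illustration only, not a construction taken from the paper. The names `PolynomialHash` and `largest_bucket`, the choice of the Mersenne prime 2^61 - 1, and the final reduction modulo the number of buckets (which makes the bucket distribution only approximately uniform) are all assumptions.

```python
import random
from typing import List, Optional


class PolynomialHash:
    """Hypothetical helper: random polynomial of degree < k over Z_p.

    For distinct keys in [0, p), the hash values are k-independent and
    uniform on [0, p). This is the textbook construction, assumed here
    for illustration.
    """

    def __init__(self, k: int, prime: int = (1 << 61) - 1,
                 rng: Optional[random.Random] = None) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        rng = rng or random.Random()
        self.prime = prime
        # k uniformly random coefficients, highest degree first.
        self.coeffs: List[int] = [rng.randrange(prime) for _ in range(k)]

    def __call__(self, x: int) -> int:
        # Horner evaluation of the polynomial at x, modulo the prime.
        acc = 0
        for c in self.coeffs:
            acc = (acc * x + c) % self.prime
        return acc


def largest_bucket(h: PolynomialHash, n: int) -> int:
    """Throw keys 0..n-1 into n buckets via h(x) mod n; return the max load."""
    loads = [0] * n
    for x in range(n):
        loads[h(x) % n] += 1
    return max(loads)
```

Calling `largest_bucket(PolynomialHash(k), n)` repeatedly for different values of `k` gives an empirical feel for how the maximum load varies with the independence of the hash family. It does not reproduce the bounds stated in the abstract.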
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching.
Comment: 13 pages, 1 figure
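To make the multiple-stream model and the O(m+s) space figure concrete, here is a minimal sketch based on the classic Knuth-Morris-Pratt automaton. It is not the paper's algorithm. It shares one failure table of m entries across all streams and keeps a single integer of state per stream, but its time per arriving symbol is only amortized constant within each stream, not worst-case constant. The class name `MultiStreamMatcher` and its `feed` method are hypothetical.

```python
from typing import Dict, Hashable, List


def failure_table(pattern: str) -> List[int]:
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class MultiStreamMatcher:
    """Hypothetical sketch: exact matching of one pattern in many streams."""

    def __init__(self, pattern: str) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = failure_table(pattern)    # m integers, shared by all streams
        self.state: Dict[Hashable, int] = {}  # one integer per stream seen so far

    def feed(self, stream_id: Hashable, symbol: str) -> bool:
        """Append symbol to the given stream; report whether a match ends here."""
        k = self.state.get(stream_id, 0)
        # Fall back along borders until the new symbol extends a prefix.
        while k > 0 and symbol != self.pattern[k]:
            k = self.fail[k - 1]
        if symbol == self.pattern[k]:
            k += 1
        matched = k == len(self.pattern)
        if matched:
            # Keep the longest border so overlapping matches are found.
            k = self.fail[k - 1]
        self.state[stream_id] = k
        return matched
```

Each call to `feed` corresponds to one time step of the model: a single symbol arrives, tagged with the stream it belongs to, and the return value says whether that stream now ends with an occurrence of the pattern.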
Fault Tolerance in Protein Interaction Networks: Stable Bipartite Subgraphs and Redundant Pathways
As increasing amounts of high-throughput data for the yeast interactome become available, more system-wide properties are uncovered. One interesting question concerns the fault tolerance of protein interaction networks: whether there exist alternative pathways that can perform some required function if a gene essential to the main mechanism is defective, absent or suppressed. A signature pattern for redundant pathways is the BPM (between-pathway model) motif, introduced by Kelley and Ideker. Past methods proposed to search the yeast interactome for BPM motifs have had several important limitations. First, they have been driven heuristically by local greedy searches, which can lead to the inclusion of extra genes that may not belong in the motif; second, they have been validated solely by functional coherence of the putative pathways using GO enrichment, making it difficult to evaluate putative BPMs in the absence of already known biological annotation. We introduce stable bipartite subgraphs, and show they form a clean and efficient way of generating meaningful BPMs which naturally discard extra genes included by local greedy methods. We show by GO enrichment measures that our BPM set outperforms previous work, covering more known complexes and functional pathways. Perhaps most importantly, since our BPMs are initially generated by examining the genetic-interaction network only, the location of edges in the protein-protein physical interaction network can then be used to statistically validate each candidate BPM, even with sparse GO annotation (or none at all). We uncover some interesting biological examples of previously unknown putative redundant pathways in such areas as vesicle-mediated transport and DNA repair.
New Results on Server Problems
In the k-server problem, we must choose how k mobile servers will serve each of a sequence of requests, making our decisions in an online manner. We exhibit an optimal deterministic online strategy when the requests fall on the real line. For the weighted-cache problem, in which the cost of moving to x from any other point is w(x), the weight of x, we also provide an optimal deterministic algorithm. We prove the nonexistence of competitive algorithms for the asymmetric two-server problem, and of memoryless algorithms for the weighted-cache problem. We give a fast algorithm for offline computing of an optimal schedule, and show that finding an optimal offline schedule is at least as hard as the assignment problem.
1 Introduction
The k-server problem can be stated as follows. We are given a metric space M, and k servers which move among the points of M, each occupying one point of M. Repeatedly, a request (a point x ∈ M) appears. To serve x, each server moves some distance, possibly..