
    Lessons from the Congested Clique Applied to MapReduce

    The main results of this paper are (I) a simulation algorithm which, under quite general constraints, transforms algorithms running on the Congested Clique into algorithms running in the MapReduce model, and (II) a distributed $O(\Delta)$-coloring algorithm running on the Congested Clique which has an expected running time of (i) $O(1)$ rounds, if $\Delta \geq \Theta(\log^4 n)$; and (ii) $O(\log \log n)$ rounds otherwise. Applying the simulation theorem to the Congested-Clique $O(\Delta)$-coloring algorithm yields an $O(1)$-round $O(\Delta)$-coloring algorithm in the MapReduce model. Our simulation algorithm illustrates a natural correspondence between per-node bandwidth in the Congested Clique model and memory per machine in the MapReduce model. In the Congested Clique (and more generally, any network in the $\mathcal{CONGEST}$ model), the major impediment to constructing fast algorithms is the $O(\log n)$ restriction on message sizes. Similarly, in the MapReduce model, the combined restrictions on memory per machine and total system memory have a dominant effect on algorithm design. In showing a fairly general simulation algorithm, we highlight the similarities and differences between these models. Comment: 15 pages

    EVOLUTIONARY ALGORITHMS FOR OVERLAPPING CORRELATION CLUSTERING

    Abstract. In Overlapping Correlation Clustering (OCC), a number of objects are assigned to clusters. Two objects in the same cluster have correlated characteristics. As opposed to traditional clustering, where objects are assigned to a single cluster, in OCC objects may be assigned to one or more clusters, since an object can have characteristics that are correlated with objects in more than one cluster. In this paper, we present Biased Random-Key Genetic Algorithms for OCC. Computational experiments are presented.

    Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

    Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its independence: a sequence of random variables is said to be $k$-independent if every variable is uniform and every size $k$ subset is independent. In this paper we consider three classic algorithms under limited independence. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.
    - Randomized quicksort. When pivot elements are computed using a $5$-independent hash function, Karloff and Raghavan, J.ACM'93 showed $O(n \log n)$ expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only $4$-independence.
    - Min-wise hashing. For a set $A$, consider the probability of a particular element being mapped to the smallest hash value. It is known that $5$-independence implies the optimal probability $O(1/n)$. Broder et al., STOC'98 showed that $2$-independence implies it is $O(1/\sqrt{|A|})$. We show a matching lower bound as well as new tight bounds for $3$- and $4$-independent hash functions.
    - Largest bucket. We consider the case where $n$ balls are distributed to $n$ buckets using a $k$-independent hash function and analyze the largest bucket size. Alon et al., STOC'97 showed that there exists a $2$-independent hash function implying a bucket of size $\Omega(n^{1/2})$. We generalize the bound, providing a $k$-independent family of functions that imply size $\Omega(n^{1/k})$.
    Comment: Submitted to ICALP 201
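
To make the notion of $k$-independence in the abstract above concrete, here is a minimal sketch of the textbook polynomial construction: a random polynomial of degree $k-1$ over a prime field. This is not taken from the paper; the names `make_k_independent_hash`, `largest_bucket`, and the choice of the prime `MERSENNE_PRIME` are assumptions made for illustration. The final reduction modulo the number of buckets makes the output only approximately uniform.

```python
import random

# Field size for the polynomial; an illustrative choice, keys must be smaller.
MERSENNE_PRIME = (1 << 61) - 1


def make_k_independent_hash(k, num_buckets, rng=None):
    """Return h(x) = (poly(x) mod p) mod num_buckets for a random
    polynomial with k coefficients (degree k - 1) over GF(p)."""
    rng = rng or random.Random()
    coeffs = [rng.randrange(MERSENNE_PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:  # Horner's rule evaluation of the polynomial
            acc = (acc * x + c) % MERSENNE_PRIME
        return acc % num_buckets

    return h


def largest_bucket(n, k, seed=0):
    """Throw n balls (keys 0..n-1) into n buckets and return the max load."""
    h = make_k_independent_hash(k, n, random.Random(seed))
    counts = [0] * n
    for ball in range(n):
        counts[h(ball)] += 1
    return max(counts)
```

Calling `largest_bucket(n, k)` for a few values of `k` gives an empirical feel for the largest-bucket question, though a random draw from this family will typically look far better than the worst-case families the abstract refers to.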

    Pattern Matching in Multiple Streams

    We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally, we set out a number of open problems related to this new model for pattern matching. Comment: 13 pages, 1 figure
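
The O(m+s) space figure for exact matching can be illustrated with a simple baseline: one shared Knuth-Morris-Pratt failure table (O(m) words) plus a single automaton state per stream (O(s) words). The sketch below is an assumption-laden illustration, not the paper's algorithm: plain KMP gives amortized rather than worst-case constant time per symbol, and the names `build_failure` and `MultiStreamMatcher` are hypothetical.

```python
def build_failure(pattern):
    """Standard KMP failure table: fail[i] is the length of the longest
    proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    return fail


class MultiStreamMatcher:
    """Exact matching of one fixed pattern against several interleaved streams."""

    def __init__(self, pattern, num_streams):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = build_failure(pattern)   # shared by all streams: O(m)
        self.state = [0] * num_streams       # one integer per stream: O(s)

    def feed(self, stream_id, symbol):
        """Append symbol to the given stream; return True on a new match."""
        j = self.state[stream_id]
        if j == len(self.pattern):           # previous step ended in a match
            j = self.fail[j - 1]
        while j > 0 and symbol != self.pattern[j]:
            j = self.fail[j - 1]
        if symbol == self.pattern[j]:
            j += 1
        self.state[stream_id] = j
        return j == len(self.pattern)
```

Symbols are fed one at a time together with the index of the stream they belong to, mirroring the arrival model described in the abstract.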

    Fault Tolerance in Protein Interaction Networks: Stable Bipartite Subgraphs and Redundant Pathways

    As increasing amounts of high-throughput data for the yeast interactome become available, more system-wide properties are uncovered. One interesting question concerns the fault tolerance of protein interaction networks: whether there exist alternative pathways that can perform some required function if a gene essential to the main mechanism is defective, absent or suppressed. A signature pattern for redundant pathways is the BPM (between-pathway model) motif, introduced by Kelley and Ideker. Past methods proposed to search the yeast interactome for BPM motifs have had several important limitations. First, they have been driven heuristically by local greedy searches, which can lead to the inclusion of extra genes that may not belong in the motif; second, they have been validated solely by functional coherence of the putative pathways using GO enrichment, making it difficult to evaluate putative BPMs in the absence of already known biological annotation. We introduce stable bipartite subgraphs, and show they form a clean and efficient way of generating meaningful BPMs which naturally discard extra genes included by local greedy methods. We show by GO enrichment measures that our BPM set outperforms previous work, covering more known complexes and functional pathways. Perhaps most importantly, since our BPMs are initially generated by examining the genetic-interaction network only, the location of edges in the protein-protein physical interaction network can then be used to statistically validate each candidate BPM, even with sparse GO annotation (or none at all). We uncover some interesting biological examples of previously unknown putative redundant pathways in such areas as vesicle-mediated transport and DNA repair.

    A lower bound on the size of universal sets for planar graphs


    New Results on Server Problems

    In the k-server problem, we must choose how k mobile servers will serve each of a sequence of requests, making our decisions in an online manner. We exhibit an optimal deterministic online strategy when the requests fall on the real line. For the weighted-cache problem, in which the cost of moving to x from any other point is w(x), the weight of x, we also provide an optimal deterministic algorithm. We prove the nonexistence of competitive algorithms for the asymmetric two-server problem, and of memoryless algorithms for the weighted-cache problem. We give a fast algorithm for offline computing of an optimal schedule, and show that finding an optimal offline schedule is at least as hard as the assignment problem.
    1 Introduction
    The k-server problem can be stated as follows. We are given a metric space M, and k servers which move among the points of M, each occupying one point of M. Repeatedly, a request (a point $x \in M$) appears. To serve x, each server moves some distance, possibly..