38 research outputs found

    Peregrine: A Pattern-Aware Graph Mining System

    Full text link
    Graph mining workloads aim to extract structural properties of a graph by exploring its subgraph structures. General purpose graph mining systems provide a generic runtime to explore subgraph structures of interest with the help of user-defined functions that guide the overall exploration process. However, the state-of-the-art graph mining systems remain largely oblivious to the shape (or pattern) of the subgraphs that they mine. This causes them to: (a) explore unnecessary subgraphs; (b) perform expensive computations on the explored subgraphs; and (c) hold intermediate partial subgraphs in memory; all of which affect their overall performance. Furthermore, their programming models are often tied to their underlying exploration strategies, which makes it difficult for domain users to express complex mining tasks. In this paper, we develop Peregrine, a pattern-aware graph mining system that directly explores the subgraphs of interest while avoiding exploration of unnecessary subgraphs, and simultaneously bypassing expensive computations throughout the mining process. We design a pattern-based programming model that treats "graph patterns" as first-class constructs and enables Peregrine to extract the semantics of patterns, which it uses to guide its exploration. Our evaluation shows that Peregrine outperforms state-of-the-art distributed and single machine graph mining systems, and scales to complex mining tasks on larger graphs, while retaining simplicity and expressivity with its "pattern-first" programming approach. Comment: This is the full version of the paper appearing in the European Conference on Computer Systems (EuroSys), 202

    Distributed Evolutionary Algorithm for Vector Quantisation in JAVA

    No full text
    In this paper we use a modified LBG algorithm for vector quantization. An evolutionary algorithm was developed to find a quantization that has both few reference points and a small error. Because the LBG computations are independent of one another, the algorithm could be accelerated by making it parallel and distributed. Experiments show that the algorithm usually finds the global minimum.
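    The abstract does not say how the LBG algorithm was modified or how the evolutionary search works, so the following is only a minimal sketch of the textbook Linde-Buzo-Gray procedure it builds on: split every codeword into a perturbed pair, then refine with Lloyd iterations. The function names, the splitting factor `eps`, and the stopping tolerance are illustrative assumptions, not taken from the paper.

```python
def nearest(point, codebook):
    """Index of the codeword closest to `point` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, codebook[i])))

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def distortion(data, codebook):
    """Average squared distance from each point to its nearest codeword."""
    total = 0.0
    for x in data:
        c = codebook[nearest(x, codebook)]
        total += sum((p - q) ** 2 for p, q in zip(x, c))
    return total / len(data)

def lloyd(data, codebook, tol=1e-9, max_iter=100):
    """Refine a codebook: assign points to cells, then move codewords to cell means."""
    prev = distortion(data, codebook)
    for _ in range(max_iter):
        cells = [[] for _ in codebook]
        for x in data:
            cells[nearest(x, codebook)].append(x)
        # An empty cell keeps its old codeword.
        codebook = [mean(cell) if cell else cw for cell, cw in zip(cells, codebook)]
        cur = distortion(data, codebook)
        if prev - cur <= tol:  # stop once the error no longer improves
            break
        prev = cur
    return codebook

def lbg(data, size, eps=0.01):
    """Grow a codebook by repeated splitting; `size` is assumed to be a power of two."""
    codebook = [mean(data)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair (assumed perturbation).
        codebook = [tuple(c * (1 + s * eps) + s * eps for c in cw)
                    for cw in codebook for s in (+1, -1)]
        codebook = lloyd(data, codebook)
    return codebook
```

    In this sketch the assignment of each point to its nearest codeword does not depend on any other point, which is the kind of independence that makes the step easy to spread across workers.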

    A Work-Optimal Deterministic Algorithm for the Asynchronous Certified Write-All Problem

    No full text
    In their SIAM J. on Computing paper [27] from 1992, Martel et al. posed the question of developing a work-optimal deterministic asynchronous algorithm for the fundamental load-balancing and synchronization problem called Certified Write-All. In this problem, introduced in a slightly different form by Kanellakis and Shvartsman in a PODC'89 paper [17], p processors must update n memory cells and then signal the completion of the updates. It is known that solutions to this problem can be used to simulate synchronous parallel programs on asynchronous systems with worst-case guarantees for the overhead of a simulation. Such simulations are interesting because they may increase productivity in parallel computing, since synchronous parallel programs are easier to reason about than asynchronous ones are.

    Distributed scheduling for disconnected cooperation

    No full text
    The dissertation studies how distributed devices that are disconnected for long and unknown periods can efficiently perform a set of tasks. Given n distributed devices that must perform t independent tasks, known to each device, the goal is to schedule the work of the devices locally, in the absence of communication, so that when communication is later established between some devices, the devices that connect have performed few tasks redundantly beyond necessity. The dissertation gives a lower bound on redundant work, and randomized and deterministic schedules that allow devices to avoid redundant work provably well. The lower bound shows how the wasted work increases as the devices progress in their work. When each disconnected device randomly selects its next task from among the tasks remaining to be done, the amount of work duplicated by any devices that reconnect is close to the lower bound in a precise sense. To derandomize the construction of schedules, techniques from design theory, linear algebra, and graph theory are used. The topics developed within the dissertation are related to the theory of Latin squares and coding theory. For example, the lower bound shown in the dissertation generalizes the Second Johnson Bound. The dissertation also studies scheduling problems for shared memory systems. It shows a method for creating near-optimal instances of an algorithm of Anderson and Woll. The dissertation also shows a work-optimal deterministic algorithm for the asynchronous Certified Write-All problem.
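    The randomized schedule described in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the dissertation's construction or its lower bound: two devices each work through an independent, uniformly random order of the t tasks, and the redundant work is counted as the tasks both have done after k steps each. All function names are hypothetical. For two independent random schedules the expected overlap is k²/t, which the simulation should approach.

```python
import random

def random_schedule(t, rng):
    """A uniformly random order of tasks 0..t-1, i.e. each next task is
    picked at random from the tasks the device has not yet done."""
    order = list(range(t))
    rng.shuffle(order)
    return order

def redundant_work(schedule_a, schedule_b, k):
    """Number of tasks performed by both devices after k steps each."""
    return len(set(schedule_a[:k]) & set(schedule_b[:k]))

def average_overlap(t, k, trials, seed=0):
    """Mean redundant work over `trials` pairs of independent random schedules."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += redundant_work(random_schedule(t, rng), random_schedule(t, rng), k)
    return total / trials

if __name__ == "__main__":
    t, k = 100, 30
    print("simulated overlap:", average_overlap(t, k, trials=2000))
    print("expected k*k/t:   ", k * k / t)
```

    Redundant work grows with k: while k is small relative to t, two devices rarely collide, and once 2k exceeds t at least 2k - t tasks must be duplicated regardless of the schedule.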

    Graph data management systems for new application domains

    No full text

    Distributed Scheduling for Disconnected Cooperation

    No full text
    This dissertation studies a cooperation problem where a system of distributed asynchronous devices that can be disconnected for long and unknown periods must efficiently perform a set of tasks.