1,488 research outputs found

    A Comparison of Big Data Frameworks on a Layered Dataflow Model

    Get PDF
    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on High-Level Parallel Programming and Applications (HLPP), July 4-5 2016, Muenster, German

    Pregelix: Big(ger) Graph Analytics on A Dataflow Engine

    Full text link
    There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up to 35x speedup compared to distributed GraphLab), and makes more effective use of available machine resources to support Big(ger) Graph Analytics
    • …
    corecore