
    The LDBC Graphalytics Benchmark

    In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Because such platforms must address a diverse range of bottlenecks and performance issues, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of system scalability (weak and strong) and of robustness (such as failures and performance variability). The benchmark also balances comprehensiveness against the runtime needed to obtain these deep metrics. It comes with open-source software for generating performance data, validating algorithm results, monitoring and sharing performance data, and producing the final benchmark result as a standard performance report.
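    Because the algorithms are deterministic, a platform's output can be checked vertex by vertex against the reference output. The sketch below is a hypothetical illustration of such a check, not the Graphalytics validator itself: it assumes outputs are dictionaries mapping vertex IDs to values, treats a missing or extra vertex as a failure, and uses a relative tolerance (an assumed parameter, `rel_eps`) for floating-point results such as PageRank, with `rel_eps=0.0` giving exact matching for integer results such as BFS depths.

```python
import math
from typing import Dict, Hashable


def validate_output(actual: Dict[Hashable, float],
                    reference: Dict[Hashable, float],
                    rel_eps: float = 1e-4) -> bool:
    """Hypothetical per-vertex validation against a reference output.

    rel_eps is an assumed relative tolerance; rel_eps=0.0 means exact match.
    """
    # A vertex missing from, or added to, the output fails validation outright.
    if actual.keys() != reference.keys():
        return False
    # Every vertex value must agree with its reference value within rel_eps.
    return all(
        math.isclose(actual[v], reference[v], rel_tol=rel_eps, abs_tol=0.0)
        for v in reference
    )
```

For example, `validate_output({1: 0.25000001, 2: 0.75}, {1: 0.25, 2: 0.75})` passes under the default tolerance, while an integer result compared with `rel_eps=0.0` must match exactly.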

    Comparative Evaluation for the Performance of Big Stream Processing Systems

    Nowadays, data is growing at a tremendous rate, and this data must be processed properly if we want to keep control over it. This pushes us to think about data stream processing. Data-intensive fraud detection, trading, manufacturing, military, and intelligence systems typically require processing data immediately (in real time). Such systems need sophisticated pattern matching and correlation. Over time, other uses of stream processing have also emerged. In this thesis, we benchmark, compare, and contrast the Apache Flink, Apache Storm, Heron, Kafka, and Apache Spark stream processing engines. In these applications and domains, there is a crucial requirement to collect, process, and analyze significant streams of data to extract valuable information. This thesis aims to conduct an empirical evaluation and benchmarking of state-of-the-art big stream processing systems.

    LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms

    In this paper, we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that together enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their own platforms.
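    The scalability and variability metrics named above can be made concrete with their textbook definitions. The following is a minimal sketch under that assumption; the function names are hypothetical and the exact formulas Graphalytics reports may differ. Strong scaling keeps the dataset fixed while resources grow by a factor n, so ideal runtime is t_base / n; weak scaling grows the dataset with the resources, so ideal runtime stays constant; performance variability is summarized here as the coefficient of variation over repeated runs.

```python
import statistics
from typing import Sequence


def strong_scaling_efficiency(t_base: float, t_n: float, n: int) -> float:
    # Fixed problem size, n times the resources: ideal runtime is t_base / n,
    # so efficiency = (t_base / n) / t_n = t_base / (n * t_n).
    return t_base / (n * t_n)


def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    # Problem size grows with resources: ideal runtime stays at t_base.
    return t_base / t_n


def coefficient_of_variation(runtimes: Sequence[float]) -> float:
    # Variability across repeated runs: sample standard deviation / mean.
    return statistics.stdev(runtimes) / statistics.mean(runtimes)
```

For instance, a job that takes 100 s on one machine and 30 s on four has a strong-scaling efficiency of 100 / (4 × 30) ≈ 0.83, and runs of 9 s, 10 s, and 11 s give a coefficient of variation of 0.1.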