
    Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

    Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. At the core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect its scalability. We propose designing one benchmark per use case and relevant workload dimension. We present a general benchmarking framework, which can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. This way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of the 4 benchmarks with Kafka Streams and Apache Flink, as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both for evaluating the Theodolite method and for benchmarking the scalability of Kafka Streams and Flink under different deployment options.
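The benchmark loop described above can be illustrated with a short sketch: for each tested workload intensity, search for the smallest number of processing instances that still satisfies a service-level objective. The Python snippet below is not Theodolite's actual implementation; deployment, load generation, and the SLO check are stubbed with made-up names and numbers, purely to show the shape of such a loop.

```python
# Minimal sketch of a scalability benchmark loop in the spirit of the
# Theodolite method. All helpers and numbers are illustrative stubs.

WORKLOADS = [10_000, 50_000, 100_000, 200_000]  # e.g. messages per second
MAX_INSTANCES = 16

def deploy(instances: int) -> None:
    """Stub: in a real setup this would (re)deploy the stream processing
    application, e.g. a Kafka Streams or Flink job, with N instances."""
    print(f"deploying {instances} instance(s)")

def run_workload(rate: int, instances: int) -> dict:
    """Stub: generate load at `rate` and collect metrics. Here we fake a
    lag-growth metric by assuming each instance handles 20k msg/s."""
    assumed_capacity = 20_000 * instances
    return {"lag_growth": max(0, rate - assumed_capacity)}

def meets_slo(metrics: dict) -> bool:
    """SLO: the consumer lag must not grow during the experiment."""
    return metrics["lag_growth"] == 0

def minimum_instances(rate: int) -> int | None:
    """Search for the resource demand at one workload intensity."""
    for instances in range(1, MAX_INSTANCES + 1):
        deploy(instances)
        if meets_slo(run_workload(rate, instances)):
            return instances
    return None  # workload not sustainable within the instance budget

# The mapping workload -> minimum instances shows how resource demand
# evolves with increasing workload along the chosen dimension.
demand_curve = {rate: minimum_instances(rate) for rate in WORKLOADS}
print(demand_curve)
```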

    Applying test case prioritization to software microbenchmarks

    Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner in new versions. This may especially be beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
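For readers unfamiliar with the two greedy strategies compared above, the Python sketch below shows textbook total and additional coverage-based prioritization applied to a made-up benchmark-to-coverage mapping; it is not the paper's tooling or its exact parameterization.

```python
# Illustrative sketch of the two classic coverage-based TCP strategies.
# Coverage is modeled as benchmark -> set of covered code elements;
# the example data is invented for demonstration purposes.

def total_prioritization(coverage: dict[str, set[str]]) -> list[str]:
    """Order benchmarks by the total number of covered elements."""
    return sorted(coverage, key=lambda b: len(coverage[b]), reverse=True)

def additional_prioritization(coverage: dict[str, set[str]]) -> list[str]:
    """Greedily pick the benchmark covering the most not-yet-covered
    elements; reset the covered set once full coverage is reached."""
    remaining = dict(coverage)
    covered: set[str] = set()
    order: list[str] = []
    while remaining:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        if covered and not remaining[best] - covered:
            covered = set()  # everything covered: start a new round
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order

benchmarks = {
    "benchA": {"m1", "m2", "m3"},
    "benchB": {"m3", "m4"},
    "benchC": {"m5"},
}
print(total_prioritization(benchmarks))       # ['benchA', 'benchB', 'benchC']
print(additional_prioritization(benchmarks))  # ['benchA', 'benchB', 'benchC']
```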

    Methodological Principles for Reproducible Performance Evaluation in Cloud Computing

    The rapid adoption and the diversification of cloud computing technology exacerbate the importance of a sound experimental methodology for this domain. This work investigates how to measure and report performance in the cloud, and how well the cloud research community is already doing it. We propose a set of eight important methodological principles that combine best practices from nearby fields with concepts applicable only to clouds, and with new ideas about the time-accuracy trade-off. We show how these principles are applicable using a practical use-case experiment. To this end, we analyze the ability of the newly released SPEC Cloud IaaS benchmark to follow the principles, and showcase real-world experimental studies in common cloud environments that meet the principles. Last, we report on a systematic literature review covering top conferences and journals in the field, from 2012 to 2017, analyzing whether the practice of reporting cloud performance measurements follows the proposed eight principles. Worryingly, this systematic survey and the subsequent two-round human reviews reveal that few of the published studies follow the eight experimental principles. We conclude that, although these important principles are simple and basic, the cloud community is yet to adopt them broadly to deliver sound measurement of cloud environments.
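As a loose illustration of the kind of practice such principles formalize (not the paper's eight principles themselves), the Python sketch below repeats a measurement and reports a confidence interval of the mean rather than a single number; the measured operation is a placeholder.

```python
# Sketch of repeated measurement with variability reporting, a common
# ingredient of reproducible performance experiments. The workload and
# the repetition count are arbitrary placeholders.

import math
import statistics
import time

def measure_once() -> float:
    """Stand-in for one benchmark iteration against a cloud service."""
    start = time.perf_counter()
    sum(i * i for i in range(100_000))  # placeholder workload
    return time.perf_counter() - start

def report(repetitions: int = 30) -> None:
    samples = [measure_once() for _ in range(repetitions)]
    mean = statistics.mean(samples)
    # 95% confidence interval of the mean (normal approximation, z = 1.96).
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    print(f"n={len(samples)}  mean={mean * 1e3:.2f} ms  "
          f"95% CI ±{half_width * 1e3:.2f} ms")

report()
```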