Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures
Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. The core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect its scalability. We propose to design one benchmark per use case and relevant workload dimension. We present a general benchmarking framework, which can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. In this way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams and Apache Flink, as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both for evaluating the Theodolite method and for benchmarking the scalability of Kafka Streams and Flink under different deployment options.
Applying test case prioritization to software microbenchmarks
Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner upon new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
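As an illustration of the two greedy strategies the study compares, here is a minimal, hypothetical sketch of coverage-based prioritization applied to microbenchmarks. It assumes each benchmark's coverage (e.g., the set of methods it executes) is already available as input; the benchmark names and coverage sets below are made up for the example.

def total_strategy(coverage):
    """Rank benchmarks by their total coverage, descending."""
    return sorted(coverage, key=lambda b: len(coverage[b]), reverse=True)

def additional_strategy(coverage):
    """Repeatedly pick the benchmark covering the most not-yet-covered
    units; reset the covered set once nothing new can be gained."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        if not remaining[best] - covered and covered:
            covered = set()  # classic reset: start a new coverage round
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order

cov = {
    "bench_parse": {"parse", "lex", "emit"},
    "bench_lex":   {"lex", "emit"},    # subset of bench_parse's coverage
    "bench_hash":  {"hash"},
}
print(total_strategy(cov))       # ['bench_parse', 'bench_lex', 'bench_hash']
print(additional_strategy(cov))  # ['bench_parse', 'bench_hash', 'bench_lex']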
Methodological Principles for Reproducible Performance Evaluation in Cloud Computing
The rapid adoption and diversification of cloud computing technology exacerbate the importance of a sound experimental methodology for this domain. This work investigates how to measure and report performance in the cloud, and how well the cloud research community is already doing it. We propose a set of eight important methodological principles that combine best practices from nearby fields with concepts applicable only to clouds, and with new ideas about the time-accuracy trade-off. We show how these principles can be applied using a practical use-case experiment. To this end, we analyze the ability of the newly released SPEC Cloud IaaS benchmark to follow the principles, and showcase real-world experimental studies in common cloud environments that meet them. Finally, we report on a systematic literature review covering top conferences and journals in the field from 2012 to 2017, analyzing whether the practice of reporting cloud performance measurements follows the proposed eight principles. Worryingly, this systematic survey and the subsequent two-round human reviews reveal that few of the published studies follow the eight experimental principles. We conclude that, although these important principles are simple and basic, the cloud community has yet to adopt them broadly to deliver sound measurements of cloud environments.
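As an illustrative sketch only (not the paper's actual list of principles), one statistical practice in this spirit is to repeat an experiment and report an interval estimate instead of a single number; running more repetitions tightens the interval, which is one face of the time-accuracy trade-off mentioned above. A bootstrap confidence interval for the median of repeated cloud measurements could look like this:

import random
import statistics

def bootstrap_ci(samples, confidence=0.95, resamples=10_000, seed=0):
    """Nonparametric bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(samples, k=len(samples)))
        for _ in range(resamples)
    )
    lo = medians[int((1 - confidence) / 2 * resamples)]
    hi = medians[int((1 + confidence) / 2 * resamples)]
    return lo, hi

# e.g., latencies (ms) from repeated runs of the same cloud experiment
runs = [112.0, 98.5, 104.2, 130.9, 101.7, 99.3, 125.4, 103.8]
print(statistics.median(runs), bootstrap_ci(runs))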