Software Runtime Monitoring with Adaptive Sampling Rate to Collect Representative Samples of Execution Traces
Monitoring software systems at runtime is key for understanding workloads,
debugging, and self-adaptation. It typically involves collecting and storing
observable software data, which can be analyzed online or offline. Despite the
usefulness of collecting system data, it may significantly impact the system
execution by delaying response times and competing for system resources. The
typical approach to cope with this is to filter portions of the system to be
monitored and to sample data. Although these approaches are a step towards
achieving a desired trade-off between the amount of collected information and
the impact on the system performance, they focus on collecting data of a
particular type or may capture a sample that does not correspond to the actual
system behavior. In response, we propose an adaptive runtime monitoring process
to dynamically adapt the sampling rate while monitoring software systems. It
includes algorithms with statistical foundations to improve the
representativeness of collected samples without compromising the system
performance. Our evaluation targets five applications of a widely used
benchmark. It shows that the error (RMSE) of the samples collected with our
approach is 9-54% lower than the main alternative strategy (sampling rate
inversely proportional to the throughput), with 1-6% higher performance impact.
Comment: in Journal of Systems and Software.
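The abstract does not reproduce the adaptation algorithms themselves. As a rough illustration of the general idea only, the Python sketch below adapts the sampling rate once per monitoring interval so that the collected sample keeps tracking the full data; the function names, the fixed adjustment step, and the use of a plain RMSE criterion are assumptions of this example, standing in for the statistically founded algorithms the paper proposes.

```python
# Illustrative sketch only: not the paper's algorithm. The sampling rate is
# raised when the sampled per-interval means drift away from the full data
# (measured here by RMSE) and lowered when the sample is representative enough.
import math
import random


def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def adapt_rate(rate, error, target_error, step=0.05, min_rate=0.01, max_rate=1.0):
    """Increase the rate when the error exceeds the target, decrease it otherwise."""
    if error > target_error:
        return min(max_rate, rate + step)
    return max(min_rate, rate - step)


def monitor(trace_intervals, target_error=0.5, initial_rate=0.1):
    """Sample each event of an interval with the current rate, then adapt the rate."""
    rate = initial_rate
    sampled_means, full_means = [], []
    for events in trace_intervals:
        sample = [e for e in events if random.random() < rate]
        full_means.append(sum(events) / len(events))
        sampled_means.append(sum(sample) / len(sample) if sample else 0.0)
        rate = adapt_rate(rate, rmse(sampled_means, full_means), target_error)
    return sampled_means, full_means, rate


if __name__ == "__main__":
    # Synthetic trace: 50 intervals of response times (ms) with a load shift halfway.
    random.seed(0)
    trace = [[random.gauss(10 if i < 25 else 30, 2) for _ in range(200)]
             for i in range(50)]
    sampled, full, final_rate = monitor(trace)
    print(f"final rate={final_rate:.2f}, RMSE={rmse(sampled, full):.2f}")
```

The key design point this toy example shares with the abstract is that the rate reacts to how representative the sample currently is, rather than being set inversely proportional to throughput as in the baseline strategy the paper compares against.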
Quantifying cloud performance and dependability: Taxonomy, metric design, and emerging challenges
In only a decade, cloud computing has emerged from a pursuit for a service-driven information and communication technology (ICT), becoming a significant fraction of the ICT market. Responding to the growth of the market, many alternative cloud services and their underlying systems are currently vying for the attention of cloud users and providers. To make informed choices between competing cloud service providers, permit the cost-benefit analysis of cloud-based systems, and enable system DevOps to evaluate and tune the performance of these complex ecosystems, appropriate performance metrics, benchmarks, tools, and methodologies are necessary. This requires re-examining old system properties and considering new system properties, possibly leading to the re-design of classic benchmarking metrics such as expressing performance as throughput and latency (response time). In this work, we address these requirements by focusing on four system properties: (i) elasticity of the cloud service, to accommodate large variations in the amount of service requested, (ii) performance isolation between the tenants of shared cloud systems and resulting performance variability, (iii) availability of cloud services and systems, and (iv) the operational risk of running a production system in a cloud environment. Focusing on key metrics for each of these properties, we review the state-of-the-art, then select or propose new metrics together with measurement approaches. We see the presented metrics as a foundation toward future industry-standard cloud benchmarks.
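The abstract names the four properties but does not include the metric definitions here. Purely as a loose illustration, the Python sketch below computes two simple, commonly used quantities, the coefficient of variation of response times as a variability measure and the relative slowdown under co-location as a crude isolation indicator; these are assumptions of this example, not the metrics proposed in the article.

```python
# Illustrative sketch only: not the metrics defined in the article.
import statistics


def coefficient_of_variation(samples):
    """Dispersion of response times relative to their mean (lower = more stable)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def isolation_slowdown(isolated_latencies, colocated_latencies):
    """Relative increase in mean latency caused by co-located tenants (0.0 = perfect isolation)."""
    iso = statistics.mean(isolated_latencies)
    col = statistics.mean(colocated_latencies)
    return (col - iso) / iso


if __name__ == "__main__":
    isolated = [102, 98, 101, 99, 100, 103]      # ms, tenant running alone (made-up data)
    colocated = [130, 118, 142, 125, 160, 138]   # ms, same tenant under neighbour load
    print(f"variability (CoV) under co-location: {coefficient_of_variation(colocated):.2f}")
    print(f"isolation slowdown: {isolation_slowdown(isolated, colocated):.0%}")
```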