3rd Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf'20): Performance variability
The organizers of the Third Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2020) are delighted to welcome you to the workshop proceedings as part of the ICPE conference companion. The HotCloudPerf 2020 workshop is a full-day workshop on Tuesday, April 21, taking place jointly with WOSP-C as part of the ICPE conference week in Edmonton, Canada. Each year, the workshop chooses a focus theme to explore; for 2020, the theme is "Performance variability of cloud datacenters and the implications of such phenomena on application performance." Cloud computing is emerging as one of the most profound changes in the way we build and use IT. The use of global services in public clouds is increasing, and the lucrative and rapidly
In Datacenter Performance, The Only Constant Is Change
All computing infrastructure suffers from performance variability, be it
bare-metal or virtualized. This phenomenon originates from many sources: some
transient, such as noisy neighbors, and others more permanent but sudden, such
as changes or wear in hardware, changes in the underlying hypervisor stack, or
even undocumented interactions between the policies of the computing resource
provider and the active workloads. Thus, performance measurements obtained on
clouds, HPC facilities, and, more generally, datacenter environments are almost
guaranteed to exhibit performance regimes that evolve over time, which leads to
undesirable nonstationarities in application performance. In this paper, we
present our analysis of performance of the bare-metal hardware available on the
CloudLab testbed where we focus on quantifying the evolving performance regimes
using changepoint detection. We describe our findings, backed by a dataset with
nearly 6.9M benchmark results collected from over 1600 machines over a period
of 2 years and 9 months. These findings yield a comprehensive characterization
of real-world performance variability patterns in one computing facility, a
methodology for studying such patterns on other infrastructures, and contribute
to a better understanding of performance variability in general.
Comment: To be presented at the 20th IEEE/ACM International Symposium on
Cluster, Cloud and Internet Computing (CCGrid,
http://cloudbus.org/ccgrid2020/) on May 11-14, 2020 in Melbourne, Victoria,
Australia.
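The abstract above describes locating evolving performance regimes with changepoint detection. As a minimal sketch of the idea, the following assumes a simple mean-shift model and finds the single split point that most reduces squared error; the CloudLab study uses more sophisticated changepoint methods, and the benchmark scores below are invented for illustration.

```python
# Toy single-changepoint detection on a series of benchmark results,
# assuming a mean-shift model between two performance regimes.

def sse(xs):
    """Sum of squared deviations from the mean of xs."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_changepoint(series, min_seg=3):
    """Return (index, gain): the split that most reduces total SSE.

    A large gain suggests the measurements before and after the index
    come from different performance regimes.
    """
    total = sse(series)
    best_i, best_gain = None, 0.0
    for i in range(min_seg, len(series) - min_seg + 1):
        gain = total - (sse(series[:i]) + sse(series[i:]))
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain

# Hypothetical benchmark scores with a regime shift after the 6th run.
scores = [100, 101, 99, 100, 102, 100, 88, 87, 89, 88, 90, 87]
idx, gain = best_changepoint(scores)  # idx == 6: the regime boundary
```

Production-grade analyses would add significance testing and handle multiple changepoints (e.g. by recursing on each segment), but the core signal, a split that sharply reduces within-segment variance, is the same.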
Log Parsing Evaluation in the Era of Modern Software Systems
Due to the complexity and size of modern software systems, the amount of logs
generated is tremendous. Hence, it is infeasible to manually investigate these
data in a reasonable time, thereby requiring automating log analysis to derive
insights about the functioning of the systems. Motivated by an industry
use-case, we zoom in on one integral part of automated log analysis, log
parsing, which is the prerequisite to deriving any insights from logs. Our
investigation reveals problematic aspects within the log parsing field,
particularly its inefficiency in handling heterogeneous real-world logs. We
show this by assessing the 14 most-recognized log parsing approaches in the
literature using (i) nine publicly available datasets, (ii) one dataset
comprised of combined publicly available data, and (iii) one dataset generated
within the infrastructure of a large bank. Subsequently, toward improving log
parsing robustness in real-world production scenarios, we propose a tool,
Logchimera, that enables estimating log parsing performance in industry
contexts through generating synthetic log data that resemble industry logs. Our
contributions serve as a foundation to consolidate past research efforts,
facilitate future research advancements, and establish a strong link between
research and industry log parsing.
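To make concrete what "log parsing" means here: the task is to recover message templates from raw log lines by separating constant text from variable fields. A minimal sketch, assuming variables are numeric or hex-like tokens, is below; real parsers such as Drain use more robust clustering, and the log lines are invented examples.

```python
# Toy log parsing: mask variable tokens to recover message templates.
import re
from collections import Counter

def to_template(line):
    """Replace numeric and hex-like tokens with a <*> placeholder."""
    out = []
    for tok in line.split():
        if re.fullmatch(r"\d+(\.\d+)*", tok) or re.fullmatch(r"0x[0-9a-fA-F]+", tok):
            out.append("<*>")
        else:
            out.append(tok)
    return " ".join(out)

logs = [
    "Connected to 10.0.0.1 port 8080",
    "Connected to 10.0.0.2 port 8081",
    "Worker 17 finished in 342 ms",
    "Worker 3 finished in 97 ms",
]
# Four heterogeneous lines collapse into two templates.
templates = Counter(to_template(l) for l in logs)
```

Heterogeneous real-world logs break such simple masking rules (identifiers embedded in words, multi-token values), which is exactly the robustness gap the abstract describes and that synthetic industry-like data from Logchimera is meant to probe.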
[Demo] Low-latency spark queries on updatable data
As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, on datasets that are continuously growing.
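The two ingredients named above, an index for sub-scan lookups and multi-version concurrency for updates, can be illustrated outside Spark. The following is a toy sketch of the concept, not the Apache Spark implementation: a hash index maps keys to row positions, and each row carries the version that wrote it so readers can query a consistent snapshot.

```python
# Toy illustration of an indexed, multi-versioned table: point lookups
# go through a hash index instead of a full scan, and a reader can pin
# an older version to see a consistent snapshot under concurrent updates.

class IndexedTable:
    def __init__(self):
        self.rows = []      # append-only row store: (key, value, version)
        self.index = {}     # key -> list of row positions
        self.version = 0    # bumped on every update

    def insert(self, key, value):
        self.rows.append((key, value, self.version + 1))
        self.index.setdefault(key, []).append(len(self.rows) - 1)
        self.version += 1

    def lookup(self, key, as_of=None):
        """Index-backed point lookup, optionally against an older version."""
        as_of = self.version if as_of is None else as_of
        return [self.rows[i][1] for i in self.index.get(key, [])
                if self.rows[i][2] <= as_of]

t = IndexedTable()
t.insert("alice", {"followers": 10})
snapshot = t.version                 # pin a version before the update
t.insert("alice", {"followers": 11})
# lookup(..., as_of=snapshot) sees only the pre-update row.
```

In the real system the index lives alongside Spark's partitioned, cached DataFrames, but the payoff is the same: a point lookup touches only the indexed positions rather than scanning the whole dataset.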
SenseLE: Exploiting spatial locality in decentralized sensing environments
Generally, smart devices, such as smartphones, smartwatches, or fitness trackers, communicate with each other indirectly, via cloud data centers. Sharing sensor data with a cloud data center as intermediary requires transmission methods with high battery costs, such as 4G LTE or WiFi. By sharing sensor information locally and without intermediaries, we can use other transmission methods with low energy cost, such as Bluetooth or BLE. In this paper, we introduce Sense Low Energy (SenseLE), a decentralized sensing framework which exploits the spatial locality of nearby sensors to save energy in Internet-of-Things (IoT) environments. We demonstrate the usability of SenseLE by building a real-life application for estimating waiting times at queues. Furthermore, we evaluate the performance and resource utilization of our SenseLE Android implementation for different sensing scenarios. Our empirical evaluation shows that by exploiting spatial locality, SenseLE is able to reduce application response times (latency) by up to 74% and energy consumption by up to 56%.
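The core decision such a framework makes can be sketched as a transport choice: fetch a reading from a nearby peer over BLE when spatial locality allows it, otherwise fall back to a cloud round-trip. The energy costs and the functions below are illustrative assumptions for this sketch, not measurements or APIs from the paper.

```python
# Hedged sketch of a SenseLE-style transport decision: prefer the
# low-energy local path (BLE) when a nearby peer already holds the
# sensor reading; otherwise go through the cloud. Costs are assumed
# relative energy units, not values reported in the paper.

ENERGY_UNITS = {"ble": 1.0, "wifi": 20.0, "lte": 50.0}

def choose_transport(peer_has_data, peer_in_ble_range):
    """Pick the cheapest transport that can satisfy the request."""
    if peer_has_data and peer_in_ble_range:
        return "ble"   # exploit spatial locality: local, low-energy path
    return "lte"       # cloud round-trip as fallback

def fetch_cost(peer_has_data, peer_in_ble_range):
    return ENERGY_UNITS[choose_transport(peer_has_data, peer_in_ble_range)]
```

Under these assumed costs, serving a request locally is an order of magnitude cheaper than the cloud path, which is the intuition behind the reported energy savings.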