The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures -- Technical Report
Realistic, relevant, and reproducible experiments often need input traces
collected from real-world environments. We focus in this work on traces of
workflows---common in datacenters, clouds, and HPC infrastructures. We show
that the state-of-the-art in using workflow-traces raises important issues: (1)
the use of realistic traces is infrequent, and (2) the use of realistic,
open-access traces even more so. Alleviating these issues, we introduce the
Workflow Trace Archive (WTA), an open-access archive of workflow traces from
diverse computing infrastructures and tooling to parse, validate, and analyze
traces. The WTA includes >48 million workflows captured from >10
computing infrastructures, representing a broad diversity of trace domains and
characteristics. To emphasize the importance of trace diversity, we
characterize the WTA contents and analyze in simulation the impact of trace
diversity on experiment results. Our results indicate significant differences
in characteristics, properties, and workflow structures between workload
sources, domains, and fields.
Comment: Technical report
DeepPlace: Learning to Place Applications in Multi-Tenant Clusters
Large multi-tenant production clusters often have to handle a variety of jobs
and applications with a variety of complex resource usage characteristics. It
is non-trivial and non-optimal to manually create placement rules for
scheduling that would decide which applications should co-locate. In this
paper, we present DeepPlace, a scheduler that learns to exploit various
temporal resource usage patterns of applications using Deep Reinforcement
Learning (Deep RL) to reduce resource competition across jobs running in the
same machine while at the same time optimizing for overall cluster utilization.
Comment: APSys 201
In Datacenter Performance, The Only Constant Is Change
All computing infrastructure suffers from performance variability, be it
bare-metal or virtualized. This phenomenon originates from many sources: some
transient, such as noisy neighbors, and others more permanent but sudden, such
as changes or wear in hardware, changes in the underlying hypervisor stack, or
even undocumented interactions between the policies of the computing resource
provider and the active workloads. Thus, performance measurements obtained on
clouds, HPC facilities, and, more generally, datacenter environments are almost
guaranteed to exhibit performance regimes that evolve over time, which leads to
undesirable nonstationarities in application performance. In this paper, we
present our analysis of performance of the bare-metal hardware available on the
CloudLab testbed where we focus on quantifying the evolving performance regimes
using changepoint detection. We describe our findings, backed by a dataset with
nearly 6.9M benchmark results collected from over 1600 machines over a period
of 2 years and 9 months. These findings yield a comprehensive characterization
of real-world performance variability patterns in one computing facility and a
methodology for studying such patterns on other infrastructures, and they
contribute to a better understanding of performance variability in general.
Comment: To be presented at the 20th IEEE/ACM International Symposium on
Cluster, Cloud and Internet Computing (CCGrid,
http://cloudbus.org/ccgrid2020/) on May 11-14, 2020 in Melbourne, Victoria,
Australia
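As a rough illustration of the changepoint detection mentioned in the abstract above, the following minimal sketch finds the single split of a benchmark series that best separates two mean levels. It is an assumption made for illustration, not the method used in the paper: the abstract does not name an algorithm, and the function name `best_mean_shift` and the sample values are hypothetical.

```python
from statistics import fmean


def best_mean_shift(values):
    """Find the split that best separates a series into two mean levels.

    Illustrative only: a least-squares single-changepoint search, not the
    procedure from the paper. Returns (index, cost_reduction), where index is
    the first position of the second regime, or None if no split reduces the
    squared error.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")

    # Cost of modelling the whole series with a single mean.
    overall_mean = fmean(values)
    base_cost = sum((v - overall_mean) ** 2 for v in values)

    best_index, best_cost = None, base_cost
    for k in range(1, n):
        left, right = values[:k], values[k:]
        left_mean, right_mean = fmean(left), fmean(right)
        # Cost of modelling each side of the split with its own mean.
        cost = sum((v - left_mean) ** 2 for v in left)
        cost += sum((v - right_mean) ** 2 for v in right)
        if cost < best_cost:
            best_index, best_cost = k, cost

    return best_index, base_cost - best_cost


# Hypothetical benchmark results with a sudden shift after five runs.
samples = [10.0] * 5 + [12.0] * 5
index, reduction = best_mean_shift(samples)
```

A real analysis over long measurement series would need to handle multiple changepoints and noise, for which a penalized or statistical test-based method is more appropriate than this exhaustive single-split search.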
The workflow trace archive: Open-access data from public and private computing infrastructures
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work, we focus on traces of workflows - common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent and (2) the use of realistic, open-access traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes >48 million workflows captured from >10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
Traffic generation for benchmarking data centre networks
Benchmarking is commonly used in research fields, such as computer architecture design and machine learning, as a powerful paradigm for rigorously assessing, comparing, and developing novel technologies. However, the data centre network (DCN) community lacks a standard open-access and reproducible traffic generation framework for benchmark workload generation. Driving factors behind this include the proprietary nature of traffic traces, the limited detail and quantity of open-access network-level data sets, the high cost of real-world experimentation, and the poor reproducibility and fidelity of synthetically generated traffic. This is curtailing the community's understanding of existing systems and hindering the development, comparison, and testing of novel technologies such as optical DCNs. We present TrafPy, an open-access framework for generating both realistic and custom DCN traffic traces. TrafPy is compatible with any simulation, emulation, or experimentation environment, and can be used for standardised benchmarking and for investigating the properties and limitations of network systems such as schedulers, switches, routers, and resource managers. We give an overview of the TrafPy traffic generation framework, and provide a brief demonstration of its efficacy through an investigation into the sensitivity of some canonical scheduling algorithms to varying traffic trace characteristics in the context of optical DCNs. TrafPy is open-sourced via GitHub, and all data associated with this manuscript are available via RDR.
Shadow: Exploiting the Power of Choice for Efficient Shuffling in MapReduce