Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters
To improve customer experience, datacenter operators offer support for
simplifying application and resource management. For example, running workloads
of workflows on behalf of customers is desirable, but requires increasingly
sophisticated autoscaling policies, that is, policies that dynamically
provision resources for the customer. Selecting and tuning autoscaling
policies is a challenging task for datacenter operators, yet so far relatively
few studies have investigated the performance of autoscaling for workloads of
workflows.
Complementing previous knowledge, in this work we conduct the first
comprehensive performance study in the field. Using trace-based simulation, we
compare state-of-the-art autoscaling policies across multiple application
domains, workload arrival patterns (e.g., burstiness), and system utilization
levels. We further investigate the interplay between autoscaling and regular
allocation policies, and the complexity cost of autoscaling. Our quantitative
study focuses not only on traditional performance metrics and on
state-of-the-art elasticity metrics, but also on time- and memory-related
autoscaling-complexity metrics. Our main results give strong, quantitative
evidence of previously unreported operational behavior, for example, that
autoscaling policies perform differently across application domains, and by how
much they differ.

Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based
Performance Study of Autoscaling Workloads of Workflows in Datacenters"
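As a rough illustration of what such an autoscaling policy does, the sketch below replays a hypothetical arrival trace and scales a VM pool reactively with queue length. All names, thresholds, and capacities are illustrative assumptions, not the policies studied in the report:

```python
# Hypothetical sketch of a reactive autoscaling policy evaluated over a
# task-arrival trace. Thresholds and capacities are illustrative only.

def reactive_policy(queued_tasks, running_vms, min_vms=1, max_vms=20):
    """Provision roughly one VM per 5 queued tasks, within [min_vms, max_vms].
    Returns the scaling decision: positive = scale out, negative = scale in."""
    target = max(min_vms, min((queued_tasks + 4) // 5, max_vms))
    return target - running_vms

def simulate(trace, capacity_per_vm=2):
    """Replay a trace of per-step task arrivals; return the mean queue length,
    a stand-in for the traditional performance metrics a real study would use."""
    queue, vms, queue_samples = 0, 1, []
    for arrivals in trace:
        queue += arrivals
        vms += reactive_policy(queue, vms)     # apply the scaling decision
        queue = max(0, queue - vms * capacity_per_vm)  # serve tasks this step
        queue_samples.append(queue)
    return sum(queue_samples) / len(queue_samples)
```

A trace-based study of the kind described would replay recorded workflow traces through many such policies and compare the resulting performance, elasticity, and complexity metrics.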
Traffic generation for benchmarking data centre networks
Benchmarking is commonly used in research fields such as computer architecture design and machine learning as a powerful paradigm for rigorously assessing, comparing, and developing novel technologies. However, the data centre network (DCN) community lacks a standard open-access and reproducible traffic generation framework for benchmark workload generation. Driving factors behind this include the proprietary nature of traffic traces, the limited detail and quantity of open-access network-level data sets, the high cost of real-world experimentation, and the poor reproducibility and fidelity of synthetically generated traffic. This is curtailing the community's understanding of existing systems and hindering the ease with which novel technologies, such as optical DCNs, can be developed, compared, and tested. We present TrafPy, an open-access framework for generating both realistic and custom DCN traffic traces. TrafPy is compatible with any simulation, emulation, or experimentation environment, and can be used for standardised benchmarking and for investigating the properties and limitations of network systems such as schedulers, switches, routers, and resource managers. We give an overview of the TrafPy traffic generation framework and provide a brief demonstration of its efficacy through an investigation into the sensitivity of some canonical scheduling algorithms to varying traffic trace characteristics in the context of optical DCNs. TrafPy is open-sourced via GitHub, and all data associated with this manuscript are available via RDR.
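For illustration, distribution-based traffic generation of the general kind TrafPy supports can be sketched as below. This is not TrafPy's actual API; the function name, parameters, and chosen distributions are assumptions for the example:

```python
# Illustrative sketch of distribution-based DCN traffic generation, in the
# spirit of (but NOT using) TrafPy's actual API. Names are hypothetical.
import random

def generate_trace(num_flows, mean_interarrival=1.0, mean_size=1000, seed=42):
    """Generate a synthetic flow trace: exponential interarrival times and
    heavy-tailed (log-normal-scaled) flow sizes on a small 16-node fabric."""
    rng = random.Random(seed)  # fixed seed for reproducible benchmarking
    t, flows = 0.0, []
    for i in range(num_flows):
        t += rng.expovariate(1.0 / mean_interarrival)   # arrival time
        size = int(rng.lognormvariate(0, 1) * mean_size)  # heavy-tailed size
        src, dst = rng.sample(range(16), 2)  # distinct endpoints
        flows.append({"id": i, "time": t, "size": size, "src": src, "dst": dst})
    return flows
```

Fixing the random seed is what makes a trace like this reproducible across simulation, emulation, and experimentation environments.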
Empirical characterization and modeling of power consumption and energy aware scheduling in data centers
Energy-efficient management is key in modern data centers in order to reduce
operational cost and environmental impact. Energy management and renewable
energy utilization are strategies to optimize energy consumption in
high-performance computing. In any case, understanding the power consumption
behavior of physical servers in a datacenter is fundamental to implementing
energy-aware policies effectively. These policies should also cope with possible
performance degradation of applications to ensure quality of service.
This thesis presents an empirical evaluation of power consumption for scientific
computing applications in multicore systems. Three types of applications
are studied, in single and combined executions on Intel and AMD servers, for
evaluating the overall power consumption of each application. The main results
indicate that power consumption behavior depends strongly on the type of
application. Additional performance analysis shows that the most
energy-efficient server load depends on the types of applications executed,
with efficiency decreasing under heavy load. These results allow formulating
models that characterize applications according to power consumption,
efficiency, and resource sharing, which provide useful information for
resource management and scheduling policies. Several scheduling strategies are
evaluated using the proposed energy model over realistic scientific computing
workloads. Results confirm that strategies that maximize host utilization
provide the best energy efficiency.

Agencia Nacional de Investigación e Innovación FSE_1_2017_1_14478
On benchmarking of deep learning systems: software engineering issues and reproducibility challenges
Since AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, Deep Learning (and Machine Learning/AI in general) has attracted exponentially growing interest.
Nowadays, its adoption spreads over numerous sectors, such as automotive, robotics, healthcare, and finance.
The advancement of ML goes hand in hand with the quality improvement delivered by those solutions.
However, those improvements are not free: ML algorithms require ever-increasing computational power, which pushes computer engineers to develop new devices capable of coping with this demand for performance.
To foster the evolution of Domain-Specific Architectures (DSAs), and thus ML research, it is key to make them easy to experiment with and compare. This may be challenging since, even if the software built around these devices simplifies their usage, obtaining the best performance is not always straightforward.
The situation gets even worse when the experiments are not conducted in a reproducible way.
Even though the importance of reproducibility for research is evident, it does not directly translate into reproducible experiments. In fact, as already shown by previous studies in other research fields, ML is also facing a reproducibility crisis.
Our work addresses the reproducibility of ML applications. Reproducibility in this context has two aspects: reproducibility of results and reproducibility of performance. While reproducibility of results is mandatory, reproducibility of performance cannot be neglected, because the use of high-performance devices incurs cost. To understand the state of performance reproducibility in ML, we reproduce results published for the MLPerf suite, which appears to be the most widely used machine learning benchmark.
Because of the wide range of devices and frameworks used in different benchmark submissions, we focus on a subset of accuracy and performance results submitted to the MLPerf Inference benchmark, presenting a detailed analysis of the difficulties a scientist may find when trying to reproduce such a benchmark and a possible solution using our workflow tool for experiment reproducibility: PROVA!.
We designed PROVA! to support reproducibility in traditional HPC experiments, but we show how we extended it to be used as a 'driver' for MLPerf benchmark applications.
The PROVA! driver mode allows us to experiment with different versions of the MLPerf Inference benchmark, switching among different hardware and software combinations and comparing them in a reproducible way.
In the last part, we present the results of our reproducibility study, demonstrating the importance of having a support tool to reproduce and extend original experiments, gaining deeper knowledge about performance behaviours.
An In-Depth Investigation of Performance Characteristics of Hyperledger Fabric
Private permissioned blockchains, such as Hyperledger Fabric, are widely
deployed across the industry to facilitate cross-organizational processes and
promise improved performance compared to their public counterparts. However,
the lack of empirical and theoretical results prevents precise prediction of the
real-world performance. We address this gap by conducting an in-depth
performance analysis of Hyperledger Fabric. The paper presents a detailed
compilation of various performance characteristics using an enhanced version of
the Distributed Ledger Performance Scan. Researchers and practitioners alike
can use the results as guidelines to better configure and implement their
blockchains, and utilize the DLPS framework to conduct their own measurements.
Towards a User-Oriented Benchmark for Transport Protocols Comparison in very High Speed Networks
Standard TCP faces some performance limitations in very high speed wide area networks, mainly due to a long end-to-end feedback loop and a conservative behaviour with respect to congestion. Many TCP variants have been proposed to overcome these limitations. However, TCP is a complex protocol with many user-configurable parameters and a range of different implementations. It is therefore important to define measurement methods so that transport services and protocols can evolve guided by scientific principles and be compared quantitatively. The goal of this report is to present some steps towards a user-oriented benchmark, called ITB, for high speed transport protocol comparison. We first present and analyse some results reported in the literature. From this study we identify classes of representative applications and useful metrics. We then isolate infrastructure parameters and traffic factors which influence protocol behaviour. This enables us to define scenarios capturing and synthesising comprehensive and useful properties. We finally illustrate this proposal with preliminary results obtained on Grid'5000, the experimental environment we have built and are using to contribute to this benchmark design.
End-to-End Application Cloning for Distributed Cloud Microservices with Ditto
We present Ditto, an automated framework for cloning end-to-end cloud
applications, both monolithic and microservices, which captures I/O and network
activity, as well as kernel operations, in addition to application logic. Ditto
takes a hierarchical approach to application cloning, starting with capturing
the dependency graph across distributed services, to recreating each tier's
control/data flow, and finally generating system calls and assembly that mimics
the individual applications. Ditto does not reveal the logic of the original
application, which facilitates publicly sharing clones of production services
with hardware vendors, cloud providers, and the research community.
We show that across a diverse set of single- and multi-tier applications,
Ditto accurately captures their CPU and memory characteristics as well as their
high-level performance metrics, is portable across platforms, and facilitates a
wide range of system studies.
Faithful reproduction of network experiments
The proliferation of cloud computing has compelled the research community to rethink fundamental aspects of network systems and architectures. However, the tools commonly used to evaluate new ideas have not kept abreast of the latest developments. Common simulation and emulation frameworks fail to simultaneously provide scalability, fidelity, and reproducibility while executing unmodified code.
We present SELENA, a Xen-based network emulation framework that offers fully reproducible experiments via its automation interface and supports the use of unmodified guest operating systems. This allows out-of-the-box compatibility with common applications and OS components, such as network stacks and filesystems. In order to faithfully emulate faster and larger networks, SELENA adopts the technique of time dilation and transparently slows down the passage of time for guest operating systems. This technique effectively virtualizes the availability of the host's hardware resources and allows the replication of scenarios with increased I/O and computational demands. Users can directly control the tradeoff between fidelity and running time via intuitive tuning knobs. We evaluate the ability of SELENA to faithfully replicate the behaviour of real systems and compare it against existing popular experimentation platforms. Our results suggest that SELENA can accurately model networks with aggregate link speeds of 44 Gbps or more, while improving execution time fourfold in comparison to ns3 and exhibiting near-linear scaling properties.

This is the author accepted manuscript. The final version is available from ACM via http://dx.doi.org/10.1145/2658260.265827