
    Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters

    To improve customer experience, datacenter operators offer support for simplifying application and resource management. For example, running workloads of workflows on behalf of customers is desirable, but requires increasingly sophisticated autoscaling policies, that is, policies that dynamically provision resources for the customer. Although selecting and tuning autoscaling policies is a challenging task for datacenter operators, so far relatively few studies have investigated the performance of autoscaling for workloads of workflows. Complementing previous knowledge, in this work we conduct the first comprehensive performance study in the field. Using trace-based simulation, we compare state-of-the-art autoscaling policies across multiple application domains, workload arrival patterns (e.g., burstiness), and system utilization levels. We further investigate the interplay between autoscaling and regular allocation policies, and the complexity cost of autoscaling. Our quantitative study focuses not only on traditional performance metrics and on state-of-the-art elasticity metrics, but also on time- and memory-related autoscaling-complexity metrics. Our main results give strong, quantitative evidence about previously unreported operational behavior, for example, that autoscaling policies perform differently across application domains, and by how much they differ. Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters"
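    To make the methodology concrete, the following is a minimal sketch of a trace-driven autoscaling simulation loop of the kind the report describes: replay workflow arrivals from a trace, let a policy adjust provisioned capacity, and record over- and under-provisioning as simple elasticity-style metrics. The `Event` record and the toy threshold policy are assumptions for illustration, not the policies or metrics evaluated in the report.

```python
# Illustrative trace-driven autoscaling loop; names and the threshold policy are
# invented for the sketch, not taken from the report.
from dataclasses import dataclass

@dataclass
class Event:
    time: float      # arrival time taken from the workload trace
    demand: int      # resources (e.g., cores) the workflow task needs

def threshold_policy(pending_demand: int, provisioned: int, scale_step: int = 4) -> int:
    """Toy autoscaler: add capacity when demand exceeds supply, release it otherwise."""
    if pending_demand > provisioned:
        return provisioned + scale_step
    if pending_demand < provisioned - scale_step:
        return provisioned - scale_step
    return provisioned

def simulate(trace: list) -> dict:
    provisioned, pending = 0, 0
    over, under, samples = 0, 0, 0
    for ev in trace:
        pending += ev.demand
        provisioned = threshold_policy(pending, provisioned)
        # Elasticity-style accounting: over- and under-provisioned capacity per step.
        over += max(0, provisioned - pending)
        under += max(0, pending - provisioned)
        samples += 1
        pending = max(0, pending - provisioned)   # crude service model
    return {"avg_over": over / samples, "avg_under": under / samples}

print(simulate([Event(0.0, 8), Event(1.0, 2), Event(2.0, 16)]))
```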

    Traffic generation for benchmarking data centre networks

    Benchmarking is commonly used in research fields, such as computer architecture design and machine learning, as a powerful paradigm for rigorously assessing, comparing, and developing novel technologies. However, the data centre network (DCN) community lacks a standard open-access and reproducible traffic generation framework for benchmark workload generation. Driving factors behind this include the proprietary nature of traffic traces, the limited detail and quantity of open-access network-level data sets, the high cost of real-world experimentation, and the poor reproducibility and fidelity of synthetically generated traffic. This is curtailing the community's understanding of existing systems and hindering the ability with which novel technologies, such as optical DCNs, can be developed, compared, and tested. We present TrafPy, an open-access framework for generating both realistic and custom DCN traffic traces. TrafPy is compatible with any simulation, emulation, or experimentation environment, and can be used for standardised benchmarking and for investigating the properties and limitations of network systems such as schedulers, switches, routers, and resource managers. We give an overview of the TrafPy traffic generation framework, and provide a brief demonstration of its efficacy through an investigation into the sensitivity of some canonical scheduling algorithms to varying traffic trace characteristics in the context of optical DCNs. TrafPy is open-sourced via GitHub, and all data associated with this manuscript are available via RDR.
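    As a rough illustration of what synthetic DCN trace generation involves, the sketch below samples flow inter-arrival times and heavy-tailed flow sizes to build a trace. It only illustrates the general idea; the function names, distributions, and parameters are assumptions and do not reflect TrafPy's actual API.

```python
# Illustrative synthetic DCN traffic generation: Poisson-like arrivals and
# Pareto-distributed flow sizes between random endpoint pairs.
import random

def generate_trace(num_flows: int, mean_interarrival: float = 1e-3,
                   size_pareto_shape: float = 1.2, size_scale: float = 1e3,
                   seed: int = 0) -> list:
    rng = random.Random(seed)
    t = 0.0
    trace = []
    for flow_id in range(num_flows):
        t += rng.expovariate(1.0 / mean_interarrival)             # exponential gaps
        size = size_scale * rng.paretovariate(size_pareto_shape)  # heavy-tailed sizes
        src, dst = rng.sample(range(64), 2)                       # distinct endpoints
        trace.append({"id": flow_id, "t_arrival": t,
                      "size_bytes": int(size), "src": src, "dst": dst})
    return trace

for flow in generate_trace(3):
    print(flow)
```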

    Empirical characterization and modeling of power consumption and energy aware scheduling in data centers

    Energy-efficient management is key in modern data centers in order to reduce operational cost and environmental impact. Energy management and renewable energy utilization are strategies to optimize energy consumption in high-performance computing. In either case, understanding the power consumption behavior of physical servers in a datacenter is fundamental to implementing energy-aware policies effectively. These policies should also deal with possible performance degradation of applications to ensure quality of service. This thesis presents an empirical evaluation of power consumption for scientific computing applications in multicore systems. Three types of applications are studied, in single and combined executions, on Intel and AMD servers to evaluate the overall power consumption of each application. The main results indicate that power consumption behavior depends strongly on the type of application. Additional performance analysis shows that the most energy-efficient server load depends on the type of application, with efficiency decreasing in heavily loaded situations. These results allow formulating models that characterize applications according to power consumption, efficiency, and resource sharing, and that provide useful information for resource management and scheduling policies. Several scheduling strategies are evaluated using the proposed energy model over realistic scientific computing workloads. Results confirm that strategies that maximize host utilization provide the best energy efficiency. Agencia Nacional de Investigación e Innovación FSE_1_2017_1_14478
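    A minimal sketch of the kind of utilization-based power model and consolidating placement heuristic discussed above is given below. The linear model form, the wattage coefficients, and the host records are illustrative assumptions, not the fitted models or measurements from the thesis.

```python
# Illustrative linear power model and a "maximize host utilization" placement
# heuristic; coefficients are made up for the example.
from typing import Optional

def power_watts(utilization: float, p_idle: float = 95.0, p_max: float = 210.0) -> float:
    """Linear power model: idle floor plus a utilization-proportional dynamic part."""
    return p_idle + (p_max - p_idle) * utilization

def place_consolidating(task_load: float, hosts: list) -> Optional[dict]:
    """Pick the most-loaded host that still fits the task (consolidation)."""
    feasible = [h for h in hosts if h["load"] + task_load <= 1.0]
    return max(feasible, key=lambda h: h["load"]) if feasible else None

hosts = [{"name": "amd-1", "load": 0.30}, {"name": "intel-1", "load": 0.65}]
target = place_consolidating(0.25, hosts)
print(target["name"], power_watts(target["load"] + 0.25))
```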

    On benchmarking of deep learning systems: software engineering issues and reproducibility challenges

    Since AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, Deep Learning (and Machine Learning/AI in general) gained an exponential interest. Nowadays, their adoption spreads over numerous sectors, like automotive, robotics, healthcare and finance. The ML advancement goes in pair with the quality improvement delivered by those solutions. However, those ameliorations are not for free: ML algorithms always require an increasing computational power, which pushes computer engineers to develop new devices capable of coping with this demand for performance. To foster the evolution of DSAs, and thus ML research, it is key to make it easy to experiment and compare them. This may be challenging since, even if the software built around these devices simplifies their usage, obtaining the best performance is not always straightforward. The situation gets even worse when the experiments are not conducted in a reproducible way. Even though the importance of reproducibility for the research is evident, it does not directly translate into reproducible experiments. In fact, as already shown by previous studies regarding other research fields, also ML is facing a reproducibility crisis. Our work addresses the topic of reproducibility of ML applications. Reproducibility in this context has two aspects: results reproducibility and performance reproducibility. While the reproducibility of the results is mandatory, performance reproducibility cannot be neglected because high-performance device usage causes cost. To understand how the ML situation is regarding reproducibility of performance, we reproduce results published for the MLPerf suite, which seems to be the most used machine learning benchmark. Because of the wide range of devices and frameworks used in different benchmark submissions, we focus on a subset of accuracy and performance results submitted to the MLPerf Inference benchmark, presenting a detailed analysis of the difficulties a scientist may find when trying to reproduce such a benchmark and a possible solution using our workflow tool for experiment reproducibility: PROVA!. We designed PROVA! to support the reproducibility in traditional HPC experiments, but we will show how we extended it to be used as a 'driver' for MLPerf benchmark applications. The PROVA! driver mode allows us to experiment with different versions of the MLPerf Inference benchmark switching among different hardware and software combinations and compare them in a reproducible way. In the last part, we will present the results of our reproducibility study, demonstrating the importance of having a support tool to reproduce and extend original experiments getting deeper knowledge about performance behaviours
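    The sketch below illustrates the basic idea of performance reproducibility discussed above: record the software and hardware context alongside each benchmark run so results can be compared across set-ups. It is a generic stand-in for this kind of provenance capture, not PROVA!'s actual interface or the MLPerf harness.

```python
# Illustrative provenance capture around a benchmark command; the record fields
# are assumptions chosen for the example.
import json, platform, subprocess, sys, time

def run_benchmark(cmd: list, record_path: str) -> None:
    start = time.time()
    result = subprocess.run(cmd, capture_output=True, text=True)
    record = {
        "command": cmd,
        "returncode": result.returncode,
        "wall_time_s": time.time() - start,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    with open(record_path, "w") as f:
        json.dump(record, f, indent=2)

run_benchmark([sys.executable, "--version"], "run_record.json")
```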

    An In-Depth Investigation of Performance Characteristics of Hyperledger Fabric

    Private permissioned blockchains, such as Hyperledger Fabric, are widely deployed across industry to facilitate cross-organizational processes and promise improved performance compared to their public counterparts. However, the lack of empirical and theoretical results prevents precise prediction of real-world performance. We address this gap by conducting an in-depth performance analysis of Hyperledger Fabric. The paper presents a detailed compilation of various performance characteristics using an enhanced version of the Distributed Ledger Performance Scan (DLPS). Researchers and practitioners alike can use the results as guidelines to better configure and implement their blockchains, and can utilize the DLPS framework to conduct their own measurements.
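    For orientation, the following is a generic sketch of the throughput/latency measurement loop such a blockchain benchmark performs: submit a batch of transactions and record submit-to-commit latencies. The `submit_transaction` stub and the numbers are placeholders, not the DLPS framework or the Fabric SDK.

```python
# Generic blockchain benchmark loop; the submit call is a stand-in for an SDK
# call that blocks until the transaction commits.
import statistics, time

def submit_transaction(payload: dict) -> None:
    time.sleep(0.005)   # placeholder for a real submit-and-wait-for-commit call

def benchmark(num_tx: int = 200) -> dict:
    latencies = []
    start = time.time()
    for i in range(num_tx):
        t0 = time.time()
        submit_transaction({"key": f"k{i}", "value": i})
        latencies.append(time.time() - t0)
    elapsed = time.time() - start
    return {"throughput_tps": num_tx / elapsed,
            "avg_latency_s": statistics.mean(latencies),
            "p95_latency_s": statistics.quantiles(latencies, n=20)[18]}

print(benchmark())
```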

    Towards a User-Oriented Benchmark for Transport Protocols Comparison in very High Speed Networks

    Standard TCP faces performance limitations in very high speed wide area networks, mainly due to a long end-to-end feedback loop and a conservative behaviour with respect to congestion. Many TCP variants have been proposed to overcome these limitations. However, TCP is a complex protocol with many user-configurable parameters and a range of different implementations. It is therefore important to define measurement methods so that transport services and protocols can evolve guided by scientific principles and be compared quantitatively. The goal of this report is to present steps towards a user-oriented benchmark, called ITB, for comparing high speed transport protocols. We first present and analyse results reported in the literature. From this study we identify classes of representative applications and useful metrics. We then isolate infrastructure parameters and traffic factors which influence protocol behaviour. This enables us to define scenarios that capture and synthesise comprehensive and useful properties. We finally illustrate this proposal with preliminary results obtained on Grid'5000, the experimental environment we have built and are using to contribute to this benchmark design.
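    As an example of the user-oriented metrics such a benchmark might report for a bulk-transfer scenario, the sketch below computes per-flow goodput and Jain's fairness index. The flow records are invented for illustration, not Grid'5000 measurements or the metrics actually selected for ITB.

```python
# Illustrative transport-benchmark metrics: goodput per flow and Jain's fairness
# index across competing flows.
def goodput_mbps(bytes_delivered: int, duration_s: float) -> float:
    return (bytes_delivered * 8) / (duration_s * 1e6)

def jain_fairness(rates: list) -> float:
    """Jain's index: 1.0 means perfectly equal sharing among competing flows."""
    return sum(rates) ** 2 / (len(rates) * sum(r * r for r in rates))

flows = [(2_500_000_000, 20.0), (1_800_000_000, 20.0), (2_100_000_000, 20.0)]
rates = [goodput_mbps(b, d) for b, d in flows]
print([round(r, 1) for r in rates], round(jain_fairness(rates), 3))
```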

    End-to-End Application Cloning for Distributed Cloud Microservices with Ditto

    We present Ditto, an automated framework for cloning end-to-end cloud applications, both monolithic and microservice-based, which captures I/O and network activity, as well as kernel operations, in addition to application logic. Ditto takes a hierarchical approach to application cloning, starting with capturing the dependency graph across distributed services, then recreating each tier's control/data flow, and finally generating system calls and assembly that mimic the individual applications. Ditto does not reveal the logic of the original application, facilitating the public sharing of clones of production services with hardware vendors, cloud providers, and the research community. We show that across a diverse set of single- and multi-tier applications, Ditto accurately captures their CPU and memory characteristics as well as their high-level performance metrics, is portable across platforms, and facilitates a wide range of system studies.
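    To make the first step of that hierarchy concrete, the toy sketch below represents a cross-service dependency graph and visits tiers in dependency order, the point at which each tier's control/data flow could then be recreated. The graph is invented for the example; Ditto derives the real graph from traces of the production application.

```python
# Toy service dependency graph and a topological walk over it.
from collections import defaultdict, deque

edges = {  # service -> downstream services it calls (invented example)
    "frontend": ["auth", "catalog"],
    "catalog": ["cache", "database"],
    "auth": ["database"],
    "cache": [], "database": [],
}

def tiers_in_dependency_order(graph: dict) -> list:
    indegree = defaultdict(int)
    for svc, deps in graph.items():
        indegree.setdefault(svc, 0)
        for d in deps:
            indegree[d] += 1
    queue = deque(s for s, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        svc = queue.popleft()
        order.append(svc)
        for d in graph.get(svc, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return order

print(tiers_in_dependency_order(edges))  # e.g. ['frontend', 'auth', 'catalog', ...]
```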

    Faithful reproduction of network experiments

    The proliferation of cloud computing has compelled the research community to rethink fundamental aspects of network systems and architectures. However, the tools commonly used to evaluate new ideas have not kept abreast of the latest developments. Common simulation and emulation frameworks fail to simultaneously provide scalability, fidelity, reproducibility, and the ability to execute unmodified code. We present SELENA, a Xen-based network emulation framework that offers fully reproducible experiments via its automation interface and supports the use of unmodified guest operating systems. This allows out-of-the-box compatibility with common applications and OS components, such as network stacks and filesystems. In order to faithfully emulate faster and larger networks, SELENA adopts the technique of time dilation and transparently slows down the passage of time for guest operating systems. This technique effectively virtualizes the availability of the host's hardware resources and allows the replication of scenarios with increased I/O and computational demands. Users can directly control the trade-off between fidelity and running time via intuitive tuning knobs. We evaluate the ability of SELENA to faithfully replicate the behaviour of real systems and compare it against existing popular experimentation platforms. Our results suggest that SELENA can accurately model networks with aggregate link speeds of 44 Gbps or more, while improving execution time fourfold compared to ns-3, and exhibits near-linear scaling properties. This is the author accepted manuscript. The final version is available from ACM via http://dx.doi.org/10.1145/2658260.265827
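    The following back-of-the-envelope sketch illustrates the time-dilation trade-off described above: slowing guest time by a dilation factor makes physical resources appear that many times faster to the guest, at the cost of proportionally longer wall-clock runs. The numbers are illustrative assumptions, not measurements from the paper.

```python
# Illustrative time-dilation arithmetic: perceived capacity vs. wall-clock cost.
def dilated_view(physical_link_gbps: float, experiment_duration_s: float,
                 tdf: float) -> dict:
    return {
        "perceived_link_gbps": physical_link_gbps * tdf,      # guest sees a faster link
        "wall_clock_runtime_s": experiment_duration_s * tdf,  # host pays in real time
    }

# Emulating a 40 Gbps experiment on 10 Gbps hardware needs a dilation factor of 4.
print(dilated_view(physical_link_gbps=10.0, experiment_duration_s=60.0, tdf=4.0))
```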