
    Quantifying cloud performance and dependability: Taxonomy, metric design, and emerging challenges

    In only a decade, cloud computing has grown from a pursuit of service-driven information and communication technology (ICT) into a significant fraction of the ICT market. Responding to the growth of the market, many alternative cloud services and their underlying systems are currently vying for the attention of cloud users and providers. To make informed choices between competing cloud service providers, permit the cost-benefit analysis of cloud-based systems, and enable system DevOps to evaluate and tune the performance of these complex ecosystems, appropriate performance metrics, benchmarks, tools, and methodologies are necessary. This requires re-examining old system properties and considering new ones, possibly leading to the redesign of classic benchmarking metrics such as expressing performance as throughput and latency (response time). In this work, we address these requirements by focusing on four system properties: (i) elasticity of the cloud service, to accommodate large variations in the amount of service requested, (ii) performance isolation between the tenants of shared cloud systems and the resulting performance variability, (iii) availability of cloud services and systems, and (iv) the operational risk of running a production system in a cloud environment. Focusing on key metrics for each of these properties, we review the state of the art, then select or propose new metrics together with measurement approaches. We see the presented metrics as a foundation toward upcoming industry-standard cloud benchmarks.
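    The abstract does not spell out the metric definitions, but as a rough illustration of the kinds of quantities involved (the formulas, names, and sample numbers below are assumptions, not the paper's definitions), one can score over- and under-provisioning from a demanded-versus-supplied resource trace and express performance variability as a coefficient of variation of response time:

    # Illustrative sketch only: assumed metric definitions, not the paper's.
    from statistics import mean, pstdev

    def provisioning_scores(demanded, supplied):
        """Average over- and under-provisioned resource units per interval."""
        over = [max(s - d, 0) for d, s in zip(demanded, supplied)]
        under = [max(d - s, 0) for d, s in zip(demanded, supplied)]
        return mean(over), mean(under)

    def latency_cv(latencies_ms):
        """Coefficient of variation of response time: one crude way to
        express performance variability in a shared (multi-tenant) system."""
        return pstdev(latencies_ms) / mean(latencies_ms)

    demand = [2, 4, 8, 8, 6, 3, 2]   # resource units requested per interval
    supply = [2, 2, 6, 8, 8, 8, 4]   # resource units actually provisioned
    print(provisioning_scores(demand, supply))   # (avg over, avg under)
    print(latency_cv([110, 95, 240, 130, 105]))  # higher means more variable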

    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rising importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these observations back to development (Dev). However, so far, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify opportunities for cross-community collaboration and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as postdocs or junior professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations.

    Revisiting Actor Programming in C++

    The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that combines full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additions to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF aims at providing a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts, in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities, that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF consistently outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even OpenMPI. (Comment: 33 pages)
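    CAF itself is a C++ library, but the core idea the abstract leans on, actors that own a private mailbox and react to typed messages instead of sharing state, is language-neutral. The following is a minimal illustrative sketch of that idea in Python; it is not CAF's API, and the message types and actor name are invented for the example:

    # Minimal actor-model sketch (illustrative only; not CAF's C++ API).
    import queue
    import threading
    from dataclasses import dataclass

    @dataclass
    class Add:        # stand-in for a type-safe message interface
        x: int
        y: int

    @dataclass
    class Shutdown:
        pass

    class CalculatorActor:
        """Owns a mailbox and a thread; reacts to messages by type."""
        def __init__(self):
            self.mailbox = queue.Queue()
            self.thread = threading.Thread(target=self._run, daemon=True)
            self.thread.start()

        def send(self, msg):
            self.mailbox.put(msg)          # the only way to interact: messages

        def _run(self):
            while True:
                msg = self.mailbox.get()
                if isinstance(msg, Add):   # crude "pattern matching" on type
                    print(f"{msg.x} + {msg.y} = {msg.x + msg.y}")
                elif isinstance(msg, Shutdown):
                    return

    calc = CalculatorActor()
    calc.send(Add(1, 2))
    calc.send(Shutdown())
    calc.thread.join()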

    Elasticity Measurement in CaaS Environments - Extending the Existing BUNGEE Elasticity Benchmark to AWS's Elastic Container Service

    Rapid elasticity and automatic scaling are core concepts of most current cloud computing systems. Elasticity describes how well and how fast cloud systems adapt to increases and decreases in workload. In parallel, software architectures are moving towards employing containerised microservices running on systems managed by container orchestration platforms. Cloud users who employ such container-based systems may want to compare the elasticity of different systems or system settings to ensure rapid elasticity and maintain service level objectives while avoiding over-provisioning. Previous research has established a variety of metrics to measure elasticity. Some existing benchmark tools are designed to measure elasticity in “Infrastructure as a Service” (IaaS) systems, but no research exists to date for measuring elasticity in systems based on containers and container orchestration. In this dissertation, an existing benchmark designed for IaaS systems, the BUNGEE benchmark developed at the University of Würzburg, was extended to be applicable to Amazon’s Elastic Container Service, a container-based cloud system. An experiment was conducted to test whether the extension of the BUNGEE benchmark described in this dissertation delivers reproducible results and is therefore valid. For validation, the crucial phase of the benchmark - the system analysis phase - was run 32 times. Statistical tests were used to establish whether the results vary by more than an acceptable level. Results indicate that there is some amount of variability, but it does not exceed the acceptable level and is consistent with the amount of performance variability encountered by other researchers in Amazon’s cloud systems. Therefore, it is concluded that the BUNGEE benchmark is likely applicable to container-based cloud systems. However, some parameters and configuration settings specific to container orchestration systems were identified that could impede reproducibility of results and should be considered in future experiments.
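    The dissertation's exact statistical procedure is not given in the abstract, but as an assumed, illustrative sketch of how one might check that repeated system-analysis runs are "reproducible enough", the run-to-run coefficient of variation can be compared against a tolerance (the numbers and threshold below are invented):

    # Assumed reproducibility check, not the dissertation's actual test.
    from statistics import mean, pstdev

    def coefficient_of_variation(samples):
        return pstdev(samples) / mean(samples)

    # e.g. maximum sustainable requests/s measured in repeated analysis runs
    runs = [412, 398, 405, 420, 401, 415, 408, 399]

    ACCEPTABLE_CV = 0.05   # invented tolerance for "acceptable" variability
    cv = coefficient_of_variation(runs)
    print(f"CV across runs: {cv:.3f}")
    print("reproducible" if cv <= ACCEPTABLE_CV else "too much variability")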

    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources, while peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance of an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe lies ahead, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-increasing wave of new HPC applications coming from big data and artificial intelligence. (Comment: 29 pages, 5 figures, published in ACM Computing Surveys (CSUR))
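    The hybrid pattern mentioned above, steady workloads on-premise and peaks bursting to pay-as-you-go resources, can be reduced to a toy provisioning rule. The sketch below is an invented illustration rather than a mechanism from the survey; the node counts and capacities are assumptions:

    # Toy cloud-bursting rule (invented illustration, not from the survey).
    import math

    ON_PREM_NODES = 16    # fixed local cluster size (assumed)
    JOBS_PER_NODE = 4     # assumed capacity of one node

    def cloud_nodes_to_rent(pending_jobs: int) -> int:
        """Rent pay-as-you-go nodes only for demand the local cluster cannot absorb."""
        needed = math.ceil(pending_jobs / JOBS_PER_NODE)
        return max(needed - ON_PREM_NODES, 0)

    for pending in (40, 64, 100):
        print(pending, "pending jobs ->", cloud_nodes_to_rent(pending), "cloud nodes")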

    DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning

    Autoscaling functions provide the foundation for achieving elasticity in the modern cloud computing paradigm. They enable dynamic provisioning or de-provisioning of resources for cloud software services and applications, without human intervention, to adapt to workload fluctuations. However, autoscaling microservices is challenging due to various factors. In particular, complex, time-varying service dependencies are difficult to quantify accurately and can lead to cascading effects when allocating resources. This paper presents DeepScaler, a deep learning-based holistic autoscaling approach for microservices that focuses on coping with service dependencies to optimize service-level agreement (SLA) assurance and cost efficiency. DeepScaler employs (i) an expectation-maximization-based learning method to adaptively generate affinity matrices revealing service dependencies and (ii) an attention-based graph convolutional network to extract spatio-temporal features of microservices by aggregating neighbors' information over the graph-structured data. Thus, DeepScaler can capture more potential service dependencies and accurately estimate the resource requirements of all services under dynamic workloads. This allows DeepScaler to reconfigure the resources of the interacting services simultaneously in one resource provisioning operation, avoiding the cascading effect caused by service dependencies. Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices that not only allocates resources accurately but also adapts to dependency changes, significantly reducing SLA violations by an average of 41% at lower costs. (Comment: To be published in the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023))
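    As a rough, assumed sketch of the neighbour-aggregation step a graph convolutional network performs over such an affinity matrix (the shapes, normalisation, and numbers below are invented; DeepScaler's actual model additionally uses attention, EM-learned affinities, and temporal layers), each service's feature vector is updated with information from the services it depends on:

    # Assumed sketch of one graph-convolution layer over a service-affinity graph.
    import numpy as np

    # affinity matrix A[i, j]: how strongly service j influences service i (invented)
    A = np.array([[0.0, 0.8, 0.0],
                  [0.2, 0.0, 0.5],
                  [0.0, 0.6, 0.0]])

    # per-service features, e.g. [requests/s, CPU utilisation] (invented)
    X = np.array([[120.0, 0.45],
                  [ 60.0, 0.70],
                  [ 30.0, 0.30]])

    W = np.random.default_rng(0).normal(size=(2, 2))   # learnable weights (random here)

    A_hat = A + np.eye(3)                       # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # row-normalise the affinities
    H = np.maximum(D_inv @ A_hat @ X @ W, 0.0)  # aggregate neighbours, apply ReLU

    print(H)   # updated per-service features, informed by dependent services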

    Resource provisioning in Science Clouds: Requirements and challenges

    Cloud computing has permeated the information technology industry in the last few years, and it is now emerging in scientific environments. Science user communities demand a broad range of computing resources, such as local clusters, high-performance computing systems, and computing grids, to satisfy the needs of high-performance applications. Different computational models impose different workloads, and the cloud is already considered a promising paradigm for serving them. The scheduling and allocation of resources is always a challenging matter in any form of computation, and clouds are no exception. Science applications have unique features that differentiate their workloads; hence, their requirements have to be taken into consideration when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service provider supporting scientific applications.

    V-Cache: Towards Flexible Resource Provisioning for Multi-tier Applications in IaaS Clouds

    Although the resource elasticity offered by Infrastructure-as-a-Service (IaaS) clouds opens up opportunities for elastic application performance, it also poses challenges to application management. Cluster applications, such as multi-tier websites, further complicate management, requiring not only accurate capacity planning but also proper partitioning of the resources into a number of virtual machines. Instead of burdening cloud users with complex management, we move the task of determining the optimal resource configuration for cluster applications to cloud providers. We find that a structural reorganization of multi-tier websites, by adding a caching tier which runs on resources debited from the original resource budget, significantly boosts application performance and reduces resource usage. We propose V-Cache, a machine learning based approach to flexible provisioning of resources for multi-tier applications in clouds. V-Cache transparently places a caching proxy in front of the application. It uses a genetic algorithm to identify the incoming requests that benefit most from caching and dynamically resizes the cache space to accommodate these requests. We develop a reinforcement learning algorithm to optimally allocate the remaining capacity to other tiers. We have implemented V-Cache on a VMware-based cloud testbed. Experiment results with the RUBiS and WikiBench benchmarks show that V-Cache outperforms a representative capacity management scheme and a cloud-cache based resource provisioning approach by at least 15% in performance, and achieves at least 11% and 21% savings on CPU and memory resources, respectively.
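    As a very rough, invented illustration of the cache-sizing decision described above (V-Cache itself uses a genetic algorithm plus reinforcement learning; the greedy rule, request-class names, and numbers below are assumptions for the example only), a controller might rank request classes by estimated backend time saved per megabyte of cache and fill the cache budget greedily:

    # Toy sketch of selecting request classes for a caching tier
    # (assumed greedy heuristic; V-Cache actually uses a GA plus RL).
    from dataclasses import dataclass

    @dataclass
    class RequestClass:
        name: str
        hit_rate_estimate: float   # fraction of requests servable from cache
        backend_cost_ms: float     # cost of a miss (hitting the web/db tiers)
        footprint_mb: int          # cache space this class would occupy

    def pick_classes_to_cache(classes, cache_budget_mb):
        """Greedy stand-in for the GA: prefer the best estimated saving per MB."""
        ranked = sorted(
            classes,
            key=lambda c: c.hit_rate_estimate * c.backend_cost_ms / c.footprint_mb,
            reverse=True)
        chosen, used = [], 0
        for c in ranked:
            if used + c.footprint_mb <= cache_budget_mb:
                chosen.append(c)
                used += c.footprint_mb
        return chosen

    workload = [
        RequestClass("product_page", 0.9, 45.0, 256),
        RequestClass("search",       0.4, 120.0, 512),
        RequestClass("user_profile", 0.7, 30.0, 128),
    ]
    for c in pick_classes_to_cache(workload, cache_budget_mb=640):
        print("cache:", c.name)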