123 research outputs found

    On Autonomic HPC Clouds

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015. The long tail of science using HPC facilities is nowadays looking to instantly available HPC Clouds as a viable alternative to the long waiting queues of supercomputing centers. While the name HPC Cloud suggests a Cloud service, the current HPC-as-a-Service is mainly an offer of bare metal, better named cluster-on-demand. The elasticity and virtualization benefits of Clouds are not exploited by HPC-as-a-Service. In this paper we discuss how the HPC Cloud offer can be improved from one particular point of view, that of automation. After a reminder of the characteristics of the Autonomic Cloud, we project the requirements and expectations onto what we name Autonomic HPC Clouds. Finally, we point towards the expected results of the latest research and development activities related to the topics that were identified. The work related to Autonomic HPC Clouds is supported by the European Commission under grant agreement H2020-6643946 (CloudLightning). The CloudLightning project proposal was prepared by eight partner institutions, three of them earlier partners in the COST Action IC1305 NESUS, benefiting from its inputs for the proposal. The section related to Autonomic Clouds is supported by the Romanian UEFISCDI under grant agreement PN-II-ID-PCE-2011-3-0260 (AMICAS).

    Autonomous management of cost, performance, and resource uncertainty for migration of applications to infrastructure-as-a-service (IaaS) clouds

    2014 Fall. Includes bibliographical references. Infrastructure-as-a-Service (IaaS) clouds abstract physical hardware to provide computing resources on demand as a software service. This abstraction leads to the simplistic view that computing resources are homogeneous and that infinite scaling potential exists to easily resolve all performance challenges. In practice, however, adoption of cloud computing presents many resource management challenges, forcing practitioners to balance cost and performance tradeoffs to successfully migrate applications. These challenges can be broken down into three primary concerns that involve determining what, where, and when infrastructure should be provisioned. In this dissertation we address these challenges, including: (1) performance variance from resource heterogeneity, virtualization overhead, and the plethora of vaguely defined resource types; (2) virtual machine (VM) placement, component composition, service isolation, provisioning variation, and resource contention for multitenancy; and (3) dynamic scaling and resource elasticity to alleviate performance bottlenecks. These resource management challenges are addressed through the development and evaluation of autonomous algorithms and methodologies that result in demonstrably better performance and lower monetary costs for application deployments to both public and private IaaS clouds. This dissertation makes three primary contributions to advance cloud infrastructure management for application hosting. First, it includes the design of resource utilization models based on step-wise multiple linear regression and artificial neural networks that support prediction of better performing component compositions. The total number of possible compositions is governed by Bell's number, which results in a combinatorially explosive search space. Second, it includes algorithms to improve VM placements to mitigate resource heterogeneity and contention using a load-aware VM placement scheduler, and autonomous detection of under-performing VMs to spur replacement. Third, it describes a workload cost prediction methodology that harnesses regression models and heuristics to support determination of infrastructure alternatives that reduce hosting costs. Our methodology achieves infrastructure predictions with an average mean absolute error of only 0.3125 VMs for multiple workloads.
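
As a rough illustration of why this search space explodes, the sketch below computes Bell numbers with the Bell triangle. B(n) counts the ways to partition n items into non-empty groups. Reading a "composition" as a grouping of application components onto VMs is an assumption made here for illustration, and the function name `bell_numbers` is hypothetical rather than taken from the dissertation.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] computed with the Bell triangle.

    B(n) is the number of ways to partition a set of n items
    into non-empty, unordered groups.
    """
    bells = [1]   # B(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry is its left neighbour plus the entry above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


if __name__ == "__main__":
    # Print how quickly the number of groupings grows with the component count.
    for n, b in enumerate(bell_numbers(10)):
        print(f"{n} components -> {b} possible groupings")
```

Under that reading, four components already allow 15 groupings and ten components allow 115975, which suggests why exhaustive evaluation of compositions quickly becomes impractical and predictive models are attractive.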

    SLA-driven dynamic cloud resource management

    As the size and complexity of Cloud systems increase, the manual management of these solutions becomes a challenging issue, as more personnel, resources and expertise are needed. Service Level Agreement (SLA)-aware autonomic cloud solutions enable the management of large-scale infrastructures while supporting multiple dynamic requirements from users. This paper contributes to these topics by introducing Cloudcompaas, an SLA-aware PaaS Cloud platform that manages the complete resource lifecycle. This platform features an extension of the SLA specification WS-Agreement, tailored to the specific needs of Cloud Computing. In particular, Cloudcompaas equips Cloud providers with a generic SLA model to deal with higher-level metrics, closer to end-user perception, and with flexible composition of the requirements of multiple actors in the computational scene. Moreover, Cloudcompaas provides a framework for general Cloud computing applications that can be dynamically adapted to correct QoS violations by using the elasticity features of Cloud infrastructures. The effectiveness of this solution is demonstrated in this paper through a simulation that considers several realistic workload profiles, where Cloudcompaas achieves minimum cost and maximum efficiency under highly heterogeneous utilization patterns. © 2013 Elsevier B.V. All rights reserved. This work has been developed under the support of the program Formacion de Personal Investigador de Caracter Predoctoral, grant number BFPI/2009/103, from the Conselleria d'Educacio of the Generalitat Valenciana. Also, the authors wish to thank the Spanish Ministry of Education and Science for the financial support received to develop the project 'CodeCloud', with reference TIN2010-17804. García García, A.; Blanquer Espert, I.; Hernández García, V. (2014). SLA-driven dynamic cloud resource management. Future Generation Computer Systems. 31:1-11. https://doi.org/10.1016/j.future.2013.10.005

    Nefeli: Hint-Based Execution of Workloads in Clouds

    Virtualization of computer systems has made feasible the provision of entire distributed infrastructures in the form of services. Such services do not expose the internal operational and physical characteristics of the underlying machinery to either users or applications. In this way, infrastructures including computers in data-centers, clusters of workstations, and networks of machines are shrouded in “clouds”. Mainly through the deployment of virtual machines, such networks of computing nodes become cloud-computing environments. In this paper, we propose Nefeli, a virtual infrastructure gateway that is capable of effectively handling diverse workloads of jobs in cloud environments. By and large, users and their workloads remain agnostic to the internal features of clouds at all times. Exploiting execution patterns as well as logistical constraints, users provide Nefeli with hints for the handling of their jobs. Hints provide no hard requirements for application deployment in terms of pairing virtual machines to specific physical cloud elements. Nefeli helps avoid bottlenecks within the cloud through the realization of viable virtual machine deployment mappings. As the types of jobs change over time, deployment mappings must follow suit. To this end, Nefeli offers mechanisms to migrate virtual machines as needed to adapt to changing performance needs. Using our prototype system, we show significant improvements in overall time needed and energy consumed for the execution of workloads in both simulated and real cloud computing environments.

    A Middleware framework for self-adaptive large scale distributed services

    Modern service-oriented applications demand the ability to adapt to changing conditions and unexpected situations while maintaining a required QoS. Existing self-adaptation approaches seem inadequate to address this challenge because many of their assumptions are not met on the large-scale, highly dynamic infrastructures where these applications are generally deployed. The main motivation of our research is to devise principles that guide the construction of large-scale self-adaptive distributed services. We aim to provide sound modeling abstractions based on a clear conceptual background, and their realization as a middleware framework that supports the development of such services. Taking inspiration from the concepts of decentralized markets in economics, we propose a solution based on three principles: emergent self-organization, utility-driven behavior and model-less adaptation. Based on these principles, we designed Collectives, a middleware framework which provides a comprehensive solution for the diverse adaptation concerns that arise in the development of distributed systems. We tested the soundness and comprehensiveness of the Collectives framework by implementing eUDON, a middleware for self-adaptive web services, which we then evaluated extensively by means of a simulation model to analyze its adaptation capabilities in diverse settings. We found that eUDON exhibits the intended properties: it adapts to diverse conditions like peaks in the workload and massive failures, maintaining its QoS and efficiently using the available resources; it is highly scalable and robust; it can be implemented on existing services in a non-intrusive way; and it does not require any performance model of the services, their workload or the resources they use. We can conclude that our work proposes a solution for the requirements of self-adaptation in demanding usage scenarios without introducing additional complexity. 
In that sense, we believe we make a significant contribution towards the development of future generation service-oriented applications. Modern service-oriented applications demand the ability to adapt to variable conditions and unexpected situations while maintaining an expected level of service (QoS). Existing self-adaptation approaches appear inadequate because their assumptions do not hold on large-scale shared infrastructures. The main motivation of our research is to derive a set of principles to guide the development of large-scale self-adaptive services. Our goal is to provide appropriate modeling abstractions, based on a clear conceptual framework, and their implementation in a middleware that supports the development of these services. Taking inspiration from economic concepts of decentralized markets, we have proposed a solution based on three principles: emergent self-organization, utility-driven behavior and model-less adaptation. Based on these principles we designed Collectives, a middleware that provides a comprehensive solution for the diverse adaptation concerns that arise in the development of distributed systems. The adequacy and completeness of Collectives have been tested through the implementation of eUDON, a middleware for self-adaptive services, which has been evaluated extensively by means of a simulation model, analyzing its adaptation properties in diverse usage scenarios. We found that eUDON exhibits the expected properties: it adapts to diverse conditions such as workload peaks or massive failures, maintaining its quality of service and making efficient use of the available resources. It is highly scalable and robust; it can be implemented on existing services in a non-intrusive way; and it does not require a performance model of the services. We can conclude that our work has allowed us to develop a solution that addresses the requirements of self-adaptation in demanding usage scenarios without introducing additional complexity. In this sense, we consider that our proposal makes a significant contribution towards the development of the future generation of service-oriented applications. Postprint (published version)

    Adaptive monitoring and control framework in Application Service Management environment

    The economics of data centres and cloud computing services have pushed hardware and software requirements to the limits, leaving only a very small performance margin before systems reach saturation. For Application Service Management (ASM), this carries the growing risk of impacting the execution times of various processes. In order to deliver a stable service at times of great demand for computational power, enterprise data centres and cloud providers must implement fast and robust control mechanisms that are capable of adapting to changing operating conditions while satisfying service-level agreements. In ASM practice, there are normally two methods for dealing with increased load, namely increasing computational power or releasing load. The first approach typically involves allocating additional machines, which must be available, waiting idle, to deal with high demand situations. The second approach is implemented by terminating incoming actions that are less important to new activity demand patterns, throttling, or rescheduling jobs. Although most modern cloud platforms and operating systems do not allow adaptive or automatic termination of processes, tasks or actions, it is common practice for administrators to manually end or stop tasks or actions at any level of the system, such as at the level of a node, function, or process, or to kill a long session that is executing on a database server. In this context, adaptive control of action termination remains a significantly underutilised subject of Application Service Management and deserves further consideration. For example, this approach may be eminently suitable for systems with strict execution-time Service Level Agreements, such as real-time systems, or systems running under conditions of hard pressure on power supplies, systems running under variable priority, or constraints set up by the green computing paradigm. Along this line of work, the thesis investigates the potential of dimension relevance and metrics signals decomposition as methods that would enable more efficient action termination. These methods are integrated in adaptive control emulators and actuators powered by neural networks that are used to adjust the operation of the system to better conditions in environments with established goals seen from both system performance and economics perspectives. The behaviour of the proposed control framework is evaluated using complex load and service agreement scenarios of systems compatible with the requirements of on-premises, elastic compute cloud deployments, server-less computing, and micro-services architectures.

    Automated Bidding in Computing Service Markets. Strategies, Architectures, Protocols

    This dissertation contributes to the research on Computational Mechanism Design by providing novel theoretical and software models: a bidding strategy called Q-Strategy, which automates bidding processes in imperfect information markets; a software framework for realizing agents and bidding strategies called BidGenerator; and a communication protocol called MX/CS, for expressing and exchanging economic and technical information in a market-based scheduling system.

    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015