4 research outputs found

    Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments

    Virtualization has become an indispensable tool in data centers and cloud environments for flexibly assigning virtual machines (VMs) to resources. Virtualization is also becoming increasingly attractive for high-performance computing (HPC). This is mainly due to the strong isolation of VMs, which enables: (1) the sharing of cluster nodes and optimization of the system's overall utilization; (2) load balancing by means of migrations, due to the reduction of residual dependencies; and (3) the creation of system-level checkpoints, increasing fault tolerance in an application-transparent way. On the downside, the additional virtualization layer conceals information that is only available at the process level. This information has a direct influence on the checkpoint size, which should be kept as small as possible. In this paper, we propose a novel technique for checkpoint size reduction in virtualized environments. We exploit the fact that the hypervisor detects zero pages, which are omitted when capturing a checkpoint. Moreover, compression techniques are applied to further reduce the checkpoint size. We therefore fill freed memory regions with zeros, supporting both zero-page detection and compression. We evaluate our approach using HPC applications as examples. The results reveal a reduction of the checkpoint size by up to 9% when compression is disabled in the hypervisor and up to 49% with compression enabled. Furthermore, memory zeroing reduces VM migration time by up to 10% when compression is disabled and by up to 60% when compression is enabled.
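    The core idea is simple to sketch. The following toy Python snippet (our illustration, not the paper's implementation; the flat-buffer "heap" and the helper names are made up for the example) shows why zero-filling freed regions shrinks a compressed snapshot:

```python
import os
import zlib

HEAP_SIZE = 1 << 20          # 1 MiB toy "guest heap" modeled as a flat buffer
heap = bytearray(HEAP_SIZE)  # starts out all zeros, like fresh VM memory

def allocate(offset, size):
    # Simulate an application writing (incompressible) data into a region.
    heap[offset:offset + size] = os.urandom(size)

def free(offset, size, zero_on_free):
    # The idea behind a zeroing deallocator: overwrite freed regions with
    # zeros so the hypervisor's zero-page detection and compression benefit.
    if zero_on_free:
        heap[offset:offset + size] = bytes(size)

# Without zeroing, stale data in freed regions still ends up in the snapshot.
allocate(0, HEAP_SIZE // 2)
free(0, HEAP_SIZE // 2, zero_on_free=False)
print("stale freed memory: ", len(zlib.compress(bytes(heap))), "bytes compressed")

# With zeroing, the freed half collapses to almost nothing under compression.
allocate(0, HEAP_SIZE // 2)
free(0, HEAP_SIZE // 2, zero_on_free=True)
print("zeroed freed memory:", len(zlib.compress(bytes(heap))), "bytes compressed")
```

    In a real VM the same effect appears at page granularity: zeroed pages can be dropped by the hypervisor's zero-page detection before compression even runs.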

    Building Scientific Clouds: The Distributed, Peer-to-Peer Approach

    The scientific community is constantly growing in size. The increase in personnel and projects has resulted in a requirement for large amounts of storage, CPU power, and other computing resources, and it has become necessary to acquire these resources in an affordable manner that is sensitive to workloads. In this thesis, the author presents a novel approach that provides a communication platform to support such large-scale scientific projects. These resources can be difficult to acquire due to NATs, firewalls, and other site-based restrictions and policies. Methods used to overcome these hurdles are discussed in detail, along with other advantages of using such a system, which include: increased availability of necessary computing infrastructure; increased grid resource utilization; reduced dependence on users; and reduced job execution time. Experiments were conducted on local infrastructure on the Clemson University campus as well as on resources provided by other federated grid sites.

    Improved self-management of datacenter systems applying machine learning

    Autonomic Computing is a Computer Science research area that originated in the mid-2000s. It focuses on the optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design, and optimize these systems manually, all the more when the systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, an important issue given the costs of obtaining, running, and maintaining resources. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models of system behavior, often with intelligible explanations, and can predict and infer system states and values. Models obtained through automatic learning have the advantage of being easy to update after workload or configuration changes, by collecting new examples and re-training the predictors. By employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and for discovering new information and knowledge from systems. This thesis departs from the state of the art, where management is based on administrators' expertise, well-known data, ad-hoc algorithms and models, and elements studied from the point of view of a single computing machine, towards a novel state of the art where management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing, or uncertain data, from the point of view of a global network of datacenters.
    - First, we cover the scenario where the decision maker knows all pieces of information about the system: how much each job will consume, what the desired quality of service is and will be, what the deadlines for the workload are, and so on, focusing on each component and policy of each element involved in executing these jobs.
    - Then we focus on the scenario where, instead of fixed oracles that provide information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details while some part of the information is unknown and must be learned and predicted (see the first sketch after this list).
    - We reduce the problem of optimizing resource allocations and requirements for virtualized web services to a mathematical problem, indicating each factor, variable, and element involved, as well as all the constraints the scheduling process must respect. The scheduling problem can be modeled as a Mixed Integer Linear Program (see the second sketch after this list). Here we face the scenario of a full datacenter, and we further introduce some information prediction.
    - We complement the model by expanding the predicted elements, studying the main resources (CPU, memory, and IO), which can suffer from noise, inaccuracy, or unavailability. Once learned predictors for certain components let decision making improve, the system becomes more "expert-knowledge independent", and research can focus on a scenario where all the elements provide noisy, uncertain, or private information.
    - We also introduce new factors into the management optimization, since context and costs may change for each datacenter, turning the model into a "multi-datacenter" one.
    - Finally, we review the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability.
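    As a rough illustration of the "learned oracle" idea, the sketch below (assuming NumPy and scikit-learn are available; the synthetic workload features, target, and model choice are invented for the example, not taken from the thesis) trains a regression model that a scheduler could query in place of a hand-tuned expert formula:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical features: normalized requests/sec and average request size;
# target: observed CPU utilization with some measurement noise.
X = rng.uniform(0, 1, size=(500, 2))
cpu = 0.6 * X[:, 0] + 0.3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.02, 500)

# Train the "oracle" on observed behavior instead of fixing an expert formula.
oracle = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, cpu)

# The scheduler queries the learned oracle for an unseen workload point; the
# model can simply be re-trained when workloads or configurations change.
print(oracle.predict([[0.8, 0.5]]))
```

    Re-training on fresh examples is what keeps such an oracle aligned with workload or configuration changes, as the abstract notes.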
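    The MILP view of scheduling can likewise be sketched. The toy formulation below (using the PuLP modeling library; job demands, host capacities, and variable names are illustrative, not the thesis' model) consolidates jobs onto as few powered-on hosts as possible, which is the energy-aware flavor of the problem described above:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

jobs = {"j1": 30, "j2": 40, "j3": 25}            # predicted CPU demand per job
hosts = {"h1": 100, "h2": 100, "h3": 100}        # CPU capacity per host

x = LpVariable.dicts("assign", (jobs, hosts), cat=LpBinary)  # job j on host h
y = LpVariable.dicts("on", hosts, cat=LpBinary)              # host powered on

prob = LpProblem("consolidation", LpMinimize)
prob += lpSum(y[h] for h in hosts)               # minimize powered-on hosts

for j in jobs:                                   # every job placed exactly once
    prob += lpSum(x[j][h] for h in hosts) == 1
for h in hosts:                                  # capacity only on powered hosts
    prob += lpSum(jobs[j] * x[j][h] for j in jobs) <= hosts[h] * y[h]

prob.solve()
for j in jobs:
    for h in hosts:
        if value(x[j][h]) > 0.5:
            print(j, "->", h)
```

    Here all three jobs fit on a single host, so the solver powers on only one; predicted (rather than known) demands would simply replace the fixed numbers in `jobs`.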

    Economic regulation for multi tenant infrastructures

    Large-scale computing infrastructures need scalable and efficient resource allocation mechanisms to fulfil the requirements of their participants and applications while the whole system is regulated to work efficiently. Computational markets provide efficient allocation mechanisms that aggregate information from multiple sources in large, dynamic, and complex systems where no single source has complete information. They have proven successful in matching resource demand and supply in the presence of selfish, multi-objective, utility-optimizing users and selfish, profit-optimizing providers. However, global infrastructure metrics that may not directly affect participants of the computational market still need to be addressed: economic externalities such as load balancing or energy efficiency. In this thesis, we point out the need to address these economic externalities, and we design and evaluate appropriate regulation mechanisms from different perspectives on top of existing economic models, to incorporate a wider range of objective metrics not considered otherwise. Our main contributions are threefold: first, we propose a taxation mechanism that addresses the resource congestion problem, effectively improving the balance of load among resources when correlated economic preferences are present; second, we propose a game-theoretic model with complete information to derive an algorithm that helps resource providers scale resource supply up and down so that energy-related costs can be reduced; and third, we relax our previous assumptions about complete information on the resource provider side and design an incentive-compatible mechanism that encourages users to truthfully report their resource requirements, effectively assisting providers in making energy-efficient allocations while providing a dynamic allocation mechanism to users.
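    To make the first contribution concrete, here is a minimal sketch (ours, not the thesis' mechanism; the prices, capacities, and tax formula are invented) of how a load-proportional tax can steer selfish, price-minimizing users away from a congested resource:

```python
def price(base_price, load, capacity, tax_rate):
    # Effective price grows with utilization; tax_rate = 0 disables regulation.
    return base_price * (1 + tax_rate * load / capacity)

def run(tax_rate, n_users=90):
    resources = {"r1": {"base": 1.0, "load": 0, "cap": 50},
                 "r2": {"base": 1.2, "load": 0, "cap": 50}}  # r2 slightly pricier
    for _ in range(n_users):
        # Each selfish user chooses the currently cheapest resource.
        best = min(resources, key=lambda r: price(resources[r]["base"],
                                                  resources[r]["load"],
                                                  resources[r]["cap"], tax_rate))
        resources[best]["load"] += 1
    return {r: v["load"] for r, v in resources.items()}

print("no tax:  ", run(tax_rate=0.0))  # everyone piles onto the cheap resource
print("with tax:", run(tax_rate=1.0))  # the tax spreads load across resources
```

    Without the tax, every user picks the nominally cheapest resource and overloads it; with the tax, the effective price rises with utilization and the load balances out.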