Search CORE

4 research outputs found

Towards self-configuring hardware for distributed computer systems

Author: Emmett Witchel, Jonathan Wildstrom, Mike Dahlin, Peter Stone, Raymond J. Mooney
Publication venue: IEEE Computer Society
Publication date: 01/01/2005

High-end servers that can be partitioned into logical subsystems and repartitioned on the fly are now becoming available. This development raises the possibility of reconfiguring distributed systems online to optimize for dynamically changing workloads. This paper presents the initial steps towards a system that can learn to alter its current configuration in reaction to the current workload. In particular, the advantages of shifting CPU and memory resources online are considered. Investigation on a publicly available multi-machine, multi-process distributed system (the online transaction processing benchmark TPC-W) indicates that there is a real performance benefit to reconfiguration in reaction to workload changes. A learning framework is presented that does not require any instrumentation of the middleware, nor any special instrumentation of the operating system; rather, it learns to identify preferable configurations as well as their quantitative performance effects from system behavior as reported by standard monitoring tools. Initial results using the WEKA machine learning package suggest that automatic adaptive configuration can provide measurable performance benefits over any fixed configuration.

CiteSeerX

Improved self-management of datacenter systems applying machine learning

Author: Berral García Josep Lluís
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013

Autonomic Computing is a research area of Computer Science and Technologies that originated in the mid-2000s. It focuses on optimizing and improving complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, as with multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design and manually optimize them, all the more so when these systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, a very important issue when obtaining, running or maintaining resources has a cost. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models of system behavior, often with intelligible explanations, and can also predict and infer system states and values. Models obtained through automatic learning have the advantage of being easily updated after workload or configuration changes by collecting new examples and retraining the predictors. By employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and discovering new information and knowledge from systems. This thesis moves from the current state of the art, where management is based on administrators' expertise, well-known data, ad hoc algorithms and models, and elements studied from the point of view of the computing machine, to a novel state of the art where management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing or uncertain data, from the point of view of a global network of datacenters.
- First of all, we cover the scenario where the decision maker knows all pieces of information about the system: how much each job will consume, what the desired quality of service is and will be, what the deadlines for the workload are, etc. All of this focuses on each component and policy of each element involved in executing these jobs.
- Then we focus on the scenario where, instead of fixed oracles that provide information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details while some of the information is not known and must be learned and predicted.
- We reduce the problem of optimizing resource allocations and requirements for virtualized web services to a mathematical problem, indicating each factor, variable and element involved, as well as all the constraints the scheduling process must satisfy. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face the scenario of a full datacenter, and we further introduce some information prediction.
- We complement the model by expanding the predicted elements, studying the main resources (CPU, memory and I/O) that can suffer from noise, inaccuracy or unavailability. Once learned predictors for certain components improve decision making, the system can become more "expert-knowledge independent", and research can focus on a scenario where all the elements provide noisy, uncertain or private information. We also introduce new factors into the management optimization, since context and costs may change for each datacenter, making the model "multi-datacenter".
- Finally, we review the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Organic Service-Level Management in Service-Oriented Environments

Author: Liu Lei
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2011

Dynamic service-oriented environments (SOEs) are characterised by a large number of heterogeneous service components that are expected to support the business as a whole. The present work provides a negotiation-based approach to facilitate automated, multi-level service-level management in an SOE, where each component autonomously arranges its contribution to the overall operational goals. Evaluation experiments have shown increased responsiveness and stability of an SOE when changes occur.

KITopen

Directory of Open Access Books (DOAB)