265 research outputs found

    Scrooge Attack: Undervolting ARM Processors for Profit

    Full text link
    Latest ARM processors are approaching the computational power of x86 architectures while consuming much less energy. Consequently, supply follows demand with Amazon EC2, Equinix Metal and Microsoft Azure offering ARM-based instances, while Oracle Cloud Infrastructure is about to add such support. We expect this trend to continue, with an increasing number of cloud providers offering ARM-based cloud instances. ARM processors are more energy-efficient leading to substantial electricity savings for cloud providers. However, a malicious cloud provider could intentionally reduce the CPU voltage to further lower its costs. Running applications malfunction when the undervolting goes below critical thresholds. By avoiding critical voltage regions, a cloud provider can run undervolted instances in a stealthy manner. This practical experience report describes a novel attack scenario: an attack launched by the cloud provider against its users to aggressively reduce the processor voltage for saving energy to the last penny. We call it the Scrooge Attack and show how it could be executed using ARM-based computing instances. We mimic ARM-based cloud instances by deploying our own ARM-based devices using different generations of Raspberry Pi. Using realistic and synthetic workloads, we demonstrate to which degree of aggressiveness the attack is relevant. The attack is unnoticeable by our detection method up to an offset of -50mV. We show that the attack may even remain completely stealthy for certain workloads. Finally, we propose a set of client-based detection methods that can identify undervolted instances. We support experimental reproducibility and provide instructions to reproduce our results.Comment: European Commission Project: LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681

    Reliable and energy efficient resource provisioning in cloud computing systems

    Get PDF
    Cloud Computing has revolutionized the Information Technology sector by giving computing a perspective of service. The services of cloud computing can be accessed by users not knowing about the underlying system with easy-to-use portals. To provide such an abstract view, cloud computing systems have to perform many complex operations besides managing a large underlying infrastructure. Such complex operations confront service providers with many challenges such as security, sustainability, reliability, energy consumption and resource management. Among all the challenges, reliability and energy consumption are two key challenges focused on in this thesis because of their conflicting nature. Current solutions either focused on reliability techniques or energy efficiency methods. But it has been observed that mechanisms providing reliability in cloud computing systems can deteriorate the energy consumption. Adding backup resources and running replicated systems provide strong fault tolerance but also increase energy consumption. Reducing energy consumption by running resources on low power scaling levels or by reducing the number of active but idle sitting resources such as backup resources reduces the system reliability. This creates a critical trade-off between these two metrics that are investigated in this thesis. To address this problem, this thesis presents novel resource management policies which target the provisioning of best resources in terms of reliability and energy efficiency and allocate them to suitable virtual machines. A mathematical framework showing interplay between reliability and energy consumption is also proposed in this thesis. A formal method to calculate the finishing time of tasks running in a cloud computing environment impacted with independent and correlated failures is also provided. The proposed policies adopted various fault tolerance mechanisms while satisfying the constraints such as task deadlines and utility values. This thesis also provides a novel failure-aware VM consolidation method, which takes the failure characteristics of resources into consideration before performing VM consolidation. All the proposed resource management methods are evaluated by using real failure traces collected from various distributed computing sites. In order to perform the evaluation, a cloud computing framework, 'ReliableCloudSim' capable of simulating failure-prone cloud computing systems is developed. The key research findings and contributions of this thesis are: 1. If the emphasis is given only to energy optimization without considering reliability in a failure prone cloud computing environment, the results can be contrary to the intuitive expectations. Rather than reducing energy consumption, a system ends up consuming more energy due to the energy losses incurred because of failure overheads. 2. While performing VM consolidation in a failure prone cloud computing environment, a significant improvement in terms of energy efficiency and reliability can be achieved by considering failure characteristics of physical resources. 3. By considering correlated occurrence of failures during resource provisioning and VM allocation, the service downtime or interruption is reduced significantly by 34% in comparison to the environments with the assumption of independent occurrence of failures. Moreover, measured by our mathematical model, the ratio of reliability and energy consumption is improved by 14%

    Doctor of Philosophy

    Get PDF
    dissertationWe propose a collective approach for harnessing the idle resources (cpu, storage, and bandwidth) of nodes (e.g., home desktops) distributed across the Internet. Instead of a purely peer-to-peer (P2P) approach, we organize participating nodes to act collectively using collective managers (CMs). Participating nodes provide idle resources to CMs, which unify these resources to run meaningful distributed services for external clients. We do not assume altruistic users or employ a barter-based incentive model; instead, participating nodes provide resources to CMs for long durations and are compensated in proportion to their contribution. In this dissertation we discuss the challenges faced by collective systems, present a design that addresses these challenges, and study the effect of selfish nodes. We believe that the collective service model is a useful alternative to the dominant pure P2P and centralized work queue models. It provides more effective utilization of idle resources, has a more meaningful economic model, and is better suited for building legal and commercial distributed services. We demonstrate the value of our work by building two distributed services using the collective approach. These services are a collective content distribution service and a collective data backup service

    Phantom: Towards Vendor-Agnostic Resource Consolidation in Cloud Environments

    Get PDF
    Mobile-oriented internet technologies such as mobile cloud computing are gaining wider popularity in the IT industry. These technologies are aimed at improving the user internet usage experience by employing state-of-the-art technologies or their combination. One of the most important parts of modern mobile-oriented future internet is cloud computing. Modern mobile devices use cloud computing technology to host, share and store data on the network. This helps mobile users to avail different internet services in a simple, cost-effective and easy way. In this paper, we shall discuss the issues in mobile cloud resource management followed by a vendor-agnostic resource consolidation approach named Phantom, to improve the resource allocation challenges in mobile cloud environments. The proposed scheme exploits software-defined networks (SDNs) to introduce vendor-agnostic concept and utilizes a graph-theoretic approach to achieve its objectives. Simulation results demonstrate the efficiency of our proposed approach in improving application service response time

    Improved self-management of datacenter systems applying machine learning

    Get PDF
    Autonomic Computing is a Computer Science and Technologies research area, originated during mid 2000's. It focuses on optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like multi-datacenter systems in cloud computing, the system operators and architects need more help to understand, design and optimize manually these systems, even more when these systems are distributed along the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, a very important issue when resources have a cost, by obtaining, running or maintaining them. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models from system behaviors and often intelligible explanations to them, also predict and infer system states and values. These models obtained from automatic learning have the advantage of being easily updated to workload or configuration changes by re-taking examples and re-training the predictors. So employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and discovering new information and knowledge from systems. This thesis departs from the state of the art, where management is based on administrators expertise, well known data, ad-hoc studied algorithms and models, and elements to be studied from computing machine point of view; to a novel state of the art where management is driven by models learned from the same system, providing useful feedback, making up for incomplete, missing or uncertain data, from a global network of datacenters point of view. - First of all, we cover the scenario where the decision maker works knowing all pieces of information from the system: how much will each job consume, how is and will be the desired quality of service, what are the deadlines for the workload, etc. All of this focusing on each component and policy of each element involved in executing these jobs. -Then we focus on the scenario where instead of fixed oracles that provide us information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details while some part of the information is not known and must be learned and predicted. - We reduce the problem of optimizing resource allocations and requirements for virtualized web-services to a mathematical problem, indicating each factor, variable and element involved, also all the constraints the scheduling process must attend to. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face an scenario of a full datacenter, further we introduce some information prediction. - We complement the model by expanding the predicted elements, studying the main resources (this is CPU, Memory and IO) that can suffer from noise, inaccuracy or unavailability. Once learning predictors for certain components let the decision making improve, the system can become more ¿expert-knowledge independent¿ and research can focus on an scenario where all the elements provide noisy, uncertainty or private information. Also we introduce to the management optimization new factors as for each datacenter context and costs may change, turning the model as "multi-datacenter" - Finally, we review of the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability

    Improving data center efficiency through smart grid integration and intelligent analytics

    Full text link
    The ever-increasing growth of the demand in IT computing, storage and large-scale cloud services leads to the proliferation of data centers that consist of (tens of) thousands of servers. As a result, data centers are now among the largest electricity consumers worldwide. Data center energy and resource efficiency has started to receive significant attention due to its economical, environmental, and performance impacts. In tandem, facing increasing challenges in stabilizing the power grids due to growing needs of intermittent renewable energy integration, power market operators have started to offer a number of demand response (DR) opportunities for energy consumers (such as data centers) to receive credits by modulating their power consumption dynamically following specific requirements. This dissertation claims that data centers have strong capabilities to emerge as major enablers of substantial electricity integration from renewables. The participation of data centers into emerging DR, such as regulation service reserves (RSRs), enables the growth of the data center in a sustainable, environmentally neutral, or even beneficial way, while also significantly reducing data center electricity costs. In this dissertation, we first model data center participation in DR, and then propose runtime policies to dynamically modulate data center power in response to independent system operator (ISO) requests, leveraging advanced server power and workload management techniques. We also propose energy and reserve bidding strategies to minimize the data center energy cost. Our results demonstrate that a typical data center can achieve up to 44% monetary savings in its electricity cost with RSR provision, dramatically surpassing savings achieved by traditional energy management strategies. In addition, we investigate the capabilities and benefits of various types of energy storage devices (ESDs) in DR. Finally, we demonstrate RSR provision in practice on a real server. In addition to its contributions on improving data center energy efficiency, this dissertation also proposes a novel method to address data center management efficiency. We propose an intelligent system analytics approach, "discovery by example", which leverages fingerprinting and machine learning methods to automatically discover software and system changes. Our approach eases runtime data center introspection and reduces the cost of system management.2018-11-04T00:00:00

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)

    Get PDF
    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) Timisoara, Romania. February 8-11, 2016.The PhD Symposium was a very good opportunity for the young researchers to share information and knowledge, to present their current research, and to discuss topics with other students in order to look for synergies and common research topics. The idea was very successful and the assessment made by the PhD Student was very good. It also helped to achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable solutions for ultrascale computing aiming at cross fertilization among HPC, large scale distributed systems, and big data management, training, contributing to glue disparate researchers working across different areas and provide a meeting ground for researchers in these separate areas to exchange ideas, to identify synergies, and to pursue common activities in research topics such as sustainable software solutions (applications and system software stack), data management, energy efficiency, and resilience.European Cooperation in Science and Technology. COS
    corecore