34 research outputs found

    Reliability-oriented resource management for High-Performance Computing

    Get PDF
    Reliability is an increasingly pressing issue for High-Performance Computing systems, as failures are a threat to large-scale applications, for which an even single run may incur significant energy and billing costs. Currently, application developers need to address reliability explicitly, by integrating application-specific checkpoint/restore mechanisms. However, the application alone cannot exploit system knowledge, which is not the case for system-wide resource management systems. In this paper, we propose a reliability-oriented policy that can increase significantly component reliability by combining checkpoint/restore mechanisms exploitation and proactive resource management policies

    Towards Power- and Energy-Efficient Datacenters

    Full text link
    As the Internet evolves, cloud computing is now a dominant form of computation in modern lives. Warehouse-scale computers (WSCs), or datacenters, comprising the foundation of this cloud-centric web have been able to deliver satisfactory performance to both the Internet companies and the customers. With the increased focus and popularity of the cloud, however, datacenter loads rise and grow rapidly, and Internet companies are in need of boosted computing capacity to serve such demand. Unfortunately, power and energy are often the major limiting factors prohibiting datacenter growth: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. This dissertation aims to investigate the issues of power and energy usage in a modern datacenter environment. We identify the source of power and energy inefficiency at three levels in a modern datacenter environment and provides insights and solutions to address each of these problems, aiming to prepare datacenters for critical future growth. We start at the datacenter-level and find that the peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, degrading the compute capacity the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to address this issue and design systematic methods to reduce the fragmentation and improve the utilization of the power budget. This dissertation then narrow the focus to examine the energy usage of individual servers running cloud workloads. Especially, we examine the power management mechanisms employed in these servers and find that the coarse time granularity of these mechanisms is one critical factor that leads to excessive energy consumption. We propose an intelligent and low overhead solution on top of the emerging finer granularity voltage/frequency boosting circuit to effectively pinpoints and boosts queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost, improving energy efficiency without sacrificing the quality of services. The final focus of this dissertation takes a further step to investigate how using a fundamentally more efficient computing substrate, field programmable gate arrays (FPGAs), benefit datacenter power and energy efficiency. Different from other types of hardware accelerations, FPGAs can be reconfigured on-the-fly to provide fine-grain control over hardware resource allocation and presents a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set coordinated algorithms to manage these two key factors simultaneously and fully explore the benefit of deploying FPGAs in the highly varying cloud environment.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd

    Analysis of Application Delivery Platform for Software Defined Infrastructures

    Get PDF
    Application Service Providers (ASPs) obtaining resources from multiple clouds have to contend with different management and control platforms employed by the cloud service providers (CSPs) and network service providers (NSP). Distributing applications on multiple clouds has a number of benefits but the absence of a common multi-cloud management platform that would allow ASPs dynamic and real-time control over resources across multiple clouds and interconnecting networks makes this task arduous. OpenADN, being developed at Washington University in Saint Louis, fills this gap. However, performance issues of such a complex, distributed and multi-threaded platform, not tackled appropriately, may neutralize some of the gains accruable to the ASPs. In this paper, we establish the need for and methods of collecting precise and fine-grained behavioral data of OpenADN like platforms that can be used to optimize their behavior in order to control operational cost, performance (e.g., latency) and energy consumption.Comment: E-preprin

    Minimizing Thermal Stress for Data Center Servers through Thermal-Aware Relocation

    Get PDF
    A rise in inlet air temperature may lower the rate of heat dissipation from air cooled computing servers. This introduces a thermal stress to these servers. As a result, the poorly cooled active servers will start conducting heat to the neighboring servers and giving rise to hotspot regions of thermal stress, inside the data center. As a result, the physical hardware of these servers may fail, thus causing performance loss, monetary loss, and higher energy consumption for cooling mechanism. In order to minimize these situations, this paper performs the profiling of inlet temperature sensitivity (ITS) and defines the optimum location for each server to minimize the chances of creating a thermal hotspot and thermal stress. Based upon novel ITS analysis, a thermal state monitoring and server relocation algorithm for data centers is being proposed. The contribution of this paper is bringing the peak outlet temperatures of the relocated servers closer to average outlet temperature by over 5 times, lowering the average peak outlet temperature by 3.5% and minimizing the thermal stress

    Network and Server Resource Management Strategies for Data Centre Infrastructures: A Survey

    Get PDF
    The advent of virtualisation and the increasing demand for outsourced, elastic compute charged on a pay-as-you-use basis has stimulated the development of large-scale Cloud Data Centres (DCs) housing tens of thousands of computer clusters. Of the signi�cant capital outlay required for building and operating such infrastructures, server and network equipment account for 45% and 15% of the total cost, respectively, making resource utilisation e�ciency paramount in order to increase the operators' Return-on-Investment (RoI). In this paper, we present an extensive survey on the management of server and network resources over virtualised Cloud DC infrastructures, highlighting key concepts and results, and critically discussing their limitations and implications for future research opportunities. We highlight the need for and bene �ts of adaptive resource provisioning that alleviates reliance on static utilisation prediction models and exploits direct measurement of resource utilisation on servers and network nodes. Coupling such distributed measurement with logically-centralised Software De�ned Networking (SDN) principles, we subsequently discuss the challenges and opportunities for converged resource management over converged ICT environments, through unifying control loops to globally orchestrate adaptive and load-sensitive resource provisioning

    Management And Security Of Multi-Cloud Applications

    Get PDF
    Single cloud management platform technology has reached maturity and is quite successful in information technology applications. Enterprises and application service providers are increasingly adopting a multi-cloud strategy to reduce the risk of cloud service provider lock-in and cloud blackouts and, at the same time, get the benefits like competitive pricing, the flexibility of resource provisioning and better points of presence. Another class of applications that are getting cloud service providers increasingly interested in is the carriers\u27 virtualized network services. However, virtualized carrier services require high levels of availability and performance and impose stringent requirements on cloud services. They necessitate the use of multi-cloud management and innovative techniques for placement and performance management. We consider two classes of distributed applications – the virtual network services and the next generation of healthcare – that would benefit immensely from deployment over multiple clouds. This thesis deals with the design and development of new processes and algorithms to enable these classes of applications. We have evolved a method for optimization of multi-cloud platforms that will pave the way for obtaining optimized placement for both classes of services. The approach that we have followed for placement itself is predictive cost optimized latency controlled virtual resource placement for both types of applications. To improve the availability of virtual network services, we have made innovative use of the machine and deep learning for developing a framework for fault detection and localization. Finally, to secure patient data flowing through the wide expanse of sensors, cloud hierarchy, virtualized network, and visualization domain, we have evolved hierarchical autoencoder models for data in motion between the IoT domain and the multi-cloud domain and within the multi-cloud hierarchy

    Energy-Efficient Software

    Get PDF
    The energy consumption of ICT is growing at an unprecedented pace. The main drivers for this growth are the widespread diffusion of mobile devices and the proliferation of datacenters, the most power-hungry IT facilities. In addition, it is predicted that the demand for ICT technologies and services will increase in the coming years. Finding solutions to decrease ICT energy footprint is and will be a top priority for researchers and professionals in the field. As a matter of fact, hardware technology has substantially improved throughout the years: modern ICT devices are definitely more energy efficient than their predecessors, in terms of performance per watt. However, as recent studies show, these improvements are not effectively reducing the growth rate of ICT energy consumption. This suggests that these devices are not used in an energy-efficient way. Hence, we have to look at software. Modern software applications are not designed and implemented with energy efficiency in mind. As hardware became more and more powerful (and cheaper), software developers were not concerned anymore with optimizing resource usage. Rather, they focused on providing additional features, adding layers of abstraction and complexity to their products. This ultimately resulted in bloated, slow software applications that waste hardware resources -- and consequently, energy. In this dissertation, the relationship between software behavior and hardware energy consumption is explored in detail. For this purpose, the abstraction levels of software are traversed upwards, from source code to architectural components. Empirical research methods and evidence-based software engineering approaches serve as a basis. First of all, this dissertation shows the relevance of software over energy consumption. Secondly, it gives examples of best practices and tactics that can be adopted to improve software energy efficiency, or design energy-efficient software from scratch. Finally, this knowledge is synthesized in a conceptual framework that gives the reader an overview of possible strategies for software energy efficiency, along with examples and suggestions for future research

    Perpetual Sensing: Experiences with Energy-Harvesting Sensor Systems

    Full text link
    Industry forecasts project the number of connected devices will outpace the global population by orders of magnitude in the next decade or two. These projections are application driven: smart cities, implantable health monitors, responsive buildings, autonomous robots, driverless cars, and instrumented infrastructure are all expected to be drivers for the growth of networked devices. Achieving this immense scale---potentially trillions of smart and connected sensors and computers, popularly called the "Internet of Things"---raises a host of challenges including operating system design, networking protocols, and orchestration methodologies. However, another critical issue may be the most fundamental: If embedded computers outnumber people by a factor of a thousand, how are we going to keep all of these devices powered? In this dissertation, we show that energy-harvesting operation, by which devices scavenge energy from their surroundings to power themselves after they are deployed, is a viable answer to this question. In particular, we examine a range of energy-harvesting sensor node designs for a specific application: smart buildings. In this application setting, the devices must be small and sleek to be unobtrusively and widely deployed, yet shrinking the devices also reduces their energy budgets as energy storage often dominates their volume. Additionally, energy-harvesting introduces new challenges for these devices due to the intermittent access to power that stems from relying on unpredictable ambient energy sources. To address these challenges, we present several techniques for realizing effective sensors despite the size and energy constraints. First is Monjolo, an energy metering system that exploits rather than attempts to mask the variability in energy-harvesting by using the energy harvester itself as the sensor. Building on Monjolo, we show how simple time synchronization and an application specific sensor can enable accurate, building-scale submetering while remaining energy-harvesting. We also show how energy-harvesting can be the foundation for highly deployable power metering, as well as indoor monitoring and event detection. With these sensors as a guide, we present an architecture for energy-harvesting systems that provides layered abstractions and enables modular component reuse. We also couple these sensors with a generic and reusable gateway platform and an application-layer cloud service to form an easy-to-deploy building sensing toolkit, and demonstrate its effectiveness by performing and analyzing several modest-scale deployments.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/138686/1/bradjc_1.pd
    corecore