
34 research outputs found

Reliability-oriented resource management for High-Performance Computing

Author: Agosta Giovanni
Campi Alessandro
Ciesielski Sebastian
Fornaciari William
Kulczewski Michal
Massari Giuseppe
Peta Miriam
Piatek Wojciech
Reghenzani Federico
Terraneo Federico
Publication venue: 'Elsevier BV'
Publication date: 01/01/2023

Reliability is an increasingly pressing issue for High-Performance Computing systems, as failures are a threat to large-scale applications, for which even a single run may incur significant energy and billing costs. Currently, application developers need to address reliability explicitly, by integrating application-specific checkpoint/restore mechanisms. However, the application alone cannot exploit system knowledge, whereas system-wide resource management systems can. In this paper, we propose a reliability-oriented policy that can significantly increase component reliability by combining the exploitation of checkpoint/restore mechanisms with proactive resource management policies.

Archivio istituzionale della ricerca - Politecnico di Milano

Towards Power- and Energy-Efficient Datacenters

Author: Hsu Chang-Hong

As the Internet evolves, cloud computing is now a dominant form of computation in modern lives. Warehouse-scale computers (WSCs), or datacenters, comprising the foundation of this cloud-centric web have been able to deliver satisfactory performance to both the Internet companies and the customers. With the increased focus on and popularity of the cloud, however, datacenter loads are growing rapidly, and Internet companies need more computing capacity to serve this demand. Unfortunately, power and energy are often the major limiting factors prohibiting datacenter growth: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. This dissertation investigates the issues of power and energy usage in a modern datacenter environment. We identify the sources of power and energy inefficiency at three levels in a modern datacenter environment and provide insights and solutions to address each of these problems, aiming to prepare datacenters for critical future growth. We start at the datacenter level and find that peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, degrading the compute capacity the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to addressing this issue and design systematic methods to reduce the fragmentation and improve the utilization of the power budget. This dissertation then narrows its focus to examine the energy usage of individual servers running cloud workloads. In particular, we examine the power management mechanisms employed in these servers and find that the coarse time granularity of these mechanisms is one critical factor that leads to excessive energy consumption.
We propose an intelligent, low-overhead solution, built on top of emerging finer-granularity voltage/frequency boosting circuits, that effectively pinpoints and boosts queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost, improving energy efficiency without sacrificing quality of service. The final focus of this dissertation takes a further step to investigate how using a fundamentally more efficient computing substrate, field programmable gate arrays (FPGAs), can benefit datacenter power and energy efficiency. Unlike other types of hardware accelerators, FPGAs can be reconfigured on the fly to provide fine-grained control over hardware resource allocation, and they present a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set of coordinated algorithms to manage these two key factors simultaneously and fully explore the benefit of deploying FPGAs in the highly varying cloud environment.
PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies, https://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd

Deep Blue Documents at the University of Michigan

Analysis of Application Delivery Platform for Software Defined Infrastructures

Author: Gupta Lav
Jain Raj
Samaka Mohammed
Publication date: 11/01/1938

Application Service Providers (ASPs) obtaining resources from multiple clouds have to contend with the different management and control platforms employed by cloud service providers (CSPs) and network service providers (NSPs). Distributing applications on multiple clouds has a number of benefits, but the absence of a common multi-cloud management platform that would allow ASPs dynamic and real-time control over resources across multiple clouds and interconnecting networks makes this task arduous. OpenADN, being developed at Washington University in Saint Louis, fills this gap. However, the performance issues of such a complex, distributed and multi-threaded platform, if not tackled appropriately, may neutralize some of the gains accruable to the ASPs. In this paper, we establish the need for and methods of collecting precise and fine-grained behavioral data of OpenADN-like platforms that can be used to optimize their behavior in order to control operational cost, performance (e.g., latency) and energy consumption. Comment: E-preprin

arXiv.org e-Print Archive

Trinity College

Minimizing Thermal Stress for Data Center Servers through Thermal-Aware Relocation

Author: Atif Manzoor
Muhammad Tayyab Chaudhry
S. A. Hussain
T. C. Ling
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

A rise in inlet air temperature may lower the rate of heat dissipation from air-cooled computing servers. This introduces thermal stress to these servers. As a result, the poorly cooled active servers will start conducting heat to the neighboring servers, giving rise to hotspot regions of thermal stress inside the data center. The physical hardware of these servers may then fail, causing performance loss, monetary loss, and higher energy consumption for the cooling mechanism. In order to minimize these situations, this paper profiles inlet temperature sensitivity (ITS) and defines the optimum location for each server to minimize the chances of creating a thermal hotspot and thermal stress. Based upon this novel ITS analysis, a thermal state monitoring and server relocation algorithm for data centers is proposed. The contribution of this paper is bringing the peak outlet temperatures of the relocated servers closer to the average outlet temperature by over 5 times, lowering the average peak outlet temperature by 3.5%, and minimizing the thermal stress.

Crossref

Directory of Open Access Journals

PubMed Central


Finding, Measuring, and Reducing Inefficiencies in Contemporary Computer Systems

Author: Kambadur Melanie Rae
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016

Computer systems have become increasingly diverse and specialized in recent years. This complexity supports a wide range of new computing uses and users, but is not without cost: it has become difficult to maintain the efficiency of contemporary general purpose computing systems. Computing inefficiencies, which include nonoptimal runtimes, excessive energy use, and limits to scalability, are a serious problem that can result in an inability to apply computing to solve the world's most important problems. Beyond the complexity and vast diversity of modern computing platforms and applications, a number of factors make improving general purpose efficiency challenging, including the requirement that multiple levels of the computer system stack be examined, the possibility that legacy hardware devices and software may stand in the way of achieving efficiency, and the need to balance efficiency with reusability, programmability, security, and other goals. This dissertation presents five case studies, each demonstrating different ways in which the measurement of emerging systems can provide actionable advice to help keep general purpose computing efficient. The first of the five case studies is Parallel Block Vectors, a new profiling method for understanding parallel programs with a fine-grained, code-centric perspective that aids both in future hardware design and in optimizing software to map better to existing hardware. Second is a project that defines a new way of measuring application interference on a datacenter's worth of chip-multiprocessors, leading to improved scheduling where applications can more effectively utilize available hardware resources. Next is a project that uses the GT-Pin tool to define a method for accelerating the simulation of GPGPUs, ultimately allowing for the development of future hardware with fewer inefficiencies.
The fourth project is an experimental energy survey that compares and combines the latest energy efficiency solutions at different levels of the stack to properly evaluate the state of the art and to find paths forward for future energy efficiency research. The final project presented is NRG-Loops, a language extension that allows programs to measure and intelligently adapt their own power and energy use.

Columbia University Academic Commons

Network and Server Resource Management Strategies for Data Centre Infrastructures: A Survey

Author: Jouet S
Pezaros DP
Tso FP
Publication venue: 'Elsevier BV'
Publication date: 01/09/2016

The advent of virtualisation and the increasing demand for outsourced, elastic compute charged on a pay-as-you-use basis have stimulated the development of large-scale Cloud Data Centres (DCs) housing tens of thousands of computer clusters. Of the significant capital outlay required for building and operating such infrastructures, server and network equipment account for 45% and 15% of the total cost, respectively, making resource utilisation efficiency paramount in order to increase the operators' Return-on-Investment (RoI). In this paper, we present an extensive survey on the management of server and network resources over virtualised Cloud DC infrastructures, highlighting key concepts and results, and critically discussing their limitations and implications for future research opportunities. We highlight the need for and benefits of adaptive resource provisioning that alleviates reliance on static utilisation prediction models and exploits direct measurement of resource utilisation on servers and network nodes. Coupling such distributed measurement with logically-centralised Software Defined Networking (SDN) principles, we subsequently discuss the challenges and opportunities for converged resource management over converged ICT environments, through unifying control loops to globally orchestrate adaptive and load-sensitive resource provisioning.

LJMU Research Online (Liverpool John Moores University)

Elsevier - Publisher Connector

Enlighten

Management And Security Of Multi-Cloud Applications

Author: Gupta Lav
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2019

Single cloud management platform technology has reached maturity and is quite successful in information technology applications. Enterprises and application service providers are increasingly adopting a multi-cloud strategy to reduce the risk of cloud service provider lock-in and cloud blackouts and, at the same time, gain benefits such as competitive pricing, flexibility of resource provisioning and better points of presence. Another class of applications in which cloud service providers are increasingly interested is the carriers' virtualized network services. However, virtualized carrier services require high levels of availability and performance and impose stringent requirements on cloud services. They necessitate the use of multi-cloud management and innovative techniques for placement and performance management. We consider two classes of distributed applications – the virtual network services and the next generation of healthcare – that would benefit immensely from deployment over multiple clouds. This thesis deals with the design and development of new processes and algorithms to enable these classes of applications. We have evolved a method for optimization of multi-cloud platforms that will pave the way for obtaining optimized placement for both classes of services. The approach that we have followed for placement itself is predictive, cost-optimized, latency-controlled virtual resource placement for both types of applications. To improve the availability of virtual network services, we have made innovative use of machine and deep learning to develop a framework for fault detection and localization. Finally, to secure patient data flowing through the wide expanse of sensors, cloud hierarchy, virtualized network, and visualization domain, we have evolved hierarchical autoencoder models for data in motion between the IoT domain and the multi-cloud domain and within the multi-cloud hierarchy.

Washington University St. Louis: Open Scholarship

Energy-Efficient Software

Author: Procaccianti Giuseppe
Publication venue: Politecnico di Torino
Publication date: 01/01/2015

The energy consumption of ICT is growing at an unprecedented pace. The main drivers for this growth are the widespread diffusion of mobile devices and the proliferation of datacenters, the most power-hungry IT facilities. In addition, it is predicted that the demand for ICT technologies and services will increase in the coming years. Finding solutions to decrease the ICT energy footprint is and will be a top priority for researchers and professionals in the field. As a matter of fact, hardware technology has substantially improved throughout the years: modern ICT devices are definitely more energy efficient than their predecessors, in terms of performance per watt. However, as recent studies show, these improvements are not effectively reducing the growth rate of ICT energy consumption. This suggests that these devices are not used in an energy-efficient way. Hence, we have to look at software. Modern software applications are not designed and implemented with energy efficiency in mind. As hardware became more and more powerful (and cheaper), software developers were no longer concerned with optimizing resource usage. Rather, they focused on providing additional features, adding layers of abstraction and complexity to their products. This ultimately resulted in bloated, slow software applications that waste hardware resources -- and consequently, energy. In this dissertation, the relationship between software behavior and hardware energy consumption is explored in detail. For this purpose, the abstraction levels of software are traversed upwards, from source code to architectural components. Empirical research methods and evidence-based software engineering approaches serve as a basis. First of all, this dissertation shows the relevance of software to energy consumption. Secondly, it gives examples of best practices and tactics that can be adopted to improve software energy efficiency, or to design energy-efficient software from scratch.
Finally, this knowledge is synthesized in a conceptual framework that gives the reader an overview of possible strategies for software energy efficiency, along with examples and suggestions for future research.

VU Research Portal

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Perpetual Sensing: Experiences with Energy-Harvesting Sensor Systems

Author: Campbell Bradford
Publication date: 01/01/2017

Industry forecasts project the number of connected devices will outpace the global population by orders of magnitude in the next decade or two. These projections are application driven: smart cities, implantable health monitors, responsive buildings, autonomous robots, driverless cars, and instrumented infrastructure are all expected to be drivers for the growth of networked devices. Achieving this immense scale---potentially trillions of smart and connected sensors and computers, popularly called the "Internet of Things"---raises a host of challenges including operating system design, networking protocols, and orchestration methodologies. However, another critical issue may be the most fundamental: If embedded computers outnumber people by a factor of a thousand, how are we going to keep all of these devices powered? In this dissertation, we show that energy-harvesting operation, by which devices scavenge energy from their surroundings to power themselves after they are deployed, is a viable answer to this question. In particular, we examine a range of energy-harvesting sensor node designs for a specific application: smart buildings. In this application setting, the devices must be small and sleek to be unobtrusively and widely deployed, yet shrinking the devices also reduces their energy budgets as energy storage often dominates their volume. Additionally, energy-harvesting introduces new challenges for these devices due to the intermittent access to power that stems from relying on unpredictable ambient energy sources. To address these challenges, we present several techniques for realizing effective sensors despite the size and energy constraints. First is Monjolo, an energy metering system that exploits rather than attempts to mask the variability in energy-harvesting by using the energy harvester itself as the sensor. 
Building on Monjolo, we show how simple time synchronization and an application-specific sensor can enable accurate, building-scale submetering while remaining energy-harvesting. We also show how energy-harvesting can be the foundation for highly deployable power metering, as well as indoor monitoring and event detection. With these sensors as a guide, we present an architecture for energy-harvesting systems that provides layered abstractions and enables modular component reuse. We also couple these sensors with a generic and reusable gateway platform and an application-layer cloud service to form an easy-to-deploy building sensing toolkit, and demonstrate its effectiveness by performing and analyzing several modest-scale deployments.
PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies, https://deepblue.lib.umich.edu/bitstream/2027.42/138686/1/bradjc_1.pd

Deep Blue Documents at the University of Michigan