42,932 research outputs found
TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation
The paper is concerned with the issue of how software systems actually use
Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power
consumption on these resources. It argues the need for novel methods and tools
to support software developers aiming to optimise power consumption resulting
from designing, developing, deploying and running software on HPAs, while
maintaining other quality aspects of software to adequate and agreed levels. To
do so, a reference architecture to support energy efficiency at application
construction, deployment, and operation is discussed, as well as its
implementation and evaluation plans.Comment: Part of the Program Transformation for Programmability in
Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March
2016, 7 pages, LaTeX, 3 PNG figure
Next Generation Cloud Computing: New Trends and Research Directions
The landscape of cloud computing has significantly changed over the last
decade. Not only have more providers and service offerings crowded the space,
but also cloud infrastructure that was traditionally limited to single provider
data centers is now evolving. In this paper, we firstly discuss the changing
cloud infrastructure and consider the use of infrastructure from multiple
providers and the benefit of decentralising computing away from data centers.
These trends have resulted in the need for a variety of new computing
architectures that will be offered by future cloud infrastructure. These
architectures are anticipated to impact areas, such as connecting people and
devices, data-intensive computing, the service space and self-learning systems.
Finally, we lay out a roadmap of challenges that will need to be addressed for
realising the potential of next generation cloud systems.Comment: Accepted to Future Generation Computer Systems, 07 September 201
Toward sustainable data centers: a comprehensive energy management strategy
Data centers are major contributors to the emission of carbon dioxide to the atmosphere, and this contribution is expected to increase in the following years. This has encouraged the development of techniques to reduce the energy consumption and the environmental footprint of data centers. Whereas some of these techniques have succeeded to reduce the energy consumption of the hardware equipment of data centers (including IT, cooling, and power supply systems), we claim that sustainable data centers will be only possible if the problem is faced by means of a holistic approach that includes not only the aforementioned techniques but also intelligent and unifying solutions that enable a synergistic and energy-aware management of data centers.
In this paper, we propose a comprehensive strategy to reduce the carbon footprint of data centers that uses the energy as a driver of their management procedures. In addition, we present a holistic management architecture for sustainable data centers that implements the aforementioned strategy, and we propose design guidelines to accomplish each step of the proposed strategy, referring to related achievements and enumerating the main challenges that must be still solved.Peer ReviewedPostprint (author's final draft
Online Fault Classification in HPC Systems through Machine Learning
As High-Performance Computing (HPC) systems strive towards the exascale goal,
studies suggest that they will experience excessive failure rates. For this
reason, detecting and classifying faults in HPC systems as they occur and
initiating corrective actions before they can transform into failures will be
essential for continued operation. In this paper, we propose a fault
classification method for HPC systems based on machine learning that has been
designed specifically to operate with live streamed data. We cast the problem
and its solution within realistic operating constraints of online use. Our
results show that almost perfect classification accuracy can be reached for
different fault types with low computational overhead and minimal delay. We
have based our study on a local dataset, which we make publicly available, that
was acquired by injecting faults to an in-house experimental HPC system.Comment: Accepted for publication at the Euro-Par 2019 conferenc
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor
Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters
Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applicable on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially founded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver.Peer ReviewedPostprint (published version
- …