RISC-V-Based Platforms for HPC: Analyzing Non-functional Properties for Future HPC and Big-Data Clusters

Author: Aldinucci Marco
Baroffio Davide
Birke Robert
Cesarini Daniele
Colonnelli Iacopo
Condia Josie E. Rodriguez
Fornaciari William
Iannone Francesco
Mencagli Gabriele
Metra Cecilia
Mittone Gianluca
Omana Martin
Palombi Filippo
Reghenzani Federico
Reorda Matteo Sonza
Terraneo Federico
Tesser Federico
Zummo Giuseppe
Publication venue: Springer
Publication date: 01/01/2023

High-Performance Computing (HPC) systems have evolved to perform simulations of systems where physical experimentation is prohibitively impractical, expensive, or dangerous. This paper provides a general overview and showcases the analysis of non-functional properties in RISC-V-based platforms for HPC. In particular, our analyses target the evaluation of power and energy control, thermal management, and reliability assessment of promising systems, structures, and technologies devised for current and future generations of HPC machines. The main design methodologies and technologies developed within the activities of the Future HPC & Big Data spoke of the National Centre of HPC, Big Data and Quantum Computing project are described, along with the testbed for experimenting with two-phase cooling approaches.

Archivio istituzionale della ricerca - Politecnico di Milano

Power, Reliability, Performance: One System to Rule Them All

Author: Acun Bilge
Kalé Laxmikant
Langer Akhil
Meneses-Rojas Esteban
Menon Harshitha
Sarood Osman
Totoni Ehsan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

In a design based on the Charm++ parallel programming framework, an adaptive runtime system interacts dynamically with a data center's resource manager to control power through intelligent job scheduling, resource reallocation, and hardware reconfiguration. It simultaneously manages reliability by cooling the system to the optimal level for the running application and maintains performance through load balancing.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional del Instituto Tecnologico de Costa Rica

Cooling – making efficient choices White paper update 2016

Author: Radosław Januszewski

Due to the increasing role of energy costs and the growing heat density of servers, cooling issues are becoming highly challenging and very important. While power consumption is recognised as one of the main challenges for future HPC systems, attention to this issue tends to be limited to the power consumed by the computer hardware alone, leaving aside the increase in cooling power required by more densely packaged, highly integrated hardware. The information presented herein results from data collected and analysed through a detailed survey distributed among PRACE partners. The PRACE 3IP Pre-Commercial Procurement (PCP) contributes to the development of energy-efficient HPC technologies and architectures, targeting improved cooling and energy efficiency of the overall system along with fine-scale monitoring of energy consumption. The methodology designed for this PCP to evaluate energy efficiency could be reused for other HPC infrastructure procurements, allowing for a better view of the total cost of ownership (TCO) of the future system. This paper provides an overview of traditional cooling technologies and devices currently used in modern HPC data centres, as well as some innovative and promising solutions adopted by some PRACE partners that may pave the way for future standards. The advantages and disadvantages of each described solution are discussed, and general recommendations are provided as to which aspects HPC centres should take into account when selecting and building a cooling system.

ZENODO

RISC-V-based Platforms for HPC: Analyzing Non-functional Properties for Future HPC and Big-Data Clusters

Author: Aldinucci Marco
Birke Robert
Colonnelli Iacopo
Mittone Gianluca
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2023

Institutional Research Information System University of Turin

Project Final Report: HPC-Colony II

Publication venue: Office of Scientific and Technical Information (OSTI)

Crossref

Parallel Programming with Migratable Objects: Charm++ in Practice

Author: Acun Bilge
Gupta Abhishek
Jain Nikhil
Kale Laxmikant
Langer Akhil
Menon Harshitha
Mikida Eric
Ni Xiang
Robson Michael
Sun Yanhua
Totoni Ehsan
Wesolowski Lukasz
Publication venue: Smith ScholarWorks
Publication date: 16/01/2014

The advent of petascale computing has introduced new challenges (e.g., heterogeneity, system failure) for programming scalable parallel applications. Increased complexity and dynamism in today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the CHARM++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers including Blue Gene/Q, Cray XE6, and Stampede.

Smith College: Smith ScholarWorks

Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

Author: Tuncer Ozan
Publication date: 03/07/2018

Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. 
This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.

Boston University Institutional Repository (OpenBU)

Cloud computing: survey on energy efficiency

Author: Anghel I.
Apparao P.
Ariel Oleksiak
Armand F.
Athanasios V. Vasilakos
Borovyi A.
Cavdar D.
de Langen P.
Fakhar Faiza
Feeney L. M.
Ge Chang
Gupta M.
Hankendi C.
Heller Brandon
Holger Claussen
Ivona Brandic
Jean-Marc Pierson
Jin Yichao
Le K.
Ma Kun
Mastelic Toni
Megalingam R. K.
Moisan Francois
Patterson Michael K.
Pianese F.
Raghavan Praveen
Snyder G. Jeffrey
Steinder M.
Stoess Jan
Sundararajan K. T.
Toni Mastelic
Zer Emre
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

Cloud computing is today's most emphasized Information and Communications Technology (ICT) paradigm, used directly or indirectly by almost every online user. However, such great significance depends on a large infrastructure that includes large data centers comprising thousands of server units and other supporting equipment. These data centers account for between 1.1% and 1.5% of total electricity use worldwide, a share that is projected to rise further. Such alarming numbers demand rethinking the energy efficiency of such infrastructures. However, before making any changes to infrastructure, an analysis of the current status is required. In this article, we perform a comprehensive analysis of an infrastructure supporting the cloud computing paradigm with regard to energy efficiency. First, we define a systematic approach for analyzing the energy efficiency of the most important data center domains, including server and network equipment, as well as cloud management systems and appliances consisting of software used by end users. Second, we use this approach to analyze the available scientific and industrial literature on state-of-the-art practices in data centers and their equipment. Finally, we extract existing challenges and highlight future research directions.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte