Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication date: 29/03/2015

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have captured a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually, in an ad hoc manner, by domain scientists. Systematic and transparent acquisition of rich metadata is becoming a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet ubiquitous, domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset, so that it permeates metadata-oblivious processes and can be queried through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling, and neuroscience, and then discuss research challenges and possible solutions.

arXiv.org e-Print Archive

eScholarship - University of California
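The first part of this vision, capturing metadata automatically while data is produced and storing it inside the dataset, can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary stands in for a self-describing file format (attributes in HDF5 would be one real-world analogue); the names `capture_metadata` and `simulate` are hypothetical and are not part of any system described by the authors.

```python
import functools
import hashlib
import json
import time
from typing import Any, Callable, Dict


def capture_metadata(func: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Hypothetical decorator: attach provenance metadata to whatever func produces."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        started = time.time()
        data = func(*args, **kwargs)
        metadata = {
            "producer": func.__name__,  # identity of the producing step
            "parameters": {"args": list(args), "kwargs": dict(kwargs)},
            "started_at": started,
            "finished_at": time.time(),
            # Content hash so downstream steps can verify identity and lineage.
            "checksum": hashlib.sha256(
                json.dumps(data, sort_keys=True).encode("utf-8")
            ).hexdigest(),
        }
        # Store the metadata inside the dataset itself, next to the values.
        return {"data": data, "metadata": metadata}

    return wrapper


@capture_metadata
def simulate(steps: int, dt: float = 0.1) -> list:
    """Hypothetical producer: a toy time axis standing in for simulation output."""
    return [i * dt for i in range(steps)]


ds = simulate(5, dt=0.5)
print(ds["metadata"]["producer"], ds["metadata"]["parameters"])
```

A metadata-oblivious consumer can read `ds["data"]` and ignore the rest, while provenance-aware tools inspect `ds["metadata"]`. A real system would also need to persist this record and expose it through a standard query interface, which the sketch does not attempt.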

Cold Storage Data Archives: More Than Just a Bunch of Tapes

Author: Appuswamy Raja
Memishi Bunjamin
Paradies Marcus
Publication date: 01/01/2019

The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics, already exceeds the amount of storage hardware fabricated globally per year. Cold storage data archives are therefore the often-overlooked spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has received very little. In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community.

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

PROCESS Data Infrastructure and Data Services

Author: Belloum Adam
Bobák Martin
Cushing Reginald
Graziani Mara
Habala Ondrej
Madougou Souley
Meizner Jan
Müller Henning
Nowakowski Piotr
Tran Viet
Valkering Onno
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 01/01/2020

Due to energy limitations and high operational costs, exascale computing will likely not be achieved by one or two datacentres but will require many more. A simple calculation that aggregates the computing power of the 2017 Top500 supercomputers reaches only 418 petaflops. Rescale, for example, claims 1.4 exaflops of peak computing power and describes its infrastructure as composed of 8 million servers spread across 30 datacentres. Any proposed solution to exascale computing challenges has to take these facts into consideration and should, by design, aim to support the use of geographically distributed and likely independent datacentres. It should also consider, whenever possible, co-allocating storage with computation, as it would take 3 years to transfer 1 exabyte over a dedicated 100 Gb Ethernet connection. We therefore have to be smart about managing data that is increasingly geographically dispersed and spread across different administrative domains. As the natural setting of the PROCESS project is to operate within the European Research Infrastructure and serve the European research communities facing exascale challenges, it is important that the PROCESS architecture and solutions are well positioned within the European computing and data management landscape, namely PRACE, EGI, and EUDAT. In this paper, we propose a scalable and programmable data infrastructure that is easy to deploy and can be tuned to support various data-intensive scientific applications.

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

International Migration, Integration and Social Cohesion online publications
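The 3-year figure for moving 1 exabyte over a 100 Gb Ethernet link can be checked with simple arithmetic: 1 EB is 8 × 10^18 bits, and at 10^11 bits per second that is 8 × 10^7 seconds, or roughly 2.5 years at full line rate. The Python sketch below makes the calculation explicit; the `efficiency` parameter is an assumption added here to model protocol and operational overhead, and is not a figure from the paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def transfer_time_years(num_bytes: float, link_bits_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Years needed to move num_bytes over a link running at a fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = num_bytes * 8
    seconds = bits / (link_bits_per_s * efficiency)
    return seconds / SECONDS_PER_YEAR


EXABYTE = 1e18        # decimal exabyte, in bytes
LINK_100GBE = 100e9   # 100 Gb/s Ethernet, in bits per second

print(f"full line rate: {transfer_time_years(EXABYTE, LINK_100GBE):.2f} years")
print(f"85% efficiency: {transfer_time_years(EXABYTE, LINK_100GBE, 0.85):.2f} years")
```

With an assumed efficiency of 85%, the estimate rises to roughly 3 years, which is consistent with the figure quoted in the abstract.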

ASCR/HEP Exascale Requirements Review Report

This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June 2015. The main conclusions are as follows.
1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude greater than what is currently available, and in some cases more.
2) The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed.
3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets.
4) A close integration of HPC simulation and data analysis will greatly aid in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows.
5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources, the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) the ability to map workflows onto HPC resources, c) ASCR facilities able to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, and e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.
Comment: 77 pages, 13 figures; draft report, subject to further revision

arXiv.org e-Print Archive

eScholarship - University of California

The Landscape of Exascale Research: A Data-Driven Literature Analysis

Author: Belloum A.S.Z.
Heldens S.
Hijma P.
Maassen J.
Van Nieuwpoort R.V.
Van Werkhoven B.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/06/2020

International Migration, Integration and Social Cohesion online publications

Parallel programming systems for scalable scientific computing

Author: Heldens S.J.
Publication date: 01/01/2024

High-performance computing (HPC) systems are more powerful than ever before. However, this rise in performance brings with it greater complexity, presenting significant challenges for researchers who wish to use these systems for their scientific work. This dissertation explores the development of scalable programming solutions for scientific computing. These solutions aim to be effective across a diverse range of computing platforms, from personal desktops to advanced supercomputers. To better understand HPC systems, this dissertation begins with a literature review on exascale supercomputers, massive systems capable of performing 10¹⁸ floating-point operations per second. This review combines both manual and data-driven analyses, revealing that while the traditional challenges of exascale computing have largely been addressed, issues like software complexity and data volume remain. Additionally, the dissertation introduces the open-source software tool (called LitStudy) developed for this research. Next, this dissertation introduces two novel programming systems. The first system (called Rocket) is designed to scale all-versus-all algorithms to massive datasets. It features a multi-level software-based cache, a divide-and-conquer approach, hierarchical work-stealing, and asynchronous processing to maximize data reuse, exploit data locality, dynamically balance workloads, and optimize resource utilization. The second system (called Lightning) aims to scale existing single-GPU kernel functions across multiple GPUs, even on different nodes, with minimal code adjustments. Results across eight benchmarks on up to 32 GPUs show excellent scalability. The dissertation concludes by proposing a set of design principles for developing parallel programming systems for scalable scientific computing. These principles, based on lessons from this PhD research, represent significant steps forward in enabling researchers to efficiently utilize HPC systems.

International Migration, Integration and Social Cohesion online publications
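The divide-and-conquer idea behind scaling all-versus-all algorithms can be illustrated with a small Python sketch: the square space of item pairs is split recursively into tiles, and each tile loads its row and column blocks once and reuses them for every pair inside it. This is not Rocket's implementation; it omits the multi-level cache, hierarchical work-stealing, and asynchronous processing the dissertation describes, and the function `all_versus_all` and its `tile` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def all_versus_all(
    items: List[float],
    compare: Callable[[float, float], float],
    tile: int = 4,
) -> Dict[Tuple[int, int], float]:
    """Compare every unordered pair (i < j) by recursively tiling the pair space."""
    if tile < 1:
        raise ValueError("tile must be at least 1")
    results: Dict[Tuple[int, int], float] = {}

    def solve(i0: int, i1: int, j0: int, j1: int) -> None:
        # Skip empty regions and regions containing no pair with i < j.
        if i0 >= i1 or j0 >= j1 or j1 - 1 <= i0:
            return
        if i1 - i0 <= tile and j1 - j0 <= tile:
            # Base case: load both blocks once, then reuse them for every pair.
            rows = items[i0:i1]
            cols = items[j0:j1]
            for i in range(i0, i1):
                for j in range(max(j0, i + 1), j1):
                    results[(i, j)] = compare(rows[i - i0], cols[j - j0])
            return
        # Divide: split the longer side of the region in half.
        if i1 - i0 >= j1 - j0:
            mid = (i0 + i1) // 2
            solve(i0, mid, j0, j1)
            solve(mid, i1, j0, j1)
        else:
            mid = (j0 + j1) // 2
            solve(i0, i1, j0, mid)
            solve(i0, i1, mid, j1)

    n = len(items)
    solve(0, n, 0, n)
    return results


distances = all_versus_all([1.0, 2.0, 4.0, 8.0, 16.0], lambda a, b: abs(a - b), tile=2)
print(len(distances), distances[(0, 4)])
```

In a parallel setting, the independent calls to `solve` are the natural units of work that a scheduler could distribute or steal; here they simply run sequentially.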

Reference Exascale Architecture (Extended Version)

Author: Belloum Adam
Bobák Martin
Cushing Reginald
Graziani Mara
Habala Ondrej
Hluchý Ladislav
Maassen Jason
Madougou Souley
Müller Henning
Tran Viet
Valkering Onno
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 01/01/2020

While political commitments to build exascale systems have been made, turning these systems into platforms for a wide range of exascale applications faces several technical, organisational, and skills-related challenges. The key technical challenges relate to the availability of data. While the first exascale machines are likely to be built within a single site, the input data is in many cases impossible to store within a single site. Alongside handling extremely large amounts of data, the exascale system has to process data from different sources, support accelerated computing, handle a high volume of requests per day, minimize the size of data flows, and be extensible to continuously increasing data volumes as well as growing numbers of parallel requests. These technical challenges are addressed by the general reference exascale architecture. It is divided into three main blocks: a virtualization layer, a distributed virtual file system, and a manager of computing resources. Its main property is modularity, which is achieved by containerization at two levels: 1) application containers, i.e. containerization of scientific workflows, and 2) micro-infrastructure, i.e. containerization of the service-oriented infrastructure for extremely large data. The paper also presents an instantiation of the reference architecture, the architecture of the PROCESS project (PROviding Computing solutions for ExaScale ChallengeS), and discusses its relation to the reference exascale architecture. The PROCESS architecture has been used as an exascale platform within various exascale pilot applications. This paper also presents performance modelling of the exascale platform, together with its validation.

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

International Migration, Integration and Social Cohesion online publications

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Author: A Arnold
A Faradjian
B Hess
C Schütte
G Wilson
JA Anderson
JC Phillips
KJ Bowers
L Verlet
M Eleftheriou
M Shirts
MJ Abraham
P Eastman
R Yokota
S Pronk
S Páll
U Essmann
W Humphrey
WM Brown
Y Andoh
Y Sugita
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation, in particular very fine-grained task parallelism. We also discuss the software management, code peer review, and continuous integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedings

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MPG.PuRe
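Neighbor searching, one of the fundamental algorithms the abstract says had to be revisited, is commonly done with cell lists: space is divided into cells no smaller than the interaction cutoff, so each particle only needs to be checked against particles in its own and adjacent cells rather than against all others. The Python sketch below shows that textbook idea in three dimensions without periodic boundaries; it is not the scheme GROMACS actually uses, and `neighbor_pairs` is a hypothetical name.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def neighbor_pairs(points: List[Point], cutoff: float) -> Set[Tuple[int, int]]:
    """Return all index pairs (i < j) whose distance is below cutoff, using cell lists."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    # Bin every particle into a cubic cell whose edge length equals the cutoff.
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / cutoff) for c in p)].append(idx)

    pairs: Set[Tuple[int, int]] = set()
    for (cx, cy, cz), members in cells.items():
        # Only the cell itself and its 26 neighbours can hold partners within cutoff.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            # .get avoids inserting empty cells while iterating over the dict.
            for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


atoms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (2.4, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(sorted(neighbor_pairs(atoms, cutoff=1.0)))
```

Because the cell edge equals the cutoff, any pair closer than the cutoff lies in the same or an adjacent cell along every axis, which is why checking the 27 surrounding cells is sufficient.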

The EU Center of Excellence for Exascale in Solid Earth (ChEESE): Implementation, results, and roadmap for the second phase

Author: Abril Claudia
Afanasiev Michael
Amati Giorgio
Aniko Wirp Sara
Bader Michael
Badia Rosa M.
Barsotti Sara
Basili Roberto
Bayraktar Hafize B.
Bernardi Fabrizio
Boehm Christian
Brizuela Beatriz
Brogi Federico
Cabrera Eduardo
Casarotti Emanuele
Castro Manuel J.
Cerminara Matteo
Cheptsov Alexey
Cirella Antonella
Conejero Javier
Costa Antonio
de la Asunción Marc
de la Puente Josep
Djuric Marco
Dorozhinskii Ravil
Espinosa Gabriela
Esposti-Ongaro Tomaso
Farnós Joan
Favretto-Cristini Nathalie
Fichtner Andreas
Folch Arnau
Fournier Alexandre
Gabriel Alice-Agnes
Gallard Jean-Matthieu
Gibbons Steven John
Glimsdal Sylfest
González-Vida José Manuel
Gracia Jose
Gregorio Rose
Gutierrez Natalia
Halldorsson Benedikt
Hamitou Okba
Houzeaux Guillaume
Jaure Stephan
Kessar Mouloud
Krenz Lukas
Krischer Lion
Laforet Soline
Lanucara Piero
Li Bo
Lorenzino Maria Concetta
Lorito Stefano
Løvholt Finn
Macedonio Giovanni
Macías Jorge
Martínez Montesinos Beatriz
Marín Guillermo
Mingari Leonardo
Moguilny Geneviève
Montellier Vadim
Monterrubio-Velasco Marisol
Moulard Georges Emmanuel
Nagaso Masaru
Nazaria Massimo
Niethammer Christoph
Pardini Federica
Pienkowska Marta
Pizzimenti Luca
Poiata Natalia
Rannabauer Leonhard
Rodriguez Juan Esteban
Rojas Otilio
Romano Fabrizio
Rudyy Oleksandr
Ruggiero Vittorio
Samfass Philipp
Sanchez Sabrina
Sandri Laura
Scala Antonio
Schaeffer Nathanael
Schuchart Joseph
Selva Jacopo
Sergeant Amandine
Stallone Angela
Sánchez-Linares Carlos
Taroni Matteo
Thrastarson Soelvi
Titos Manuel
Tonello Nadia
Tonini Roberto
Ulrich Thomas
Vilotte Jean-Pierre
Volpe Manuela
Vöge Malte
Wössner Uwe
Publication date: 01/01/2023

publishedVersion

HAL AMU

Norwegian Geotechnical Institute (NGI) Digital Archive

LEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI Applications

Author: Amati Giorgio
Cestari Mirko
Turisini Matteo
Publication date: 31/07/2023

A new pre-exascale computer cluster, called LEONARDO, has been designed to foster scientific progress and competitive innovation across European research systems. This paper describes the general architecture of the system and focuses on the technologies adopted for its GPU-accelerated partition. High-density processing elements, fast data movement capabilities, and mature software stack collections allow the machine to run intensive workloads in a flexible and scalable way. Scientific applications from traditional High Performance Computing (HPC) as well as emerging Artificial Intelligence (AI) domains can benefit from this large apparatus in terms of time and energy to solution.
Comment: 16 pages, 5 figures, 7 tables, to be published in Journal of Large Scale Research Facilities

arXiv.org e-Print Archive