Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedin
Adaptation, deployment and evaluation of a railway simulator in cloud environments
Many scientific areas make extensive use of computer simulations to study real-world
processes. As they become more complex and resource-intensive, traditional
programming paradigms running on supercomputers have been shown to be limited by
their hardware resources.
The Cloud, with its elastic nature, has been increasingly seen as a valid alternative
for simulation execution, as it aims to provide virtually infinite resources, thus
unlimited scalability. In order to benefit from this, simulators must be adapted to
this paradigm since cloud migration tends to add virtualization and communication
overhead.
The main objective of this work is to migrate a power consumption railway
simulator to the Cloud, with minimal impact on the original code while preserving
performance. We propose a data-centric adaptation based on MapReduce to distribute
the simulation load across several nodes while minimising data transmission.
We deployed our solution on an Amazon EC2 virtual cluster and measured its
performance. We did the same in our local cluster to compare the solution's performance
against the original application when the Cloud's overhead is not present.
Our tests show that the resulting application is highly scalable and performs better
overall than the original simulator in both environments.
This document summarises the author's work during the whole adaptation development
process.
Ingeniería Informátic
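The abstract above does not show how the MapReduce-based adaptation is structured, so the following is only a minimal single-process sketch of the general map, shuffle, and reduce pattern it refers to. The record layout `(train_id, time_step, power_kw)` and all function names are hypothetical and are not taken from the simulator; a real deployment would run the phases on a framework such as Hadoop across several nodes.

```python
from collections import defaultdict

# Hypothetical input records: (train_id, time_step, power_kw).
# These values are invented for illustration only.
samples = [
    ("train_a", 0, 120.0),
    ("train_a", 1, 130.0),
    ("train_b", 0, 90.0),
    ("train_b", 1, 110.0),
]


def map_phase(record):
    """Emit a (key, value) pair; the key decides which reducer sees the value."""
    train_id, _time_step, power_kw = record
    return train_id, power_kw


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Aggregate all values that share one key."""
    return sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially in one process."""
    grouped = shuffle(map_phase(r) for r in records)
    return {key: reduce_phase(values) for key, values in grouped.items()}


totals = run_job(samples)
print(totals)
```

Because each key is reduced independently, the per-key work can be placed on different nodes, which is the property a data-centric partitioning relies on to limit data transmission.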
Distributed Computing in a Pandemic: A Review of Technologies Available for Tackling COVID-19
The current COVID-19 global pandemic caused by the SARS-CoV-2 betacoronavirus
has resulted in over a million deaths and is having a grave socio-economic
impact, hence there is an urgency to find solutions to key research challenges.
Much of this COVID-19 research depends on distributed computing. In this
article, I review distributed architectures -- various types of clusters, grids
and clouds -- that can be leveraged to perform these tasks at scale, at
high-throughput, with a high degree of parallelism, and which can also be used
to work collaboratively. High-performance computing (HPC) clusters will be used
to carry out much of this work. Several big data processing tasks used in
reducing the spread of SARS-CoV-2 require high-throughput approaches, and a
variety of tools, which Hadoop and Spark offer, even using commodity hardware.
Extremely large-scale COVID-19 research has also utilised some of the world's
fastest supercomputers, such as IBM's SUMMIT -- for ensemble docking
high-throughput screening against SARS-CoV-2 targets for drug-repurposing, and
high-throughput gene analysis -- and Sentinel, an XPE-Cray based system used to
explore natural products. Grid computing has facilitated the formation of the
world's first Exascale grid computer. This has accelerated COVID-19 research in
molecular dynamics simulations of SARS-CoV-2 spike protein interactions through
massively-parallel computation and was performed with over 1 million volunteer
computing devices using the Folding@home platform. Both grids and clouds can
also be used for international collaboration by enabling access to important
datasets and providing services that allow researchers to focus on research
rather than on time-consuming data-management tasks.
Comment: 21 pages (15 excl. refs), 2 figures, 3 table
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene
Parallel programming systems for scalable scientific computing
High-performance computing (HPC) systems are more powerful than ever before. However, this rise in performance brings with it greater complexity, presenting significant challenges for researchers who wish to use these systems for their scientific work. This dissertation explores the development of scalable programming solutions for scientific computing. These solutions aim to be effective across a diverse range of computing platforms, from personal desktops to advanced supercomputers. To better understand HPC systems, this dissertation begins with a literature review on exascale supercomputers, massive systems capable of performing 10¹⁸ floating-point operations per second. This review combines both manual and data-driven analyses, revealing that while traditional challenges of exascale computing have largely been addressed, issues like software complexity and data volume remain. Additionally, the dissertation introduces the open-source software tool (called LitStudy) developed for this research. Next, this dissertation introduces two novel programming systems. The first system (called Rocket) is designed to scale all-versus-all algorithms to massive datasets. It features a multi-level software-based cache, a divide-and-conquer approach, hierarchical work-stealing, and asynchronous processing to maximize data reuse, exploit data locality, dynamically balance workloads, and optimize resource utilization. The second system (called Lightning) aims to scale existing single-GPU kernel functions across multiple GPUs, even on different nodes, with minimal code adjustments. Results across eight benchmarks on up to 32 GPUs show excellent scalability. The dissertation concludes by proposing a set of design principles for developing parallel programming systems for scalable scientific computing. These principles, based on lessons from this PhD research, represent significant steps forward in enabling researchers to efficiently utilize HPC systems
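To make the all-versus-all idea in the abstract above more concrete, here is a small sketch that combines a divide-and-conquer traversal of the pair matrix with a software cache for loaded items. It is not Rocket's API: the dataset `ITEMS`, the functions `load`, `compare`, and `all_pairs`, and the mismatch-count metric are all hypothetical, and work-stealing, multi-level caching, and asynchronous processing are left out.

```python
from functools import lru_cache

# Hypothetical dataset: item id -> payload (invented for illustration).
ITEMS = {0: "abcd", 1: "abce", 2: "xbcd", 3: "abcd"}

load_count = 0  # counts how often an item is actually fetched


@lru_cache(maxsize=None)
def load(item_id):
    """Fetch an item; the cache ensures each item is loaded only once."""
    global load_count
    load_count += 1
    return ITEMS[item_id]


def compare(a, b):
    """Toy pairwise metric: number of positions where the strings differ."""
    return sum(x != y for x, y in zip(a, b))


def all_pairs(rows, cols, out):
    """Recursively split the (rows x cols) tile until it holds one pair."""
    if not rows or not cols:
        return
    if len(rows) == 1 and len(cols) == 1:
        i, j = rows[0], cols[0]
        if i < j:  # each unordered pair is compared once
            out[(i, j)] = compare(load(i), load(j))
        return
    if len(rows) >= len(cols):
        mid = len(rows) // 2
        all_pairs(rows[:mid], cols, out)
        all_pairs(rows[mid:], cols, out)
    else:
        mid = len(cols) // 2
        all_pairs(rows, cols[:mid], out)
        all_pairs(rows, cols[mid:], out)


ids = sorted(ITEMS)
result = {}
all_pairs(ids, ids, result)
print(result, load_count)
```

Splitting into tiles keeps comparisons that touch the same items close together in the traversal, which is what makes a bounded cache effective in this kind of scheme; the unbounded cache here is a simplification.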
Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY
Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that generates a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and the mechanisms put in place to effectively meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the architecture proposed. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. Later, Spark-DIY is analyzed in terms of performance by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment. This work was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Grant TIN2016-79637-P (Toward Unification of HPC and Big Data Paradigms), in part by the Spanish Ministry of Education under Grant FPU15/00422 Training Program for Academic and Teaching Staff Grant, in part by the Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the DOE under Agreement DE-DC000122495, Program Manager Laura Biven