19 research outputs found

    Computational Methods in Science and Engineering: Proceedings of the Workshop SimLabs@KIT, November 29-30, 2010, Karlsruhe, Germany

    This proceedings volume compiles article contributions covering applications from a range of research fields, spanning capacity to capability computing. Besides classical computing aspects such as parallelization, the proceedings focus on multi-scale approaches and on methods for tackling algorithm and data complexity. Practical aspects of using the HPC infrastructure, as well as the tools and software available at the SCC, are also presented.

    Methodology for malleable applications on distributed memory systems

    The dominant programming approach for scientific and industrial computing on clusters is MPI+X. While there is a variety of approaches within the node, denoted by the "X", the Message Passing Interface (MPI) is the standard for programming multiple nodes with distributed memory. This thesis argues that the OmpSs-2 tasking model can be extended beyond the node to naturally support distributed memory, with three benefits. First, at small to medium scale the tasking model is a simpler and more productive alternative to MPI: it eliminates the need to distribute the data explicitly and to convert all dependencies into explicit message passing, and it avoids the complexity of hybrid programming with MPI+X. Second, the ability to offload parts of the computation among the nodes enables the runtime to automatically balance the load in a full-scale MPI+X program. This approach does not require a cost model, and it can transparently balance the computational load across the whole program, on all its nodes. Third, because the runtime handles all low-level aspects of data distribution and communication, it can change the resource allocation dynamically, in a way that is transparent to the application.

    This thesis describes the design, development and evaluation of OmpSs-2@Cluster, a programming model and runtime system that extends the OmpSs-2 model to allow a virtually unmodified OmpSs-2 program to run across multiple distributed-memory nodes. For well-balanced applications it provides performance similar to MPI+OpenMP on up to 16 nodes, and it improves performance by up to 2x for irregular and unbalanced applications such as Cholesky factorization. This work also extended OmpSs-2@Cluster for interoperability with MPI and with the Barcelona Supercomputing Center (BSC)'s state-of-the-art Dynamic Load Balancing (DLB) library, in order to dynamically balance MPI+OmpSs-2 applications by transparently offloading tasks among nodes. This approach reduces the execution time of a micro-scale solid mechanics application by 46% on 64 nodes, and on a synthetic benchmark it is within 10% of perfect load balance on up to 8 nodes.

    Finally, the runtime was extended to transparently support malleability for pure OmpSs-2@Cluster programs and to interoperate with the Resource Management System (RMS). The only change to the application is an explicit call to an API function that controls the addition or removal of nodes. The runtime was additionally given the ability to semi-transparently save and recover part of the application state in order to perform checkpoint and restart. This feature hides the complexity of data redistribution and parallel I/O from the user while allowing the program to recover and continue previous executions, and it is a starting point for future research on fault tolerance.

    In summary, OmpSs-2@Cluster expands the OmpSs-2 programming model to encompass distributed-memory clusters. It allows an existing OmpSs-2 program, with few if any changes, to run across multiple nodes; it supports transparent multi-node dynamic load balancing for MPI+OmpSs-2 programs; and it enables semi-transparent malleability for pure OmpSs-2@Cluster programs. The runtime system has a high level of stability and performance, and it opens several avenues for future work.
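    To make the programming model concrete, the following is a minimal sketch of what OmpSs-2-style task annotations look like: each task declares the data it reads and writes as array sections, and the runtime builds the dependency graph from those declarations. Under OmpSs-2@Cluster, the abstract's claim is that the same annotated code can run across distributed-memory nodes, with the runtime handling data placement and communication. The kernel, array size, and block size below are illustrative assumptions, not taken from the thesis.

        #include <stdlib.h>

        #define N  1024
        #define BS 256

        int main(void) {
            double *x = malloc(N * sizeof(double));
            double *y = malloc(N * sizeof(double));

            for (int i = 0; i < N; i += BS) {
                /* Each task names its accesses as array sections a[start;size];
                   the runtime derives task dependencies (and, on a cluster,
                   data movement) from these annotations. */
                #pragma oss task out(x[i;BS])
                for (int j = i; j < i + BS; j++)
                    x[j] = (double)j;

                #pragma oss task in(x[i;BS]) out(y[i;BS])
                for (int j = i; j < i + BS; j++)
                    y[j] = 2.0 * x[j];
            }

            /* Wait for all tasks to finish before using the results. */
            #pragma oss taskwait

            free(x);
            free(y);
            return 0;
        }

    Note that the source contains no explicit message passing or data distribution: turning the in/out annotations into communication is what the OmpSs-2@Cluster runtime is responsible for, which is also what allows it to rebalance load or resize the node set without changes to the application code.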

    An FPGA implementation of an investigative many-core processor, Fynbos: in support of a Fortran autoparallelising software pipeline

    In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. To achieve greater efficiency in an environment in which Moore's law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading hardware control complexity for hundreds to thousands of simple parallel processing elements, operating at a clock speed sufficiently low to allow the efficiency gains of near-threshold-voltage operation. Performance is therefore dependent on exploiting a degree of fine-grained parallelism currently found only in GPGPUs, but in a manner that is less restrictive in its range of application domains. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons, this work chooses to replace this control largely with static scheduling, pushing the burden of control primarily onto the software, and specifically the compiler, rather than onto the programmer or an application-specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core already exists; this work implements a many-core architecture to match it. By prototyping the design on an FPGA, it is possible to examine the real-world performance of the compiler-architecture system to a greater degree than simulation alone would allow. Comparing theoretical peak performance with real performance in a case-study application, the system is found to be more efficient than any other reviewed, but also to significantly underperform relative to current competing architectures. This failing is attributed to taking the need for simple hardware too far, and to an inability to implement tactics that mitigate the costs of static scheduling, owing to a lack of support for such tactics in the compiler.

    Laboratory Directed Research and Development Annual Report - Fiscal Year 2000


    Laboratory Directed Research and Development FY 1998 Progress Report
