Introducing the Task-Aware Storage I/O (TASIO) Library
Task-based programming models are excellent tools to parallelize and seamlessly load balance an application's workload. However, support for integrating I/O-intensive applications with task-based programming models is lacking. Typically, an I/O operation stalls the requesting thread until the data is serviced by the backing device. Because the core where the thread was running becomes idle, it should be possible to overlap the data request with computation or even with further I/O operations. Nonetheless, overlapping I/O tasks with other tasks entails an extra degree of complexity not currently managed by programming models' runtimes. In this work, we focus on integrating storage I/O into the tasking model by introducing the Task-Aware Storage I/O (TASIO) library. We test TASIO extensively with a custom benchmark for a number of configurations and conclude that it can achieve speedups of up to 2x depending on the workload, although it might lead to slowdowns if not used with the right settings.

This project is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 754304 (DEEP-EST), by the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by the Generalitat de Catalunya (2017-SGR-1481). The authors would also like to acknowledge that the test environment (Cobi) was ceded by Intel Corporation in the frame of the BSC-Intel collaboration.
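The abstract does not show the TASIO API, so the following is only a minimal sketch of the underlying idea in plain OpenMP and POSIX AIO: submit the storage request asynchronously and yield the core to other ready tasks while the device services the data, instead of blocking inside read(). The file name and chunking are illustrative.

    /* Sketch: overlap a pending read with other tasks via taskyield. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define CHUNK (1 << 20)

    static void process(char *buf, size_t n) { (void)buf; (void)n; /* compute kernel */ }

    int main(void) {
        int fd = open("input.dat", O_RDONLY);      /* illustrative input file */
        if (fd < 0) return 1;

        #pragma omp parallel
        #pragma omp single
        for (int i = 0; i < 8; i++) {
            #pragma omp task firstprivate(i)
            {
                char *buf = malloc(CHUNK);
                struct aiocb cb;
                memset(&cb, 0, sizeof cb);
                cb.aio_fildes = fd;
                cb.aio_buf    = buf;
                cb.aio_nbytes = CHUNK;
                cb.aio_offset = (off_t)i * CHUNK;
                aio_read(&cb);                     /* non-blocking submit */
                while (aio_error(&cb) == EINPROGRESS) {
                    #pragma omp taskyield          /* run other tasks meanwhile */
                }
                ssize_t got = aio_return(&cb);
                if (got > 0) process(buf, (size_t)got);
                free(buf);
            }
        }
        close(fd);
        return 0;
    }

A task-aware library would presumably hide this submit-and-yield pattern behind the ordinary blocking calls, which is consistent with both the speedups and the sensitivity to settings reported above.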
Shared Memory Pipelined Parareal
For the parallel-in-time integration method Parareal, pipelining can be used to hide some of the cost of the serial correction step and improve its efficiency. The paper introduces an OpenMP implementation of pipelined Parareal and compares it to a standard MPI-based variant. Both versions yield almost identical runtimes but, depending on the compiler, the OpenMP variant consumes about 7% less energy and has a significantly smaller memory footprint. However, its higher implementation complexity might make it difficult to use in legacy codes and in combination with spatial parallelisation.
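For context, the non-pipelined Parareal update combines a cheap coarse propagator G with an accurate fine propagator F, and only the fine sweeps run in parallel; the pipelined variant studied in the paper additionally overlaps the serial correction sweep with fine work of the next iteration. A minimal C/OpenMP skeleton for the test equation u' = -u (the choice of equation and step counts is illustrative):

    #include <math.h>
    #include <stdio.h>

    #define N 16                              /* time slices */
    #define K 5                               /* Parareal iterations */

    static double G(double u, double dt) {    /* coarse: one Euler step */
        return u * (1.0 - dt);
    }
    static double F(double u, double dt) {    /* fine: 100 Euler steps */
        double h = dt / 100.0;
        for (int i = 0; i < 100; i++) u *= (1.0 - h);
        return u;
    }

    int main(void) {
        double dt = 0.1, u[N + 1], f[N], g_old[N];
        u[0] = 1.0;
        for (int n = 0; n < N; n++)           /* serial coarse predictor */
            u[n + 1] = G(u[n], dt);

        for (int k = 0; k < K; k++) {
            #pragma omp parallel for          /* fine sweeps in parallel */
            for (int n = 0; n < N; n++) {
                f[n] = F(u[n], dt);
                g_old[n] = G(u[n], dt);
            }
            for (int n = 0; n < N; n++)       /* serial correction sweep */
                u[n + 1] = G(u[n], dt) + f[n] - g_old[n];
        }
        printf("u(T) = %.6f (exact %.6f)\n", u[N], exp(-N * dt));
        return 0;
    }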
Exploiting Memory Affinity in OpenMP through Schedule Reuse
In this paper we explore the possibility of reusing schedules to improve the scalability of numerical codes on shared-memory architectures with non-uniform memory access. The main objective is to implicitly construct affinity links between threads and data accesses and reuse them as much as possible along the execution of the application. These links are created through the definition and reuse of iteration schedules statically defined by the user or dynamically created at run time. The paper does not include a formal proposal of OpenMP extensions but includes some experiments showing the usefulness of constructing affinity links in some irregular codes.
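As a regular-case illustration of such affinity links: on a NUMA machine, reusing the identical static schedule for the initialization and every compute sweep keeps each thread on the pages it first-touched, so accesses stay node-local. The paper's proposal extends this to schedules defined or recorded at run time for irregular codes; the sketch below only shows the principle.

    #include <stdlib.h>

    #define N (1L << 24)
    #define STEPS 100

    int main(void) {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);

        /* First touch under the same schedule used later: each page is
         * placed on the NUMA node of the thread that will keep using it. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = (double)i; }

        for (int s = 0; s < STEPS; s++) {
            /* Reusing the identical schedule preserves the thread-to-data
             * affinity links established at initialization. */
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < N; i++)
                a[i] += 0.5 * b[i];
        }
        free(a); free(b);
        return 0;
    }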
Exploiting Pipelined Executions in OpenMP
This paper proposes a set of extensions to the OpenMP programming model to express point-to-point synchronization schemes. This is accomplished by defining, in the form of directives, precedence relations among the tasks that originate from OpenMP work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs. The programmer then defines the precedence relations using this name space. This relieves the programmer from the burden of defining complex synchronization data structures and inserting explicit synchronization actions that make the program difficult to understand and maintain. The paper briefly describes the main aspects of the runtime implementation required to support precedence relations in OpenMP and focuses on the evaluation of the proposal through its use in two benchmarks: NAS LU and ASCI Sweep3D.
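Point-to-point synchronization of this kind was later standardized in OpenMP 4.5 as doacross loops (ordered with depend(sink)/depend(source) clauses). A wavefront of the kind found in LU or Sweep3D can then be written without global barriers; the kernel below is an illustrative stand-in, not the directive syntax proposed in the paper.

    #define NI 512
    #define NJ 512

    double a[NI][NJ];

    void wavefront(void) {
        for (int j = 0; j < NJ; j++) a[0][j] = 1.0;   /* boundary values */
        for (int i = 0; i < NI; i++) a[i][0] = 1.0;

        #pragma omp parallel for ordered(2)
        for (int i = 1; i < NI; i++)
            for (int j = 1; j < NJ; j++) {
                /* wait only on the two iterations this point depends on */
                #pragma omp ordered depend(sink: i-1, j) depend(sink: i, j-1)
                a[i][j] = 0.25 * (a[i - 1][j] + a[i][j - 1]);
                #pragma omp ordered depend(source)    /* release dependents */
            }
    }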
Defining and Supporting Pipelined Executions in OpenMP
This paper proposes a set of extensions to the OpenMP programming model to express complex pipelined computations. This is accomplished by defining, in the form of directives, precedence relations among the tasks originating from work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs. The programmer then defines the precedence relations using this name space. This relieves the programmer from the burden of defining complex synchronization data structures and inserting explicit synchronization actions that make the program difficult to understand and maintain. The paper focuses on the runtime support required for this feature and the code generated by the NanosCompiler.
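In current OpenMP, comparable precedence relations can be expressed through task dependences, which also suggests what the supporting runtime has to track: each parcel of work becomes a node in a dependence graph. The two-stage pipeline below is an illustrative sketch, not the actual code generated by the NanosCompiler.

    #include <stdio.h>

    #define NBLK 8
    double stage1_out[NBLK], stage2_out[NBLK];

    void produce(int b) { stage1_out[b] = 2.0 * b; }
    void consume(int b) { stage2_out[b] = stage1_out[b] + 1.0; }

    int main(void) {
        #pragma omp parallel
        #pragma omp single
        for (int b = 0; b < NBLK; b++) {
            #pragma omp task depend(out: stage1_out[b]) firstprivate(b)
            produce(b);                 /* stage 1 on block b */
            #pragma omp task depend(in: stage1_out[b]) firstprivate(b)
            consume(b);                 /* stage 2 starts once block b is ready */
        }                               /* implicit barrier joins all tasks */
        printf("%f\n", stage2_out[NBLK - 1]);
        return 0;
    }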