
11 research outputs found

A Linux Kernel scheduler extension for multi-core systems

Author: Roca Nonell Aleix
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017

This thesis presents a Linux kernel extension that allows a user-space application to be notified when its threads block and unblock, making it possible for a core to execute another thread while the first one is blocked. The OmpSs Nanos6 runtime is adapted to use this feature.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A Linux kernel scheduler extension for multi-core systems

Author: Beltran Vicenç
Marquet Kevin
Roca Nonell Aleix
Rodríguez Samuel
Segura Albert
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The Linux kernel is mostly designed for multiprogrammed environments, but high-performance applications have other requirements. Such applications run standalone and usually rely on runtime systems to distribute the application's workload across worker threads, one per core. However, due to the limitations of current OSes, it is not feasible to track whether workers are actually running or blocked, for instance while waiting for a requested resource. For I/O-intensive applications, this leads to significant performance degradation, because the core of a blocked thread stays idle until the thread is able to run again. In this paper, we present a proof-of-concept Linux kernel extension called User-Monitored Threads (UMT) that tackles this problem. Our extension allows a user-space process to be notified when selected threads become blocked or unblocked, making it possible for a runtime to schedule additional work on the idle core. We implemented the extension on Linux kernel 5.1 and adapted the Nanos6 runtime of the OmpSs-2 programming model to take advantage of it.
The whole prototype was tested on two applications that, on the tested hardware and under appropriate conditions, reported speedups of almost 2x. This project is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 754304 (DEEP-EST), by the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the Generalitat de Catalunya (2017-SGR-1481). Peer Reviewed. Postprint (author's final draft).
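The reaction a runtime could have to block/unblock notifications can be sketched as a small event handler. This is a minimal Python model under stated assumptions, not the UMT interface: the abstract does not describe how events are delivered, so the `"block"`/`"unblock"` event strings, the `Core` bookkeeping class and the `on_event` function are all hypothetical, and real workers would be OS threads pinned to cores rather than integer ids.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical per-core bookkeeping kept by a user-space runtime."""
    core_id: int
    running: set = field(default_factory=set)      # workers currently runnable here
    blocked: set = field(default_factory=set)      # workers blocked in the kernel (e.g. on I/O)
    idle_pool: list = field(default_factory=list)  # spare workers parked on this core


def on_event(core: Core, kind: str, worker: int) -> list:
    """Handle a hypothetical "block"/"unblock" notification for `worker`.

    Returns the actions the runtime takes, so the policy is easy to inspect.
    """
    actions = []
    if kind == "block":
        core.running.discard(worker)
        core.blocked.add(worker)
        # The core just went idle: wake a spare worker to run other ready tasks.
        if not core.running and core.idle_pool:
            spare = core.idle_pool.pop()
            core.running.add(spare)
            actions.append(("wake", spare))
    elif kind == "unblock":
        core.blocked.discard(worker)
        core.running.add(worker)
        # The core is oversubscribed again: park one other worker.
        # A real runtime would let it finish its current task first.
        others = sorted(core.running - {worker})
        if others:
            extra = others[-1]
            core.running.discard(extra)
            core.idle_pool.append(extra)
            actions.append(("park", extra))
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
    return actions


if __name__ == "__main__":
    core = Core(core_id=0, running={0}, idle_pool=[1])
    print(on_event(core, "block", 0))    # [('wake', 1)]
    print(on_event(core, "unblock", 0))  # [('park', 1)]
```

Parking a worker on every unblock keeps at most one runnable worker per core, which matches the one-worker-per-core model the abstract describes; a real runtime would also have to handle notifications arriving while it is still reacting to a previous one.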

UPCommons. Portal del coneixement obert de la UPC

Advanced synchronization techniques for task-based runtime systems

Author: Beltran Querol Vicenç
Maroñas Bravo Marcos
Roca Nonell Aleix
Sala Penadés Kevin
Álvarez Robert David
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2021

Task-based programming models such as OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small-granularity tasks remains a challenge, and bottlenecks can manifest in several runtime components. In this paper, we analyze the limiting factors in the scalability of a task-based runtime system and propose individual solutions for each of the challenges, including a wait-free dependency system and a novel scalable scheduler design based on delegation. We evaluate how the optimizations impact the overall performance of the runtime, both individually and in combination. We also compare the resulting runtime against state-of-the-art OpenMP implementations, showing equivalent or better performance, especially for fine-grained tasks. This project is supported by the European Union's Horizon 2020 Research and Innovation programme under grant agreement No. 754304 (DEEP-EST), by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB and TIN2015-65316P) and by the Generalitat de Catalunya (2017-SGR-1414). Peer Reviewed. Postprint (author's final draft).
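The general idea of a delegation-based scheduler can be illustrated with a flat-combining style ready queue: instead of every worker contending on the queue, each worker posts a request in its own slot, and whichever worker manages to take the lock serves all pending requests in one pass. This is a generic Python sketch, not the Nanos6 scheduler described in the paper; the class and method names are hypothetical, the wait-free dependency system is not modelled, and because of Python's GIL it demonstrates the protocol rather than its performance.

```python
import threading
from collections import deque

_REQUEST = object()  # marks a slot whose worker is waiting to be served


class DelegatingScheduler:
    """Ready queue where one lock holder serves the requests of all workers."""

    def __init__(self, n_workers: int):
        self._lock = threading.Lock()
        self._ready = deque()                  # ready tasks, only touched under _lock
        self._slots = [None] * n_workers       # per-worker request/response slot
        self._served = [threading.Event() for _ in range(n_workers)]

    def add_ready(self, task) -> None:
        with self._lock:
            self._ready.append(task)

    def _serve_all(self) -> None:
        # Called with _lock held: answer every pending request in one pass.
        for w, slot in enumerate(self._slots):
            if slot is _REQUEST:
                self._slots[w] = self._ready.popleft() if self._ready else None
                self._served[w].set()

    def get_task(self, worker: int):
        """Return a ready task for `worker`, or None if the queue is empty."""
        self._served[worker].clear()
        self._slots[worker] = _REQUEST
        while not self._served[worker].is_set():
            if self._lock.acquire(blocking=False):
                try:
                    self._serve_all()          # we became the server
                finally:
                    self._lock.release()
            else:
                # Another worker is serving; give it a moment to reach our slot.
                self._served[worker].wait(timeout=0.001)
        task, self._slots[worker] = self._slots[worker], None
        return task


if __name__ == "__main__":
    sched = DelegatingScheduler(n_workers=2)
    for name in ("t0", "t1"):
        sched.add_ready(name)
    print(sched.get_task(0), sched.get_task(1), sched.get_task(0))  # t0 t1 None
```

The appeal of delegation is that the queue's data stays in the cache of the serving core while other workers only touch their own slot; whether that pays off in practice depends on the runtime and hardware, which this sketch does not measure.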

UPCommons. Portal del coneixement obert de la UPC

A Linux Kernel scheduler extension for multi-core systems

Author: Roca Nonell Aleix
Publication venue: Universitat Politècnica de Catalunya
Publication date: 25/10/2017

Controlador de dispositius per reconeixement de veu (CDRV) [Voice-recognition device controller]

Author: Roca Nonell Aleix
Publication venue: Universitat Politècnica de Catalunya
Publication date: 11/12/2014

CDRV (Controlador de dispositivos por reconocimiento de voz, a voice-recognition device controller) is a device capable of controlling other devices by voice. Specifically, for this project, it has been adapted to control a reclining armchair.

UPCommons. Portal del coneixement obert de la UPC


A Linux Kernel scheduler extension for multi-core systems

Author: Roca Nonell Aleix
Publication venue: Universitat Politècnica de Catalunya

RECERCAT

Introducing the Task-Aware Storage I/O (TASIO) Library

Author: Beltran Querol Vicenç
Mateo Bellido Sergi
Roca Nonell Aleix
Publication venue: Springer Fachmedien Wiesbaden GmbH

Task-based programming models are excellent tools for parallelizing and seamlessly load-balancing an application's workload. However, the integration of I/O-intensive applications with task-based programming models is lacking. Typically, I/O operations stall the requesting thread until the backing device services the data. Because the core where the thread was running becomes idle, it should be possible to overlap the data query operation with computation or even with more I/O operations. Nonetheless, overlapping I/O tasks with other tasks entails an extra degree of complexity that programming models' runtimes do not currently manage. In this work, we focus on integrating storage I/O into the tasking model by introducing the Task-Aware Storage I/O (TASIO) library. We test TASIO extensively with a custom benchmark for a number of configurations and conclude that it achieves speedups of up to 2x depending on the workload, although it might lead to slowdowns if not used with the right settings. This project is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 754304 (DEEP-EST), by the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the Generalitat de Catalunya (2017-SGR-1481). The authors also acknowledge that the test environment (Cobi) was provided by Intel Corporation within the framework of the BSC-Intel collaboration. Peer Reviewed.
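Why overlapping I/O with other tasks can approach a 2x speedup can be shown with a small discrete-event model. This is a sketch under stated assumptions, not the TASIO API: tasks are hypothetical Python generators that yield `("io", seconds)` or `("compute", seconds)`, there is a single core, and every I/O request completes after a fixed latency with no device contention.

```python
import heapq
from collections import deque


def io_task(read_s: float, compute_s: float):
    """Hypothetical task: read a block, then process it."""
    yield ("io", read_s)
    yield ("compute", compute_s)


def run(tasks, overlap_io: bool) -> float:
    """Simulate one core executing generator tasks and return the makespan.

    overlap_io=False: the core stalls for the whole I/O (blocking read).
    overlap_io=True: a task issuing I/O is suspended and the core runs
    another ready task until that I/O completes.
    """
    now = 0.0
    ready = deque(tasks)
    pending = []  # heap of (completion_time, seq, task) for in-flight I/O
    seq = 0
    while ready or pending:
        if not ready:
            # Core idle: jump forward to the next I/O completion.
            t_done, _, task = heapq.heappop(pending)
            now = max(now, t_done)
            ready.append(task)
            continue
        task = ready.popleft()
        for kind, dur in task:  # resumes the generator where it stopped
            if kind == "compute":
                now += dur
            elif kind == "io" and not overlap_io:
                now += dur      # the thread, and therefore the core, just waits
            elif kind == "io":
                heapq.heappush(pending, (now + dur, seq, task))
                seq += 1
                break           # suspend this task and pick another one
            else:
                raise ValueError(f"unknown step kind: {kind!r}")
        # Return every task whose I/O has finished to the ready queue.
        while pending and pending[0][0] <= now:
            ready.append(heapq.heappop(pending)[2])
    return now


if __name__ == "__main__":
    print(run([io_task(1.0, 1.0) for _ in range(4)], overlap_io=False))  # 8.0
    print(run([io_task(1.0, 1.0) for _ in range(4)], overlap_io=True))   # 5.0
```

Under these assumptions, N identical tasks take 2N time units when blocking but N + 1 with overlap, so the speedup tends towards 2x as N grows. The model charges nothing for suspending and resuming tasks and never saturates the device; both costs exist in a real runtime and are one plausible source of the slowdowns the abstract mentions.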

RECERCAT

On the applicability of PEBS based online memory access tracking for heterogeneous memory management at scale

Author: Bautista-Gomez Leonardo
Beltran Querol Vicenç
Gerofi Balazs
Ishikawa Yutaka
Martinet Dominique
Roca Nonell Aleix
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/11/2018

Operating systems have historically had to manage only a single type of memory device. The imminent availability of heterogeneous memory devices based on emerging memory technologies confronts the classic single memory model and opens a new spectrum of possibilities for memory management. Transparent data movement between different memory devices based on the access patterns of applications is a desired feature to make optimal use of such devices and to hide the complexity of memory management from the end user. However, capturing the memory access patterns of an application at runtime comes at a cost, which is particularly challenging for large-scale parallel applications that may be sensitive to system noise. In this work, we focus on the access pattern profiling phase prior to the actual memory relocation. We study the feasibility of using Intel's Processor Event-Based Sampling (PEBS) feature to record memory accesses by sampling at runtime, and we study the overhead at scale. We have implemented a custom PEBS driver in the IHK/McKernel lightweight multi-kernel operating system, whose advantages include minimal system interference owing to the lightweight kernel's simpler design compared with OS kernels such as Linux. We report the PEBS overhead for a set of scientific applications and show the access patterns identified in noise-sensitive HPC applications. Our results show that clear access patterns can be captured with 10% overhead in the worst case and 1% in the best case when running on up to 128k CPU cores (2,048 Intel Xeon Phi Knights Landing nodes). We conclude that online memory access profiling using PEBS at large scale is promising for memory management in heterogeneous memory environments. This work has been partially funded by MEXT's program for the Development and Improvement of Next Generation Ultra High-Speed Computer Systems under its subsidies for operating the Specific Advanced Large Research Facilities in Japan.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 708566 (DURO). Peer Reviewed.
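The sampling idea behind PEBS-based profiling, recording only every N-th qualifying memory access and ranking pages by sample count, can be modelled in software. This is a minimal Python sketch, not the authors' IHK/McKernel driver: PEBS itself is a hardware mechanism, and here the `period` parameter stands in for the counter interval after which a sample is taken, `PAGE_SIZE` assumes 4 KiB pages, and the address trace is synthetic.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed 4 KiB base pages


def sample_accesses(trace: list, period: int) -> list:
    """Keep every `period`-th access, like a counter that takes a sample on overflow."""
    if period < 1:
        raise ValueError("period must be >= 1")
    return trace[period - 1::period]


def hot_pages(trace: list, period: int, top: int = 3) -> list:
    """Rank pages by sampled access count, as an estimate of true access frequency."""
    counts = Counter(addr // PAGE_SIZE for addr in sample_accesses(trace, period))
    return [page for page, _ in counts.most_common(top)]


if __name__ == "__main__":
    # Synthetic trace: page 5 is hot, page 1 warm, page 9 cold (6:3:1 ratio).
    pattern = [5 * PAGE_SIZE + 8] * 6 + [1 * PAGE_SIZE + 16] * 3 + [9 * PAGE_SIZE]
    trace = pattern * 100
    for period in (1, 7, 50):
        print(period, len(sample_accesses(trace, period)), hot_pages(trace, period))
    # 1 1000 [5, 1, 9]
    # 7 142 [5, 1, 9]
    # 50 20 [9]   <- period aliases with the 10-access pattern
```

A longer period means fewer samples and therefore lower overhead, which is the trade-off the abstract quantifies at scale. The last line shows a pitfall of any fixed-period sampler: a period that lines up with a periodic access pattern sees only one page and produces a misleading profile.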

Crossref

UPCommons. Portal del coneixement obert de la UPC