11 research outputs found

    A Linux Kernel scheduler extension for multi-core systems

    Get PDF
    In this thesis, it is presented a Linux kernel extension that allows a user-space application to be notified of the blocking and unblocking of its threads, making it possible for a core to execute another thread while the other is blocked. The OmpSs Nanos6 runtime is adapted to use this feature

    A Linux kernel scheduler extension for multi-core systems

    Get PDF
    ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.The Linux kernel is mostly designed for multi-programed environments, but high-performance applications have other requirements. Such applications are run standalone, and usually rely on runtime systems to distribute the application's workload on worker threads, one per core. However, due to current OSes limitations, it is not feasible to track whether workers are actually running or blocked due to, for instance, a requested resource. For I/O intensive applications, this leads to a significant performance degradation given that the core of a blocked thread becomes idle until it is able to run again. In this paper, we present the proof-of-concept of a Linux kernel extension denoted User-Monitored Threads (UMT) which tackles this problem. Our extension allows a user-space process to be notified of when the selected threads become blocked or unblocked, making it possible for a runtime to schedule additional work on the idle core. We implemented the extension on the Linux Kernel 5.1 and adapted the Nanos6 runtime of the OmpSs-2 programming model to take advantage of it. The whole prototype was tested on two applications which, on the tested hardware and the appropriate conditions, reported speedups of almost 2x.This project is supported by the European Union’s Horizon 2021 research and innovation programme under the grant agreement No 754304 (DEEP-EST), the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the Generalitat de Catalunya (2017-SGR-1481).Peer ReviewedPostprint (author's final draft

    Advanced synchronization techniques for task-based runtime systems

    Get PDF
    Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks remains a challenge, and bottlenecks can manifest in several runtime components. In this paper, we analyze the limiting factors in the scalability of a task-based runtime system and propose individual solutions for each of the challenges, including a wait-free dependency system and a novel scalable scheduler design based on delegation. We evaluate how the optimizations impact the overall performance of the runtime, both individually and in combination. We also compare the resulting runtime against state of the art OpenMP implementations, showing equivalent or better performance, especially for fine-grained tasks.This project is supported by the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No.s 754304 (DEEP-EST), by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB and TIN2015-65316P) and by the Generalitat de Catalunya (2017-SGR-1414).Peer ReviewedPostprint (author's final draft

    A Linux Kernel scheduler extension for multi-core systems

    No full text
    In this thesis, it is presented a Linux kernel extension that allows a user-space application to be notified of the blocking and unblocking of its threads, making it possible for a core to execute another thread while the other is blocked. The OmpSs Nanos6 runtime is adapted to use this feature

    Controlador de dispositius per reconeixement de veu (CDRV)

    No full text
    CDRV (Controlador de dipositivos por reconocimiento de voz) es un dispositiu capaç de controlar altres dispositius mitjançant la veu. Concretament, per aquest projecte, s'ha adaptat per controlar una butaca reclinable

    Controlador de dispositius per reconeixement de veu (CDRV)

    No full text
    CDRV (Controlador de dipositivos por reconocimiento de voz) es un dispositiu capaç de controlar altres dispositius mitjançant la veu. Concretament, per aquest projecte, s'ha adaptat per controlar una butaca reclinable

    A Linux Kernel scheduler extension for multi-core systems

    No full text
    In this thesis, it is presented a Linux kernel extension that allows a user-space application to be notified of the blocking and unblocking of its threads, making it possible for a core to execute another thread while the other is blocked. The OmpSs Nanos6 runtime is adapted to use this feature

    Introducing the Task-Aware Storage I/O (TASIO) Library

    No full text
    Task-based programming models are excellent tools to parallelize and seamlessly load balance an application workload. However, the integration of I/O intensive applications and task-based programming models is lacking. Typically, I/O operations stall the requesting thread until the data is serviced by the backing device. Because the core where the thread was running becomes idle, it should be possible to overlap the data query operation with either computation workloads or even more I/O operations. Nonetheless, overlapping I/O tasks with other tasks entails an extra degree of complexity currently not managed by programming models’ runtimes. In this work, we focus on integrating storage I/O into the tasking model by introducing the Task-Aware Storage I/O (TASIO) library. We test TASIO extensively with a custom benchmark for a number of configurations and conclude that it is able to achieve speedups up to 2x depending on the workload, although it might lead to slowdowns if not used with the right settings.This project is supported by the European Union's Horizon 2021 research and innovation programme under the grant agreement No 754304 (DEEP-EST), the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the Generalitat de Catalunya (2017-SGR- 1481). Also, the authors would like to acknowledge that the test environment (Cobi) was ceded by Intel Corporation in the frame of the BSC - Intel collabo- ration.Peer Reviewe

    On the applicability of PEBS based online memory access tracking for heterogeneous memory management at scale

    Get PDF
    Operating systems have historically had to manage only a single type of memory device. The imminent availability of heterogeneous memory devices based on emerging memory technologies confronts the classic single memory model and opens a new spectrum of possibilities for memory management. Transparent data movement between different memory devices based on access patterns of applications is a desired feature to make optimal use of such devices and to hide the complexity of memory management to the end user. However, capturing memory access patterns of an application at runtime comes at a cost, which is particularly challenging for large-scale parallel applications that may be sensitive to system noise. In this work, we focus on the access pattern profiling phase prior to the actual memory relocation. We study the feasibility of using Intel's Processor Event-Based Sampling (PEBS) feature to record memory accesses by sampling at runtime and study the overhead at scale. We have implemented a custom PEBS driver in the IHK/-McKernel lightweight multi-kernel operating system, one of whose advantages is minimal system interference due to the lightweight kernel's simple design compared to other OS kernels such as Linux. We present the PEBS overhead of a set of scientific applications and show the access patterns identified in noise sensitive HPC applications. Our results show that clear access patterns can be captured with a 10% overhead in the worst-case and 1% in the best case when running on up to 128k CPU cores (2,048 Intel Xeon Phi Knights Landing nodes). We conclude that online memory access profiling using PEBS at large-scale is promising for memory management in heterogeneous memory environments.This work has been partially funded by MEXT’s program for the Development and Improvement of Next Generation Ultra High- Speed Computer Systems under its subsidies for operating the Specific Advanced Large Research Facilities in Japan. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska- Curie grant agreement No 708566 (DURO).Peer Reviewe
    corecore