21 research outputs found

Lasagne: a static binary translator for weak memory model architectures

Author: Bhatotia Pramod
Chakraborty Soham
Fink Martin
Gouicem Redha
Rocha Rodrigo C. O.
Spink Tom
Sprokholt Dennis
Publication venue: ACM
Publication date: 01/01/2022

Funding: This work was supported by a UK RISE Grant.

The emergence of new architectures creates a recurring challenge to ensure that existing programs still work on them. Manually porting legacy code is often impractical. Static binary translation (SBT) is a process where a program’s binary is automatically translated from one architecture to another while preserving its original semantics. However, these SBT tools have limited support for various advanced architectural features. Importantly, they are currently unable to translate concurrent binaries. The main challenge arises from mismatches between the memory consistency models specified by the different architectures, especially when porting existing binaries to a weak memory model architecture. In this paper, we propose Lasagne, an end-to-end static binary translator with precise translation rules between x86 and Arm concurrency semantics. First, we propose a concurrency model for Lasagne’s intermediate representation (IR) and formally prove mappings between the IR and the two architectures. The memory ordering is preserved by introducing fences in the translated code. Finally, we propose optimizations focused on raising the level of abstraction of memory address calculations and reducing the number of fences. Our evaluation shows that Lasagne reduces the number of fences by up to about 65%, with an average reduction of 45.5%, significantly reducing their runtime overhead.

TU Delft Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

University of St. Andrews - Pure

St Andrews Research Repository

The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS

Author: Bouron Justinien
Chevalley Sébastien
Gouicem Redha
Lawall Julia
Lepers Baptiste
Muller Gilles
Sopena Julien
Zwaenepoel Willy
Publication venue: HAL CCSD
Publication date: 02/06/2018

This paper analyzes the impact on application performance of the design and implementation choices made in two widely used open-source schedulers: ULE, the default FreeBSD scheduler, and CFS, the default Linux scheduler. We compare ULE and CFS in otherwise identical circumstances. We have ported ULE to Linux, and use it to schedule all threads that are normally scheduled by CFS. We compare the performance of a large suite of applications on the modified kernel running ULE and on the standard Linux kernel running CFS. The observed performance differences are solely the result of scheduling decisions, and do not reflect differences in other subsystems between FreeBSD and Linux. There is no overall winner. On many workloads the two schedulers perform similarly, but for some workloads there are significant and even surprising differences. ULE may cause starvation, even when executing a single application with identical threads, but this starvation may actually lead to better application performance for some workloads. The more complex load balancing mechanism of CFS reacts more quickly to workload changes, but ULE achieves better load balance in the long run.

Infoscience - École polytechnique fédérale de Lausanne

INRIA a CCSD electronic archive server

HAL Descartes

The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS

Author: Bouron Justinien
Chevalley Sebastien
Gouicem Redha
Lawall Julia
Lepers Baptiste Joseph Eustache
Muller Gilles
Sopena Julien
Zwaenepoel Willy
Publication venue: Usenix ATC 2018
Publication date: 02/06/2018

This paper analyzes the impact on application performance of the design and implementation choices made in two widely used open-source schedulers: ULE, the default FreeBSD scheduler, and CFS, the default Linux scheduler. We compare ULE and CFS in otherwise identical circumstances. We have ported ULE to Linux, and use it to schedule all threads that are normally scheduled by CFS. We compare the performance of a large suite of applications on the modified kernel running ULE and on the standard Linux kernel running CFS. The observed performance differences are solely the result of scheduling decisions, and do not reflect differences in other subsystems between FreeBSD and Linux. There is no overall winner. On many workloads the two schedulers perform similarly, but for some workloads there are significant and even surprising differences. ULE may cause starvation, even when executing a single application with identical threads, but this starvation may actually lead to better application performance for some workloads. The more complex load balancing mechanism of CFS reacts more quickly to workload changes, but ULE achieves better load balance in the long run.

Infoscience - École polytechnique fédérale de Lausanne

Fewer Cores, More Hertz: Leveraging High-Frequency Cores in the OS Scheduler for Improved Application Performance

Author: Carver Damien
Gouicem Redha
Lawall Julia
Lepers Baptiste
Lozi Jean-Pierre
Muller Gilles
Palix Nicolas
Sopena Julien
Zwaenepoel Willy
Publication venue: HAL CCSD
Publication date: 15/07/2020

In modern server CPUs, individual cores can run at different frequencies, which allows for fine-grained control of the performance/energy tradeoff. Adjusting the frequency, however, incurs a high latency. We find that this can lead to a problem of frequency inversion, whereby the Linux scheduler places a newly active thread on an idle core that takes dozens to hundreds of milliseconds to reach a high frequency, just before another core already running at a high frequency becomes idle. In this paper, we first illustrate the significant performance overhead of repeated frequency inversion through a case study of scheduler behavior during the compilation of the Linux kernel on an 80-core Intel® Xeon-based machine. Following this, we propose two strategies to reduce the likelihood of frequency inversion in the Linux scheduler. When benchmarked over 60 diverse applications on the Intel® Xeon, the better performing strategy, S_move, improves performance by more than 5% (at most 56% with no energy overhead) for 23 applications, and worsens performance by more than 5% (at most 8%) for only 3 applications. On a 4-core AMD Ryzen we obtain performance improvements up to 56%.

INRIA a CCSD electronic archive server

Ordonnancement de fils d'exécution dans les systèmes d'exploitation multi-coeurs

Author: Gouicem Redha
Publication venue: HAL CCSD
Publication date: 23/10/2020

In this thesis, we address the problem of schedulers for multi-core architectures from several perspectives: design (simplicity and correctness), performance improvement and the development of application-specific schedulers. The contributions presented are summarized as follows:
- Ipanema, a domain-specific language dedicated to thread schedulers for multi-core architectures. We also implement a new abstraction in the Linux kernel that enables the dynamic addition of schedulers written in Ipanema.
- a series of performance and bug tracking tools. Thanks to these tools, we show that the Linux scheduler, CFS, suffers from a problem related to frequency management on modern processors. We propose a solution to this problem in the form of a patch submitted to the community. This patch significantly improves the performance of numerous applications.
- a scheduler model in the form of a “feature tree”. We implement these features independently in order to offer a new fully modular scheduler. This modular scheduler allows us to study exhaustively the different combinations of features, thus paving the way for the development of application-specific schedulers.

Thèses en Ligne

Ordonnancement de Fils d'Exécution dans les Systèmes d'Exploitation Multi-cœurs

Author: Gouicem Redha
Publication venue: HAL CCSD
Publication date: 23/10/2020

In this thesis, we address the problem of schedulers for multi-core architectures from several perspectives: design (simplicity and correctness), performance improvement and the development of application-specific schedulers. The contributions presented are summarized as follows:
- Ipanema, a domain-specific language dedicated to thread schedulers for multi-core architectures. We also implement a new abstraction in the Linux kernel that enables the dynamic addition of schedulers written in Ipanema.
- a series of performance and bug tracking tools. Thanks to these tools, we show that the Linux scheduler, CFS, suffers from a problem related to frequency management on modern processors. We propose a solution to this problem in the form of a patch submitted to the community. This patch significantly improves the performance of numerous applications.
- a scheduler model in the form of a “feature tree”. We implement these features independently in order to offer a new fully modular scheduler. This modular scheduler allows us to study exhaustively the different combinations of features, thus paving the way for the development of application-specific schedulers.

Thèses en Ligne

INRIA a CCSD electronic archive server

HAL Descartes

Ordonnancement de fils d'exécution dans les systèmes d'exploitation multi-coeurs

Author: Gouicem Redha
Publication venue
Publication date: 23/10/2020

In this thesis, we address the problem of schedulers for multi-core architectures from several perspectives: design (simplicity and correctness), performance improvement and the development of application-specific schedulers. The contributions presented are summarized as follows:
- Ipanema, a domain-specific language dedicated to thread schedulers for multi-core architectures. We also implement a new abstraction in the Linux kernel that enables the dynamic addition of schedulers written in Ipanema.
- a series of performance and bug tracking tools. Thanks to these tools, we show that the Linux scheduler, CFS, suffers from a problem related to frequency management on modern processors. We propose a solution to this problem in the form of a patch submitted to the community. This patch significantly improves the performance of numerous applications.
- a scheduler model in the form of a “feature tree”. We implement these features independently in order to offer a new fully modular scheduler. This modular scheduler allows us to study exhaustively the different combinations of features, thus paving the way for the development of application-specific schedulers.

Thèses en Ligne

INRIA a CCSD electronic archive server

Theses.fr

Risotto: a dynamic binary translator for weak memory model architectures

Author: Bhatotia Pramod
Chakraborty Soham
Gouicem Redha
Rocha Rodrigo
Ruehl Jasper
Spink Tom
Sprokholt Dennis
Publication venue: ACM
Publication date: 21/12/2022

Dynamic Binary Translation (DBT) is a powerful approach to support cross-architecture emulation of unmodified binaries. However, DBT systems face correctness and performance challenges when emulating concurrent binaries from strong to weak memory consistency architectures. In fact, we report several translation errors in QEMU when emulating x86 binaries on Arm hosts. To address these challenges, we propose an end-to-end approach that provides correct and efficient emulation for weak memory model architectures. Our contributions are twofold. First, we formalize the memory model of QEMU’s intermediate representation, and use it to propose formally verified mapping schemes to bridge the strong-on-weak memory consistency mismatch. Second, we implement these verified mappings in Risotto, a QEMU-based DBT system that optimizes memory fence placement while ensuring correctness. Risotto further improves performance via cross-architecture dynamic linking of native shared libraries and faster yet correct translation of compare-and-swap operations. We evaluate Risotto using multi-threaded benchmark suites and real-world applications, and show that Risotto improves the emulation performance by 6.7% on average over “erroneous” QEMU, while ensuring correctness.