41 research outputs found

    Le Remote Core Lock (RCL) : une nouvelle technique de verrouillage pour les architectures multi-coeur

    National audience. Multicore architectures are now ubiquitous in personal and enterprise computing systems. At present, however, systems and applications cannot exploit the power of these architectures efficiently, in particular because of the cost of executing critical sections. We propose a new approach, called Remote Core Lock (RCL), that improves the performance of multithreaded applications on multicore architectures. The principle of RCL is to replace, in legacy applications, certain performance-critical lock acquisitions with remote procedure calls to a dedicated core called the server. The benefit of RCL is twofold. First, by replacing lock acquisition requests with a single message sent to the server, RCL avoids the performance collapse caused by bus saturation when many lock acquisitions are requested concurrently. Second, locks are generally used to protect accesses to shared data, and RCL avoids migrating that data to the core that takes the lock: the shared data stays in the server's caches, since the server is the only core that accesses it. Our first evaluations show that (i) RCL outperforms classic locks under high bus contention, (ii) with RCL, the SPLASH-2/Raytrace benchmark scales up to 32 cores, instead of 8 with classic locks, and (iii) using RCL in the memcached cache server yields a throughput gain of up to 65%.

    Efficient locking for multicore architectures

    The scalability of multithreaded applications on current multicore systems is hampered by the performance of critical sections, due in particular to the costs of access contention and cache misses. In this paper, we propose a new locking technique, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with regular locks when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock: such data can typically remain in the server core's cache. Our microbenchmark shows that under high contention, RCL is always more efficient than the other state-of-the-art lock mechanisms, and a preliminary macrobenchmark evaluation shows performance gains on SPLASH-2 benchmarks (speedup up to 4.85) and on the Web cache application memcached (speedup up to 2.62).
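
    The delegation idea described in this abstract can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `rcl_server`, `rcl_slot`, `rcl_execute`, and `rcl_demo` are hypothetical, the server is an ordinary POSIX thread rather than a thread pinned to a dedicated core, and both sides yield while polling so that the sketch behaves on small machines.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* One request slot per client, aligned to a cache line to limit false sharing. */
typedef struct {
    _Alignas(64) atomic_int pending; /* 1 while a request awaits the server */
    void (*func)(void *);            /* critical section to execute */
    void *arg;                       /* context passed to the critical section */
} rcl_slot;

typedef struct {
    rcl_slot slots[MAX_CLIENTS];
    atomic_int stop;
    pthread_t thread;
} rcl_server;

/* Server loop: scan the slots and run each pending critical section in turn. */
static void *rcl_server_loop(void *p) {
    rcl_server *s = p;
    while (!atomic_load_explicit(&s->stop, memory_order_acquire)) {
        for (int i = 0; i < MAX_CLIENTS; i++) {
            rcl_slot *slot = &s->slots[i];
            if (atomic_load_explicit(&slot->pending, memory_order_acquire)) {
                slot->func(slot->arg);
                atomic_store_explicit(&slot->pending, 0, memory_order_release);
            }
        }
        sched_yield();
    }
    return NULL;
}

/* Client side: post the request, then wait until the server has run it. */
static void rcl_execute(rcl_server *s, int client, void (*func)(void *), void *arg) {
    rcl_slot *slot = &s->slots[client];
    slot->func = func;
    slot->arg = arg;
    atomic_store_explicit(&slot->pending, 1, memory_order_release);
    while (atomic_load_explicit(&slot->pending, memory_order_acquire))
        sched_yield();
}

typedef struct { rcl_server *server; int id; int iters; long *counter; } client_arg;

/* The "critical section": only the server thread ever touches the counter. */
static void increment(void *arg) { ++*(long *)arg; }

static void *client_main(void *p) {
    client_arg *c = p;
    for (int i = 0; i < c->iters; i++)
        rcl_execute(c->server, c->id, increment, c->counter);
    return NULL;
}

/* Hypothetical driver: nclients threads each delegate iters increments. */
long rcl_demo(int nclients, int iters) {
    assert(nclients > 0 && nclients <= MAX_CLIENTS);
    rcl_server server;
    for (int i = 0; i < MAX_CLIENTS; i++)
        atomic_init(&server.slots[i].pending, 0);
    atomic_init(&server.stop, 0);
    long counter = 0;
    pthread_t threads[MAX_CLIENTS];
    client_arg args[MAX_CLIENTS];

    pthread_create(&server.thread, NULL, rcl_server_loop, &server);
    for (int i = 0; i < nclients; i++) {
        args[i] = (client_arg){ &server, i, iters, &counter };
        pthread_create(&threads[i], NULL, client_main, &args[i]);
    }
    for (int i = 0; i < nclients; i++)
        pthread_join(threads[i], NULL);
    atomic_store_explicit(&server.stop, 1, memory_order_release);
    pthread_join(server.thread, NULL);
    return counter;
}
```

    Because every increment runs on the server thread, the counter needs no lock of its own and its cache line never has to move between clients; `rcl_demo(4, 1000)` returns 4000. Compile with `-pthread`.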

    CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure

    Fast and Portable Locking for Multicore Architectures

    International audience. The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server's cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun UltraSPARC T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.

    Hector: Detecting resource-release omission faults in error-handling code for systems software

    International audience. Omitting resource-release operations in systems error-handling code can lead to memory leaks, crashes, and deadlocks. Finding omission faults is challenging due to the difficulty of reproducing system errors, the diversity of system resources, and the lack of appropriate abstractions in the C language. To address these issues, numerous approaches have been proposed that globally scan a code base for common resource-release operations. Such macroscopic approaches are notorious for their many false positives, while also leaving many faults undetected. We propose a novel microscopic approach to finding resource-release omission faults in systems software. Rather than generalizing from the entire source code, our approach focuses on the error-handling code of each function. Using our tool, Hector, we have found over 370 faults in six systems software projects, including Linux, with a 23% false positive rate. Some of these faults allow an unprivileged malicious user to crash the entire system.
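
    The kind of fault this abstract targets can be illustrated with a small C sketch. It shows only the fault class and a conventional `goto`-based fix, not how Hector detects it; the `resource` type, the `live_resources` counter, and all function names are hypothetical stand-ins for real resources such as memory, locks, or device references.

```c
#include <assert.h>

/* Hypothetical resource with a global count of currently held instances. */
typedef struct { int held; } resource;
static int live_resources = 0;

/* Acquire r, or fail (simulated by the `fail` flag) without acquiring it. */
static int res_acquire(resource *r, int fail) {
    if (fail)
        return -1;
    r->held = 1;
    live_resources++;
    return 0;
}

static void res_release(resource *r) {
    assert(r->held);
    r->held = 0;
    live_resources--;
}

/* Faulty: the second error path returns while `a` is still held. */
int setup_faulty(resource *a, resource *b, int fail_b) {
    if (res_acquire(a, 0) != 0)
        return -1;
    if (res_acquire(b, fail_b) != 0)
        return -1;              /* omission fault: `a` is never released */
    /* ... use a and b ... */
    res_release(b);
    res_release(a);
    return 0;
}

/* Fixed: every exit taken after acquiring `a` passes through its release. */
int setup_fixed(resource *a, resource *b, int fail_b) {
    int ret = -1;
    if (res_acquire(a, 0) != 0)
        return -1;
    if (res_acquire(b, fail_b) != 0)
        goto release_a;
    /* ... use a and b ... */
    ret = 0;
    res_release(b);
release_a:
    res_release(a);
    return ret;
}
```

    With `fail_b` set, `setup_faulty` leaves `live_resources` at 1 after returning -1, whereas `setup_fixed` leaves it at 0. The fault only shows up on the error path, which is why it is hard to trigger by ordinary testing.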

    Remote Core Locking: Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications

    National audience. The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock, because such data can typically remain in the server core's cache. We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite, and the 7 applications of the Phoenix2 benchmark suite. Ten of these applications, including Memcached and Berkeley DB, are unable to scale because of locks and benefit from RCL. Using RCL locks, we get performance improvements over POSIX locks of up to 2.6 times on Memcached and up to 14 times on Berkeley DB.

    OS Scheduling with Nest: Keeping Tasks Close Together on Warm Cores

    International audience. To best support highly parallel applications, Linux's CFS scheduler tends to spread tasks across the machine on task creation and wakeup. It has been observed, however, that in a server environment, such a strategy leads to tasks being unnecessarily placed on long-idle cores that are running at lower frequencies, reducing performance, and to tasks being unnecessarily distributed across sockets, consuming more energy. In this paper, we propose to exploit the principle of core reuse, by constructing a nest of cores to be used in priority for task scheduling, thus obtaining higher frequencies and using fewer sockets. We implement the Nest scheduler in the Linux kernel. While performance and energy usage are comparable to CFS for highly parallel applications, for a range of applications using fewer tasks than cores, Nest improves performance by 10% to 2× and can reduce energy usage.
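
    The core-reuse principle in this abstract can be sketched as a toy placement policy in C. This is a simplified illustration under assumptions, not the Nest implementation in the Linux kernel: `sched_state`, `pick_core`, and `task_done` are hypothetical names, the machine has a fixed `NCORES`, and the sketch ignores sockets, frequencies, and any shrinking of the nest.

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 8

/* Toy scheduler state: which cores belong to the nest, and which are busy. */
typedef struct {
    bool in_nest[NCORES];
    bool busy[NCORES];
} sched_state;

/* Place a task: prefer an idle core already in the nest (recently used,
 * hence likely warm); only when none is idle, grow the nest by one core.
 * Returns the chosen core, or -1 if every core is busy. */
int pick_core(sched_state *s) {
    for (int c = 0; c < NCORES; c++) {
        if (s->in_nest[c] && !s->busy[c]) {
            s->busy[c] = true;
            return c;
        }
    }
    for (int c = 0; c < NCORES; c++) {
        if (!s->busy[c]) {
            s->in_nest[c] = true;
            s->busy[c] = true;
            return c;
        }
    }
    return -1;
}

/* A task finished on `core`: the core becomes idle but stays in the nest. */
void task_done(sched_state *s, int core) {
    assert(core >= 0 && core < NCORES);
    s->busy[core] = false;
}
```

    Starting from an empty state, two placements take cores 0 and 1; once the task on core 0 finishes, the next placement reuses core 0 rather than waking core 2, which is the opposite of a spread-across-the-machine policy.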