5 research outputs found

    Mely: Efficient Workstealing for Multicore Event-Driven Systems

    Many high-performance communicating systems are designed using the event-driven paradigm. As multicore platforms are now pervasive, it becomes crucial for such systems to take advantage of the available hardware parallelism. Event-coloring is a promising approach in this regard. First, it allows programmers to simply and progressively inject support for the safe, parallel execution of multiple event handlers through the use of annotations. Second, it relies on a workstealing algorithm to dynamically balance the execution of event handlers on the available cores. This paper studies the impact of the workstealing algorithm on the overall system performance. We first show that the only existing workstealing algorithm designed for event-coloring runtimes is not always efficient: for instance, it causes a 33% performance degradation on a Web server. We then introduce several enhancements to improve the workstealing behavior. An evaluation using both microbenchmarks and real applications, a Web server and the Secure File Server (SFS), shows that our system consistently outperforms a state-of-the-art runtime (Libasync-smp), with or without workstealing. In particular, our new workstealing improves performance by up to +25% compared to Libasync-smp without workstealing and by up to +73% compared to the Libasync-smp workstealing algorithm, in the Web server case

    Study of Scheduling in Programming Languages of Multi-Core Processor

    Over recent decades, the rise of multi-core processors has shifted programming from a serial model to a parallel one. Several programming languages target parallel multi-core processors and processors with different architectures, and they challenge programmers who want to achieve higher performance. In addition, the scheduling methods used in these languages have a significant impact on their efficiency. This article therefore investigates the conventional scheduling techniques in programming languages for multi-core processors, allowing researchers to choose a more suitable programming language by comparing efficiency for a given application. Several languages, such as Cilk++, OpenMP, TBB and PThread, were studied, and their scheduling efficiency was investigated by running Quick-Sort and Merge-Sort algorithms as well…

    Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages

    We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about intertask communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require manual program annotations, or they rely on fragile profiling schemes. By contrast, our solution makes no assumption on the structure of programs or on the layout of data in memory. Experimental results, based on the OpenStream language, show that locality of accesses to main memory of scientific applications can be increased significantly on a 64-core machine, resulting in a speedup of up to 1.63× compared to a state-of-the-art work-stealing scheduler.

    Remote Core Lock (RCL): a new locking technique for multicore architectures

    Multicore architectures are now ubiquitous in personal and enterprise computer systems. At present, however, systems and applications are unable to exploit the power of these new architectures efficiently, in particular because of the execution cost of critical sections. We propose a new approach, called Remote Core Lock (RCL), that improves the performance of multithreaded applications on multicore architectures. The principle of RCL is to replace, in legacy applications, certain performance-critical lock acquisitions with remote procedure calls to a dedicated core called the server. RCL has two benefits. First, by replacing lock acquisition requests with a single message sent to the server, RCL avoids the collapse effects caused by bus overload when many lock acquisition requests are issued concurrently. Second, locks are generally used to protect accesses to shared data, and RCL avoids migrating this data to the core that takes the lock: the shared data stays in the server's caches, since the server is the only core that accesses it. Our first evaluations show that (i) RCL outperforms classical locks under high bus contention, (ii) with RCL, the SPLASH-2/Raytrace benchmark scales up to 32 cores, instead of 8 with classical locks, and (iii) using RCL in the memcached cache server yields a throughput gain of up to 65%.

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single-threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.JS. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks don't play well with event-based systems. This dissertation proposes a programming model and a runtime system to overcome such limitations by enabling single-threaded event-based applications with support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, and therefore providing developers with the impression of a safe, single-threaded, runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level, parallel programming models characterized by a strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first implementation we describe is a pessimistic runtime system based on locks to implement speculative parallelization. The second and the third implementations are based on two distinct optimistic runtimes using software transactional memory. Each of the implementations supports the parallelization of applications written using an asynchronous single-threaded programming style, and each of them enables applications to benefit from parallel execution.