19 research outputs found

    Effect inference for deterministic parallelism

    In this report we sketch a polymorphic type and effect inference system for ensuring deterministic execution of parallel programs containing shared mutable state. It differs from that of Gifford and Lucassen in being based on Hindley-Milner polymorphism and in formalizing the operational semantics of parallel and sequential computation.
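
    To make the guarantee concrete, the following is a minimal C sketch (using OpenMP purely as an illustration; the report itself works with a formal calculus, not C) of the property such an effect system checks: a parallel composition is deterministic when the branches' inferred write effects are disjoint, and rejected when they may write the same location.

    #include <stdio.h>

    /* Illustration only: an effect system would infer write effect
     * {a} for the first branch and {b} for the second.  Disjoint
     * effects => the parallel composition is deterministic. */
    int main(void) {
        int a = 0, b = 0, c = 0;

        #pragma omp parallel sections
        {
            #pragma omp section
            a = 1;                  /* writes {a}               */
            #pragma omp section
            b = 2;                  /* writes {b}: disjoint, ok */
        }
        /* Every schedule yields a == 1 and b == 2. */

        #pragma omp parallel sections
        {
            #pragma omp section
            c = 1;                  /* writes {c}               */
            #pragma omp section
            c = 2;                  /* writes {c}: overlap!     */
        }
        /* c depends on the schedule; an effect system ensuring
         * determinism would reject this composition. */

        printf("a=%d b=%d c=%d\n", a, b, c);
        return 0;
    }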

    A Comparison of some recent Task-based Parallel Programming Models

    The need for parallel programming models that are simple to use and at the same time efficient for current and future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0. The choice of model and implementation can have a major impact on the final performance, and in order to understand some of the trade-offs we have made a quantitative study comparing four implementations of OpenMP (gcc, Intel icc, Sun Studio and the research compiler Mercurium/nanos mcc), Cilk++ and Wool, a high-performance task-based library developed at SICS. We use microbenchmarks to characterize the costs of task creation and stealing, and the Barcelona OpenMP Tasks Suite to characterize application performance. By far, Wool and Cilk++ have the lowest overhead in both spawning and stealing tasks. This is reflected in application performance when many tasks with small granularity are spawned, where Cilk++ and, in particular, Wool have the highest performance. For coarse-granularity applications, the OpenMP implementations perform quite similarly to the more lightweight Cilk++ and Wool, except for one application where mcc is superior thanks to a better task scheduler. The OpenMP implementations are generally not yet ready for use when the task granularity becomes very small. There is no inherent reason for this, so we expect future implementations of OpenMP to focus on this issue.
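
    As a concrete illustration of the kind of microbenchmark involved (a sketch under our own assumptions, not the paper's actual harness), recursive Fibonacci written with OpenMP 3.0 tasks spends almost all of its time creating and stealing tasks, which is exactly what such measurements try to isolate:

    #include <stdio.h>

    /* Task-spawn microbenchmark sketch: the work per task is tiny,
     * so task-creation and stealing overheads dominate. */
    static long fib(int n) {
        long x, y;
        if (n < 2) return n;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }

    int main(void) {
        long r;
        #pragma omp parallel
        #pragma omp single
        r = fib(30);
        printf("fib(30) = %ld\n", r);
        return 0;
    }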

    Modular cloning

    In this paper we deal with the problem of making context-dependent interprocedural optimizations (where the legality of optimizing a function depends on properties of the callers of the function) effective and compatible with (a form of) separate compilation. We improve effectiveness by cloning, generating several versions of a single function optimized for different call sites. We attack the separate compilation problem, that code cannot be generated until all calls of a function are known, by splitting the compilation process into two phases. The first phase analyses the modules one at a time in bottom-up dependency order ('main' is processed last) and produces code in an intermediate language where the constructs targeted by the optimization are annotated to control the application of the optimization. In cases where the legality of an optimization depends on properties of the callers, these annotations can take the form of annotation variables which become extra formal parameters. The second phase traverses the modules in top-down dependency order, removing all of these extra parameters by specialization. We illustrate our approach with an integrated program analysis and transformation system featuring a context-sensitive type-based analysis, cloning with sharing of identical clones, and a modular implementation allowing for the compilation of large programs. The system implements cheap eagerness and redundant eval elimination for a lazy functional language.
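
    A minimal C analogue of the two phases (hypothetical code, only to illustrate the mechanism; the actual system works on an intermediate language for a lazy functional language): phase one abstracts the optimization decision into an annotation variable that becomes an extra formal parameter, and phase two specializes that parameter away, leaving one clone per distinct call-site property.

    /* Phase 1 output: `nonzero` is an annotation variable, passed
     * as an extra formal parameter because the legality of the
     * optimization depends on the caller. */
    double mean(const double *xs, int n, int nonzero) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += xs[i];
        return nonzero ? s / n            /* caller guarantees n > 0 */
                       : (n ? s / n : 0.0);
    }

    /* Phase 2 output: the extra parameter is removed by
     * specialization, producing two clones (identical clones would
     * be shared); each call site is rewritten to the matching clone. */
    double mean_1(const double *xs, int n) {   /* nonzero == 1 */
        double s = 0.0;
        for (int i = 0; i < n; i++) s += xs[i];
        return s / n;
    }
    double mean_0(const double *xs, int n) {   /* nonzero == 0 */
        double s = 0.0;
        for (int i = 0; i < n; i++) s += xs[i];
        return n ? s / n : 0.0;
    }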

    Wool - A work stealing library


    JakobRogstadius/MOSTACHI: v1.0

    Initial release for peer review

    Resource management for task-based parallel programs over a multi-kernel: BIAS: Barrelfish Inter-core Adaptive Scheduling

    Trying to attack the problem of resource contention created by multiple parallel applications running simultaneously, we propose a space-sharing, two-level, adaptive scheduler for the Barrelfish operating system. The first level is system-wide, running close to the OS kernel, and has knowledge of the available resources, while the second level, integrated into the application's runtime, is aware of its type and amount of parallelism. Feedback on efficiency from the second level to the first level allows the latter to adaptively modify the allotment of cores (domain), intelligently promoting space-sharing of resources while still allowing time-sharing when needed. In order to avoid excess inter-core communication, the system-level scheduler is designed as a distributed service, taking advantage of the message-passing nature of Barrelfish. The processor topology is partitioned so that each instance of the scheduler handles an appropriately sized subset of cores. Malleability is achieved by suspending worker threads. Two different methodologies are introduced and explained, each suitable for distinct programming models and applications. Preliminary results are quite promising and show minimal added overhead. In specific multiprogramming configurations, initial experiments showed a significant performance improvement by avoiding contention.
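
    A sketch of how malleability through worker suspension might look (hypothetical names and shape, not the actual BIAS code): each second-level worker checks the current core allotment and parks on a condition variable when its slot has been revoked, so the first level can shrink or grow the domain without destroying threads.

    #include <pthread.h>

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  resume = PTHREAD_COND_INITIALIZER;
    static int allotment = 4;       /* cores currently granted */

    /* Worker side: park while our slot exceeds the allotment. */
    static void maybe_park(int my_slot) {
        pthread_mutex_lock(&lock);
        while (my_slot >= allotment)
            pthread_cond_wait(&resume, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* First-level side: change the allotment and wake workers. */
    void set_allotment(int n) {
        pthread_mutex_lock(&lock);
        allotment = n;
        pthread_cond_broadcast(&resume);
        pthread_mutex_unlock(&lock);
    }

    void worker_loop(int my_slot) {
        for (;;) {
            maybe_park(my_slot);
            /* ... obtain a task (pop or steal) and run it ... */
        }
    }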

    A Quantitative Evaluation of popular Task-Centric Programming Models and Libraries

    Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. The present study focuses on the task-centric approach and compares several popular systems, including Cilk Plus, TBB and various implementations of OpenMP 3.0. We analyse their performance on the BOTS benchmark suite, both on a 48-core Magny-Cours server and a 64-core TILEPro64 embedded manycore processor.
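
    A large part of the gap at small granularities is fixed per-task overhead; a common mitigation in task-centric code (sketched below under our own assumptions, not taken from the paper) is a manual cutoff that switches to serial execution once subproblems become too small for a task to pay for itself:

    #define CUTOFF 12   /* assumed threshold; tuned per machine */

    static long fib_seq(int n) {
        return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
    }

    /* Below the cutoff, run serially instead of spawning tasks,
     * amortizing per-task overhead over more useful work. */
    static long fib(int n) {
        long x, y;
        if (n < CUTOFF) return fib_seq(n);
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }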

    Dynamic Inter-core Scheduling in Barrelfish: avoiding contention with malleable process domains

    Trying to attack the problem of resource contention created by multiple parallel applications running simultaneously, we propose a space-sharing, two-level, adaptive scheduler for the Barrelfish operating system. The first level is system-wide, existing inside the OS, and has knowledge of the available resources, while the second level is aware of the parallelism in the application. Feedback on efficiency from the second level to the first level allows the latter to adaptively modify the allotment of cores (domain), thus intelligently avoiding time-sharing. In order to avoid excess inter-core communication, the first-level scheduler is designed as a distributed service, taking advantage of the message-passing nature of Barrelfish. The processor topology is partitioned so that each instance of the scheduler handles an appropriately sized subset of cores. Malleability is achieved by suspending worker threads. Two different methodologies are introduced and explained, each ideal for different situations.
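
    The distributed first-level service can be pictured as follows (a hypothetical C sketch, not Barrelfish code): the core topology is partitioned up front, each scheduler instance manages only its own subset of cores, and cross-partition requests travel as messages rather than through shared memory.

    /* Partition ncores among ninstances scheduler instances so
     * each instance handles a contiguous, roughly equal subset. */
    typedef struct { int first_core; int num_cores; } partition_t;

    void partition_topology(int ncores, int ninstances, partition_t *out) {
        int base  = ncores / ninstances;
        int extra = ncores % ninstances;    /* spread the remainder */
        int next  = 0;
        for (int i = 0; i < ninstances; i++) {
            out[i].first_core = next;
            out[i].num_cores  = base + (i < extra ? 1 : 0);
            next += out[i].num_cores;
        }
    }
    /* Each instance then serves allotment requests for its own
     * cores and forwards the rest via message passing. */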