101,391 research outputs found
Analysis, classification and comparison of scheduling techniques for software transactional memories
Transactional Memory (TM) is a practical programming paradigm for developing concurrent applications. Performance is a critical factor for TM implementations, and various studies demonstrated that specialised transaction/thread scheduling support is essential for implementing performance-effective TM systems. After one decade of research, this article reviews the wide variety of scheduling techniques proposed for Software Transactional Memories. Based on peculiarities and differences of the adopted scheduling strategies, we propose a classification of the existing techniques, and we discuss the specific characteristics of each technique. Also, we analyse the results of previous evaluation and comparison studies, and we present the results of a new experimental study encompassing techniques based on different scheduling strategies. Finally, we identify potential strengths and weaknesses of the different techniques, as well as the issues that require to be further investigated
Retrofitting parallelism onto OCaml.
OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared memory parallel programming. This paper describes the design, a full-fledged implementation and evaluation of a mostly-concurrent garbage collector (GC) for the multicore extension of the OCaml programming language. Given that we propose to add parallelism to a widely used programming language with millions of lines of existing code, we face the challenge of maintaining backwards compatibility--not just in terms of the language features but also the performance of single-threaded code running with the new GC. To this end, the paper presents a series of novel techniques and demonstrates that the new GC strikes a balance between performance and feature backwards compatibility for sequential programs and scales admirably on modern multicore processors
Intel Concurrent Collections for Haskell
Intel Concurrent Collections (CnC) is a parallel programming model in which a network of steps (functions) communicate through message-passing as well as a limited form of shared memory. This paper describes a new implementation of CnC for Haskell. Compared to existing parallel programming models for Haskell, CnC occupies a useful point in the design space: pure and deterministic like Evaluation Strategies, but more explicit about granularity and the structure of the parallel computation, which affords the programmer greater control over parallel performance. We present results on 4, 8, and 32-core machines demonstrating parallel speedups over 20x on non-trivial benchmarks
Supporting Concurrency Abstractions in High-level Language Virtual Machines
During the past decade, software developers widely adopted JVM and CLI as multi-language virtual machines (VMs). At the same time, the multicore revolution burdened developers with increasing complexity. Language implementers devised a wide range of concurrent and parallel programming concepts to address this complexity but struggle to build these concepts on top of common multi-language VMs. Missing support in these VMs leads to tradeoffs between implementation simplicity, correctly implemented language semantics, and performance guarantees. Departing from the traditional distinction between concurrency and parallelism, this dissertation finds that parallel programming concepts benefit from performance-related VM support, while concurrent programming concepts benefit from VM support that guarantees correct semantics in the presence of reflection, mutable state, and interaction with other languages and libraries. Focusing on these concurrent programming concepts, this dissertation finds that a VM needs to provide mechanisms for managed state, managed execution, ownership, and controlled enforcement. Based on these requirements, this dissertation proposes an ownership-based metaobject protocol (OMOP) to build novel multi-language VMs with proper concurrent programming support. This dissertation demonstrates the OMOP's benefits by building concurrent programming concepts such as agents, software transactional memory, actors, active objects, and communicating sequential processes on top of the OMOP. The performance evaluation shows that OMOP-based implementations of concurrent programming concepts can reach performance on par with that of their conventionally implemented counterparts if the OMOP is supported by the VM. To conclude, the OMOP proposed in this dissertation provides a unifying and minimal substrate to support concurrent programming on top of multi-language VMs. The OMOP enables language implementers to correctly implement language semantics, while simultaneously enabling VMs to provide efficient implementations
Persistent Memory Programming Abstractions in Context of Concurrent Applications
The advent of non-volatile memory (NVM) technologies like PCM, STT,
memristors and Fe-RAM is believed to enhance the system performance by getting
rid of the traditional memory hierarchy by reducing the gap between memory and
storage. This memory technology is considered to have the performance like that
of DRAM and persistence like that of disks. Thus, it would also provide
significant performance benefits for big data applications by allowing
in-memory processing of large data with the lowest latency to persistence.
Leveraging the performance benefits of this memory-centric computing technology
through traditional memory programming is not trivial and the challenges
aggravate for parallel/concurrent applications. To this end, several
programming abstractions have been proposed like NVthreads, Mnemosyne and
intel's NVML. However, deciding upon a programming abstraction which is easier
to program and at the same time ensures the consistency and balances various
software and architectural trade-offs is openly debatable and active area of
research for NVM community.
We study the NVthreads, Mnemosyne and NVML libraries by building a concurrent
and persistent set and open addressed hash-table data structure application. In
this process, we explore and report various tradeoffs and hidden costs involved
in building concurrent applications for persistence in terms of achieving
efficiency, consistency and ease of programming with these NVM programming
abstractions. Eventually, we evaluate the performance of the set and hash-table
data structure applications. We observe that NVML is easiest to program with
but is least efficient and Mnemosyne is most performance friendly but involves
significant programming efforts to build concurrent and persistent
applications.Comment: Accepted in HiPC SRS 201
A compiler approach to scalable concurrent program design
The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to
separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined
abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The
transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same
transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support.
The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This
toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory
multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial
applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics
Towards Efficient Abstractions for Concurrent Consensus
Consensus is an often occurring problem in concurrent and distributed
programming. We present a programming language with simple semantics and
build-in support for consensus in the form of communicating transactions. We
motivate the need for such a construct with a characteristic example of
generalized consensus which can be naturally encoded in our language. We then
focus on the challenges in achieving an implementation that can efficiently run
such programs. We setup an architecture to evaluate different implementation
alternatives and use it to experimentally evaluate runtime heuristics. This is
the basis for a research project on realistic programming language support for
consensus.Comment: 15 pages, 5 figures, symposium: TFP 201
A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections
We present a programming methodology and runtime performance case study
comparing the declarative data flow coordination language S-Net with Intel's
Concurrent Collections (CnC). As a coordination language S-Net achieves a
near-complete separation of concerns between sequential software components
implemented in a separate algorithmic language and their parallel orchestration
in an asynchronous data flow streaming network. We investigate the merits of
S-Net and CnC with the help of a relevant and non-trivial linear algebra
problem: tiled Cholesky decomposition. We describe two alternative S-Net
implementations of tiled Cholesky factorization and compare them with two CnC
implementations, one with explicit performance tuning and one without, that
have previously been used to illustrate Intel CnC. Our experiments on a 48-core
machine demonstrate that S-Net manages to outperform CnC on this problem.Comment: 9 pages, 8 figures, 1 table, accepted for PLC 2014 worksho
- …