Low-Level Haskell Code: Measurements and Optimization Techniques
Haskell is a lazy functional language with a strong static type system and
excellent support for parallel programming. The language features of Haskell
make it easier to write correct and maintainable programs, but execution speed
often suffers from the high levels of abstraction. While much past research
focuses on high-level optimizations that take advantage of the functional
properties of Haskell, relatively little attention has been paid to the
optimization opportunities in the low-level imperative code generated during
translation to machine code. One problem with current low-level optimizations
is that their effectiveness is limited by the obscured control flow caused by
Haskell's high-level abstractions. My thesis is that trace-based optimization
techniques can be used to improve the effectiveness of low-level optimizations
for Haskell programs. I claim three unique contributions in this work.
The first contribution is to expose some properties of low-level Haskell code
by examining the mix of operations performed by the selected benchmark
programs and comparing it to the low-level code generated from traditional
programming languages. The low-level measurements reveal that the control flow
is obscured by indirect jumps caused by the implementation of lazy evaluation,
higher-order functions, and the separately managed stacks used by Haskell
programs.
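For illustration, the Haskell fragment below (my own sketch, not from the thesis) marks the two places where compiled code must fall back on indirect jumps: an unknown higher-order call and the forcing of a lazy value.

```haskell
-- A minimal sketch (not from the thesis) of two common sources of
-- indirect jumps in compiled Haskell code.
sumWith :: (Int -> Int) -> [Int] -> Int
sumWith _ []     = 0
sumWith f (x:xs) = f x + sumWith f xs
-- 1. The call to `f` is an unknown (higher-order) call: the compiler
--    cannot see the callee, so it emits an indirect jump through the
--    function's closure pointer.
-- 2. Pattern matching on the lazy list may force a thunk: the generated
--    code jumps indirectly to the entry code stored in the heap object.
```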
My second contribution is a study of the effectiveness of a dynamic binary
trace-based optimizer running on Haskell programs. My results show that while
viable program traces frequently occur in Haskell programs, the overhead
associated with maintaining the traces in a dynamic optimization system
outweighs the benefit gained from running the traces. To reduce the runtime
overheads, I explore a way to find traces in a separate profiling step.
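As a rough illustration of what such a profiling step might collect, the hypothetical sketch below counts candidate trace heads (for example, targets of backward branches) and keeps those that cross a hotness threshold; the input format and threshold are my assumptions, not details from the thesis.

```haskell
import qualified Data.Map.Strict as M

type Addr = Int

-- Count how often each candidate trace head executes and select the
-- hot ones. `branchTargets` is the recorded stream of backward-branch
-- targets from a profiling run (an assumed input format).
hotTraceHeads :: Int -> [Addr] -> [Addr]
hotTraceHeads threshold branchTargets =
  [ addr | (addr, n) <- M.toList counts, n >= threshold ]
  where
    counts = M.fromListWith (+) [ (a, 1 :: Int) | a <- branchTargets ]
```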
My final contribution is to build and evaluate a static trace-based optimizer
for Haskell programs. The static optimizer uses profiling data to find traces
in a Haskell program and then restructures the code around the traces to
increase the scope available to the low-level optimizer. My results show that
we can successfully build traces in Haskell programs, and the optimized code
yields a speedup over existing low-level optimizers of up to 86%,
with an average speedup of 5% across 32 benchmarks.
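To make the restructuring step concrete, here is a hypothetical sketch of one standard way to grow a trace from edge-profile data: start at a hot block and repeatedly follow the most frequently taken successor edge until a block repeats or the profile runs out. The types and stopping rules are my assumptions, not the thesis's exact algorithm.

```haskell
import qualified Data.Map.Strict as M
import Data.List (maximumBy)
import Data.Ord (comparing)

type Block = Int
type EdgeProfile = M.Map Block [(Block, Int)]  -- successors with edge counts

-- Grow a trace by greedily following the hottest successor edge,
-- stopping at a repeated block (loop closed) or an unprofiled block.
growTrace :: EdgeProfile -> Block -> [Block]
growTrace prof = go []
  where
    go seen b
      | b `elem` seen = reverse seen
      | otherwise = case M.lookup b prof of
          Nothing    -> reverse (b : seen)
          Just []    -> reverse (b : seen)
          Just succs -> go (b : seen) (fst (maximumBy (comparing snd) succs))
```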
Tuning a priority-based register allocator using adaptive compilation
Register allocation is a long-studied optimization in compiler construction because it provides great opportunity for improving execution time. Adaptive compilation is a relatively new technique that uses repeated compilation and search to find effective parameters for compiler optimizations. We examine the priority-based graph-coloring register-allocation algorithm in the context of an adaptive compiler. The priority-based algorithm was selected because it is well known, but little information exists on how it should be tuned to produce good results or how it compares with competing algorithms. We show that adaptive compilation can be used to improve the performance of a priority-based allocator. Aggressive tuning through adaptive compilation enables us to fairly compare against the Chaitin-Briggs algorithm for register allocation. We found the standard priority-based allocator was, on average, 16.9% worse than Chaitin-Briggs. Adaptive compilation enabled the priority-based allocator to close this performance gap and slightly outperform Chaitin-Briggs by an average of 1%.
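As an illustration of the adaptive loop (my sketch; the `measure` harness and the `Weights` parameters are assumptions, not part of the paper), the idea is to compile and run the benchmark under each candidate parameter setting and keep the fastest:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical allocator parameters being tuned, e.g. weights in the
-- priority function.
data Weights = Weights { loopWeight :: Double, useWeight :: Double }

-- Adaptive compilation as search: `measure` compiles the program with
-- the given weights, runs it, and returns execution time (an assumed
-- harness). We keep the setting with the lowest measured time.
adaptiveSearch :: (Weights -> IO Double) -> [Weights] -> IO Weights
adaptiveSearch measure candidates = do
  times <- mapM measure candidates
  pure (fst (minimumBy (comparing snd) (zip candidates times)))
```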
Concurrent Collections
We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.
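One ingredient of that determinism guarantee is dynamic single assignment on data: each item is bound at most once, so the value a reader sees cannot depend on the schedule. The minimal sketch below (my illustration, not the CnC implementation) enforces that rule on a map-based item collection:

```haskell
import qualified Data.Map.Strict as M

type ItemColl tag val = M.Map tag val

-- Dynamic single assignment: a tag may be bound at most once. A re-put
-- with the same value is allowed; a conflicting put is an error, so the
-- value any step reads is independent of execution order.
putItem :: (Ord tag, Eq val, Show tag)
        => tag -> val -> ItemColl tag val -> ItemColl tag val
putItem t v items = case M.lookup t items of
  Nothing             -> M.insert t v items
  Just v' | v == v'   -> items
          | otherwise -> error ("conflicting put for tag " ++ show t)
```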
The Concurrent Collections Programming Model
We introduce the Concurrent Collections (CnC) programming model. In this model, programs are written in terms of high-level operations. These operations are partially ordered only according to their semantic constraints. These partial orderings correspond to data dependences and control dependences. The role of the domain expert, whose interest and expertise is in the application domain, and the role of the tuning expert, whose interest and expertise is in performance on a specific architecture, can be viewed as separate concerns. The CnC programming model provides a high-level specification that can be used as a common language between the two experts, raising the level of their discourse. The model facilitates a significant degree of separation, which simplifies the task of the domain expert, who can focus on the application rather than scheduling concerns and mapping to the target architecture. This separation also simplifies the work of the tuning expert, who is given the maximum possible freedom to map the computation onto the target architecture and is not required to understand the details of the domain. However, the domain expert and tuning expert may still be the same person. We formally describe the execution semantics of CnC and prove that this model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC can effectively exploit several different kinds of parallelism and offer performance and scalability equivalent to or better than that offered by current low-level parallel programming models. Further, with respect to ease of programming, we discuss the tradeoffs between CnC and other parallel programming models on these applications.