11 research outputs found

    Tradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors

    In this paper, we introduce a novel taxonomy of approaches to buffering and managing multi-version speculative memory state in multiprocessors. We also present a detailed complexity-benefit tradeoff analysis of the different approaches. Finally, we use numerical applications to evaluate the performance of the approaches under a single architectural framework. Our key insight is that support for buffering the state of multiple speculative tasks and versions per processor is more complexity-effective than support … This paper extends an earlier version that appeared in the 9th International Symposium on High Performance Computer Architecture (HPCA), February 2003.

    A Dynamically Tuned Sorting Library

    Empirical search is a strategy used during the installation of library generators such as ATLAS, FFTW, and SPIRAL to identify the algorithm, or the version of an algorithm, that delivers the best performance. In the past, empirical search has been applied almost exclusively to scientific problems. In this paper, we discuss the application of empirical search to sorting, which is one of the best-understood symbolic computing problems. In contrast with the dense numerical computations of ATLAS, FFTW, and SPIRAL, sorting presents a new challenge: the relative performance of the algorithms depends not only on the characteristics of the target machine and the size of the input data, but also on the distribution of values in the input data set.
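
The empirical-search idea the abstract describes can be sketched in a few lines: time each candidate sorting routine on a sample input and keep the fastest. This is an illustrative sketch, not the library's actual API; all function names here are invented for the example.

```python
import random
import timeit

def insertion_sort(a):
    # Simple O(n^2) sort; competitive on small or nearly sorted inputs.
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def radix_sort(a):
    # LSD radix sort for non-negative integers; insensitive to value order,
    # but its cost depends on the magnitude (digit count) of the values.
    a = list(a)
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:
        buckets = [[] for _ in range(10)]
        for x in a:
            buckets[(x // exp) % 10].append(x)
        a = [x for b in buckets for x in b]
        exp *= 10
    return a

def empirical_select(sample, candidates, repeats=3):
    """Time each candidate on a sample input and return the fastest one's name."""
    best, best_t = None, float("inf")
    for name, fn in candidates.items():
        t = min(timeit.repeat(lambda: fn(sample), number=1, repeat=repeats))
        if t < best_t:
            best, best_t = name, t
    return best

candidates = {"insertion": insertion_sort, "radix": radix_sort, "builtin": sorted}
sample = [random.randrange(10**6) for _ in range(5000)]
print(empirical_select(sample, candidates))
```

The winner depends on the machine, the input size, and, as the paper stresses for sorting, the distribution of input values, which is why the selection must be made empirically rather than once at library-design time.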

    In search of a program generator to implement generic transformations for high-performance computing

    The quality of compiler-optimized code for high-performance applications lags far behind what optimization and domain experts can achieve by hand. This paper explores solutions in between fully automatic and fully manual code optimization. It discusses how generative approaches can help the design and optimization of supercomputing applications, and it outlines early results and research directions, using MetaOCaml to build a generative tool-box for designing portable optimized code. We also identify some limitations of the MetaOCaml system. Finally, we present and advocate an offshoring approach to bring high-level and safe metaprogramming to imperative languages.
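
The generative approach the abstract refers to can be illustrated with a tiny staged-specialization sketch. In MetaOCaml this would use typed quotations; here, as a loose Python analogue only, a generator emits specialized source code at one stage and compiles it at the next. The `gen_power` name is invented for the example.

```python
def gen_power(n):
    """Generate source for a function computing x**n by repeated multiplication.
    Stage 1 builds the specialized code; stage 2 compiles ("runs") it.
    (A crude analogue of MetaOCaml's typed code generation.)"""
    body = "1" if n == 0 else " * ".join(["x"] * n)
    src = f"def power_{n}(x):\n    return {body}\n"
    ns = {}
    exec(src, ns)              # second stage: compile the generated code
    return ns[f"power_{n}"]

pow5 = gen_power(5)
print(pow5(2))  # 32
```

The generated function contains no loop and no test on `n`: the exponent has been specialized away, which is the kind of portable, programmable optimization the paper's tool-box targets. Unlike this string-based sketch, MetaOCaml guarantees that the generated code is well-typed.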

    Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation

    In Thread-Level Speculation (TLS), speculative tasks generate memory state that cannot simply be merged with the rest of the system because it is unsafe. One way to deal with this difficulty is to allow speculative state to merge with memory while backing up, in an undo log, the data that will be overwritten. Such an undo log can be used to roll back to a safe state if a violation occurs. This approach is said to use Future Main Memory (FMM), as memory keeps the most speculative state.
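
The FMM undo-log mechanism can be sketched as follows. This is a minimal software model of the idea, not the paper's hardware design; class and method names are invented for illustration.

```python
class UndoLogMemory:
    """FMM-style buffering: speculative writes go straight to memory, but the
    overwritten value is first saved in an undo log so that a dependence
    violation can roll the task back to a safe state."""

    def __init__(self):
        self.mem = {}
        self.log = []          # (address, old_value) pairs, oldest first

    def spec_write(self, addr, value):
        # Back up the prior value (None marks a previously absent location),
        # then merge the speculative value into memory eagerly.
        self.log.append((addr, self.mem.get(addr)))
        self.mem[addr] = value

    def commit(self):
        # Task proved safe: the backups are no longer needed.
        self.log.clear()

    def rollback(self):
        # Violation: undo in reverse order to restore pre-speculation state.
        for addr, old in reversed(self.log):
            if old is None:
                self.mem.pop(addr, None)
            else:
                self.mem[addr] = old
        self.log.clear()

m = UndoLogMemory()
m.mem["x"] = 1
m.spec_write("x", 99)   # memory now holds the speculative value
m.rollback()            # violation: restore x = 1
print(m.mem["x"])  # 1
```

Note the division of labor: main memory always holds the most speculative state, while the log holds only what is needed to undo it, which is what makes commit cheap in this scheme.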

    Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors

    Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reductions and makes them scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 processors, our scheme delivers average speedups of 7.6.
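
The conventional software-only scheme that the paper improves upon, privatization, can be sketched as follows: each processor accumulates into a private copy of the reduction array, and the copies are merged at the end. This is an illustrative sketch of the baseline technique, not of the paper's directory-based hardware; the function names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_hist(updates, nbins):
    # Each "processor" accumulates into a private copy of the reduction
    # array (privatization), so no synchronization is needed on the
    # shared array during the parallel phase.
    h = [0] * nbins
    for idx, val in updates:
        h[idx] += val
    return h

def parallel_reduction(updates, nbins, nproc=4):
    """Privatized parallel reduction over sparse (index, value) updates."""
    chunks = [updates[i::nproc] for i in range(nproc)]
    with ThreadPoolExecutor(max_workers=nproc) as ex:
        partials = list(ex.map(lambda c: partial_hist(c, nbins), chunks))
    # Final cross-processor merge. Its cost grows with both the processor
    # count and the reduction-array size, which is a key reason this
    # software-only approach stops scaling.
    return [sum(col) for col in zip(*partials)]
```

The merge step at the end is the scalability bottleneck the abstract alludes to; the paper's contribution is to move that combining work into the directory controllers instead.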

    SmartApps, an Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations

    State-of-the-art run-time systems are a poor match for diverse, dynamic distributed applications because they are designed to support a wide variety of applications without much customization to the specific requirements of each. Little or no guiding information flows directly from the application to the run-time system that would allow the latter to fully tailor its services to the application. As a result, performance is disappointing. To address this problem, we propose application-centric computing, or SMART APPLICATIONS. In the executable of a smart application, the compiler embeds most run-time system services, together with a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures the application and the OS/hardware platform. At run time, after incorporating the code's input and the system's resources and state, the SMARTAPP performs a global optimization. This optimization is instance-specific and thus much more tractable than a generic global optimization across application, OS, and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of SMARTAPPS and then present some achievements to date, focusing on compiler-assisted software and hardware techniques for parallelizing reduction operations. These illustrate the use by SMARTAPPS of adaptive algorithm selection and moderately reconfigurable hardware.

    Software Logging under Speculative Parallelization

    Speculative parallelization aggressively runs hard-to-analyze codes in parallel. Speculative tasks generate unsafe state, which is typically buffered in caches. Often, a cache may have to buffer the state of several tasks and, as a result, hold multiple versions of the same variable. Modifying the cache to hold such multiple versions adds complexity and may increase the hit time. A better alternative is logging, where the cache stores only the last versions of variables while the log keeps the older ones. Logging also helps reduce the size of the speculative state to be retained in caches.
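
The cache/log split described above can be modeled in a few lines: the "cache" keeps only the latest version of each variable, and any version it displaces is pushed into a software log. This is a behavioral sketch under simplifying assumptions (one writer at a time, no timestamps), not the paper's design; all names are invented for illustration.

```python
class VersionedCache:
    """The cache holds only the most recent version of each variable;
    older versions displaced by new writes are kept in a software log."""

    def __init__(self):
        self.cache = {}    # var -> (task_id, value): latest version only
        self.log = []      # (task_id, var, old_value): displaced versions

    def write(self, task_id, var, value):
        if var in self.cache:
            old_task, old_val = self.cache[var]
            self.log.append((old_task, var, old_val))  # move older version to log
        self.cache[var] = (task_id, value)

    def squash(self, task_id):
        # On a violation, discard the offending task's versions and restore
        # the most recently logged version of each affected variable.
        for var, (tid, _) in list(self.cache.items()):
            if tid != task_id:
                continue
            restored = None
            for i in range(len(self.log) - 1, -1, -1):
                t, v, val = self.log[i]
                if v == var:
                    restored = (t, val)
                    del self.log[i]
                    break
            if restored is not None:
                self.cache[var] = restored
            else:
                del self.cache[var]
```

Because the cache never holds more than one version per variable, its lookup path stays simple (no version tags on the hit path), which is the complexity argument the abstract makes for logging.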