11 research outputs found

    Tradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors

    In this paper, we introduce a novel taxonomy of approaches to buffering and managing multi-version speculative memory state in multiprocessors. We also present a detailed complexity-benefit tradeoff analysis of the different approaches. Finally, we use numerical applications to evaluate the performance of the approaches under a single architectural framework. Our key insight is that support for buffering the state of multiple speculative tasks and versions per processor is more complexity-effective than support … This paper extends an earlier version that appeared in the 9th International Symposium on High Performance Computer Architecture (HPCA), February 2003.

    A Dynamically Tuned Sorting Library

    Empirical search is a strategy used during the installation of library generators such as ATLAS, FFTW, and SPIRAL to identify the algorithm, or the version of an algorithm, that delivers the best performance. In the past, empirical search has been applied almost exclusively to scientific problems. In this paper, we discuss the application of empirical search to sorting, which is one of the best-understood symbolic computing problems. In contrast with the dense numerical computations of ATLAS, FFTW, and SPIRAL, sorting presents a new challenge: the relative performance of the algorithms depends not only on the characteristics of the target machine and the size of the input data, but also on the distribution of values in the input data set.
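
The empirical-search idea the abstract describes can be sketched in a few lines: time each candidate sorting routine on a sample input and keep the fastest. This is an illustrative sketch, not the library's actual API; all function names here are invented for the example.

```python
import random
import timeit

def insertion_sort(a):
    # Simple O(n^2) sort; competitive on small or nearly sorted inputs.
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def radix_sort(a):
    # LSD radix sort for non-negative integers; insensitive to value order,
    # but its cost depends on the magnitude (digit count) of the values.
    a = list(a)
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:
        buckets = [[] for _ in range(10)]
        for x in a:
            buckets[(x // exp) % 10].append(x)
        a = [x for b in buckets for x in b]
        exp *= 10
    return a

def empirical_select(sample, candidates, repeats=3):
    """Time each candidate on a sample input and return the fastest one's name."""
    best, best_t = None, float("inf")
    for name, fn in candidates.items():
        t = min(timeit.repeat(lambda: fn(sample), number=1, repeat=repeats))
        if t < best_t:
            best, best_t = name, t
    return best

candidates = {"insertion": insertion_sort, "radix": radix_sort, "builtin": sorted}
sample = [random.randrange(10**6) for _ in range(5000)]
print(empirical_select(sample, candidates))
```

The winner depends on the machine, the input size, and, as the paper stresses for sorting, the distribution of input values, which is why the selection must be made empirically rather than once at library-design time.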

    In search of a program generator to implement generic transformations for high-performance computing

    The quality of compiler-optimized code for high-performance applications lags far behind what optimization and domain experts can achieve by hand. This paper explores solutions in between fully automatic and fully manual code optimization. It discusses how generative approaches can help the design and optimization of supercomputing applications, and it outlines early results and research directions, using MetaOCaml to build a generative tool-box for designing portable optimized code. We also identify some limitations of the MetaOCaml system. Finally, we present and advocate an offshoring approach to bring high-level and safe metaprogramming to imperative languages.
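
The generative approach the abstract refers to can be illustrated with a tiny staged-specialization sketch. In MetaOCaml this would use typed quotations; here, as a loose Python analogue only, a generator emits specialized source code at one stage and compiles it at the next. The `gen_power` name is invented for the example.

```python
def gen_power(n):
    """Generate source for a function computing x**n by repeated multiplication.
    Stage 1 builds the specialized code; stage 2 compiles ("runs") it.
    (A crude analogue of MetaOCaml's typed code generation.)"""
    body = "1" if n == 0 else " * ".join(["x"] * n)
    src = f"def power_{n}(x):\n    return {body}\n"
    ns = {}
    exec(src, ns)              # second stage: compile the generated code
    return ns[f"power_{n}"]

pow5 = gen_power(5)
print(pow5(2))  # 32
```

The generated function contains no loop and no test on `n`: the exponent has been specialized away, which is the kind of portable, programmable optimization the paper's tool-box targets. Unlike this string-based sketch, MetaOCaml guarantees that the generated code is well-typed.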

    Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation

    In Thread-Level Speculation (TLS), speculative tasks generate memory state that cannot simply be merged with the rest of the system because it is unsafe. One way to deal with this difficulty is to allow speculative state to merge with memory while backing up, in an undo log, the data that will be overwritten. Such an undo log can be used to roll back to a safe state if a violation occurs. This approach is said to use Future Main Memory (FMM), as memory keeps the most speculative state.
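
The FMM undo-log mechanism can be sketched as follows. This is a minimal software model of the idea, not the paper's hardware design; class and method names are invented for illustration.

```python
class UndoLogMemory:
    """FMM-style buffering: speculative writes go straight to memory, but the
    overwritten value is first saved in an undo log so that a dependence
    violation can roll the task back to a safe state."""

    def __init__(self):
        self.mem = {}
        self.log = []          # (address, old_value) pairs, oldest first

    def spec_write(self, addr, value):
        # Back up the prior value (None marks a previously absent location),
        # then merge the speculative value into memory eagerly.
        self.log.append((addr, self.mem.get(addr)))
        self.mem[addr] = value

    def commit(self):
        # Task proved safe: the backups are no longer needed.
        self.log.clear()

    def rollback(self):
        # Violation: undo in reverse order to restore pre-speculation state.
        for addr, old in reversed(self.log):
            if old is None:
                self.mem.pop(addr, None)
            else:
                self.mem[addr] = old
        self.log.clear()

m = UndoLogMemory()
m.mem["x"] = 1
m.spec_write("x", 99)   # memory now holds the speculative value
m.rollback()            # violation: restore x = 1
print(m.mem["x"])  # 1
```

Note the division of labor: main memory always holds the most speculative state, while the log holds only what is needed to undo it, which is what makes commit cheap in this scheme.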

    Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors

    Reductions are important and time-consuming operations in many scientific codes. Effective parallelization of reductions is a critical transformation for loop parallelization, especially for sparse, dynamic applications. Unfortunately, conventional reduction parallelization algorithms are not scalable. In this paper, we present new architectural support that significantly speeds up parallel reductions and makes them scalable in shared-memory multiprocessors. The required architectural changes are mostly confined to the directory controllers. Experimental results based on simulations show that the proposed support is very effective. While conventional software-only reduction parallelization delivers average speedups of only 2.7 for 16 processors, our scheme delivers average speedups of 7.6.
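
The conventional software-only scheme that the paper improves upon, privatization, can be sketched as follows: each processor accumulates into a private copy of the reduction array, and the copies are merged at the end. This is an illustrative sketch of the baseline technique, not of the paper's directory-based hardware; the function names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_hist(updates, nbins):
    # Each "processor" accumulates into a private copy of the reduction
    # array (privatization), so no synchronization is needed on the
    # shared array during the parallel phase.
    h = [0] * nbins
    for idx, val in updates:
        h[idx] += val
    return h

def parallel_reduction(updates, nbins, nproc=4):
    """Privatized parallel reduction over sparse (index, value) updates."""
    chunks = [updates[i::nproc] for i in range(nproc)]
    with ThreadPoolExecutor(max_workers=nproc) as ex:
        partials = list(ex.map(lambda c: partial_hist(c, nbins), chunks))
    # Final cross-processor merge. Its cost grows with both the processor
    # count and the reduction-array size, which is a key reason this
    # software-only approach stops scaling.
    return [sum(col) for col in zip(*partials)]
```

The merge step at the end is the scalability bottleneck the abstract alludes to; the paper's contribution is to move that combining work into the directory controllers instead.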

    SmartApps, an Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations

    State-of-the-art run-time systems are a poor match for diverse, dynamic distributed applications because they are designed to support a wide variety of applications without much customization to the specific requirements of each. Little or no guiding information flows directly from the application to the run-time system that would allow the latter to fully tailor its services to the application. As a result, performance is disappointing. To address this problem, we propose application-centric computing, or SMART APPLICATIONS. In the executable of a smart application, the compiler embeds most run-time system services, together with a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures the application and the OS/hardware platform. At run time, after incorporating the code's input and the system's resources and state, the SMARTAPP performs a global optimization. This optimization is instance-specific and thus much more tractable than a generic global optimization across application, OS, and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of SMARTAPPS and then present some achievements to date, focusing on compiler-assisted software and hardware techniques for parallelizing reduction operations. These illustrate the use by SMARTAPPS of adaptive algorithm selection and moderately reconfigurable hardware.

    Software Logging under Speculative Parallelization

    Speculative parallelization aggressively runs hard-to-analyze codes in parallel. Speculative tasks generate unsafe state, which is typically buffered in caches. Often, a cache may have to buffer the state of several tasks and, as a result, hold multiple versions of the same variable. Modifying the cache to hold such multiple versions adds complexity and may increase the hit time. A better alternative is logging, where the cache stores only the last versions of variables while the log keeps the older ones. Logging also helps reduce the size of the speculative state to be retained in caches.
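
The cache/log split described above can be modeled in a few lines: the "cache" keeps only the latest version of each variable, and any version it displaces is pushed into a software log. This is a behavioral sketch under simplifying assumptions (one writer at a time, no timestamps), not the paper's design; all names are invented for illustration.

```python
class VersionedCache:
    """The cache holds only the most recent version of each variable;
    older versions displaced by new writes are kept in a software log."""

    def __init__(self):
        self.cache = {}    # var -> (task_id, value): latest version only
        self.log = []      # (task_id, var, old_value): displaced versions

    def write(self, task_id, var, value):
        if var in self.cache:
            old_task, old_val = self.cache[var]
            self.log.append((old_task, var, old_val))  # move older version to log
        self.cache[var] = (task_id, value)

    def squash(self, task_id):
        # On a violation, discard the offending task's versions and restore
        # the most recently logged version of each affected variable.
        for var, (tid, _) in list(self.cache.items()):
            if tid != task_id:
                continue
            restored = None
            for i in range(len(self.log) - 1, -1, -1):
                t, v, val = self.log[i]
                if v == var:
                    restored = (t, val)
                    del self.log[i]
                    break
            if restored is not None:
                self.cache[var] = restored
            else:
                del self.cache[var]
```

Because the cache never holds more than one version per variable, its lookup path stays simple (no version tags on the hit path), which is the complexity argument the abstract makes for logging.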