63 research outputs found

    Dynamic Trace-Based Data Dependency Analysis for Parallelization of C Programs

    Get PDF
    Writing parallel code is traditionally considered a difficult task, even when it is tackled from the beginning of a project. In this paper, we demonstrate an innovative toolset that faces this challenge directly. It provides the software developers with profile data and directs them to possible top-level, pipeline-style parallelization opportunities for an arbitrary sequential C program. This approach is complementary to the methods based on static code analysis and automatic code rewriting and does not impose restrictions on the structure of the sequential code or the parallelization style, even though it is mostly aimed at coarse-grained task-level parallelization. The proposed toolset has been utilized to define parallel code organizations for a number of real-world representative applications and is based on and is provided as free source

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture

    Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs

    Get PDF
    Massive amounts of legacy sequential code need to be parallelized to make better use of modern multiprocessor architectures. Nevertheless, writing parallel programs is still a difficult task. Automated parallelization methods can be effective both at the statement and loop levels and, recently, at the task level, but they are still restricted to specific source code constructs or application domains. We present in this article an innovative toolset that supports developers when performing manual code analysis and parallelization decisions. It automatically collects and represents the program profile and data dependencies in an interactive graphical format that facilitates the analysis and discovery of manual parallelization opportunities. The toolset can be used for arbitrary sequential C programs and parallelization patterns. Also, its program-scope data dependency tracing at runtime can complement the tools based on static code analysis and can also benefit from it at the same time. We also tested the effectiveness of the toolset in terms of time to reach parallelization decisions and of their quality. We measured a significant improvement for several real-world representative applications

    Disk-Directed I/O for MIMD Multiprocessors

    Get PDF
    Many scientific applications that run on today\u27s multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, \em disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth

    A type-checking preprocessor for Cilk 2, a multithreaded C language

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.Includes bibliographical references (p. 37-38).by Robert C. Miller.M.Eng

    Disk-directed I/O for MIMD Multiprocessors

    Get PDF
    Many scientific applications that run on today\u27s multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth

    Translation techniques for distributed-shared memory programming models

    Get PDF
    This thesis argues that a modular, source-to-source translation system for distributed-shared memory programming models would be beneficial to the high-performance computing community. It goes on to present a proof-of-concept example in detail, translating between Global Arrays (GA) and Unified Parallel C (UPC). Some useful extensions to UPC are discussed, along with how they are implemented in the proof-of-concept translator

    Memory coherence activity prediction in commercial workloads

    Get PDF
    Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memory multiprocessors. Important commercial applications also show sensitivity to coherence latency, which will become more acute in the future as technology scales. Therefore it is important to investigate prediction of memory coherence activity in the context of commercial workloads.This paper studies a trace-based Downgrade Predictor (DGP) for predicting last stores to shared cache blocks, and a pattern-based Consumer Set Predictor (CSP) for predicting subsequent readers. We evaluate this class of predictors for the first time on commercial applications and demonstrate that our DGP correctly predicts 47%-76% of last stores. Memory sharing patterns in commercial workloads are inherently non-repetitive; hence CSP cannot attain high coverage. We perform an opportunity study of a DGP enhanced through competitive underlying predictors, and in commercial and scientific applications, demonstrate potential to increase coverage up to 14%

    Desafíos en el diseño de sistemas Ciber-Físicos

    Get PDF
    Los sistemas cyber-físicos ─Cyber-Physical Systems CPS─ es un proceso que integra la computación con los procesos físicos. Los computadores embebidos, el monitoreo de redes y el control de procesos físicos, usualmente tienen ciclos de retroalimentación en los que los procesos físicos afectan los cálculos, y viceversa. En este artículo se examinan los desafíos en el diseño de estos sistemas, y se plantea la cuestión de si la informática y las tecnologías de redes actuales proporcionan una base adecuada para ellos. La conclusión es que para mejorar los procesos de diseño de estos sistemas no será suficiente con elevar el nivel de abstracción o verificar, formalmente o no, los diseños en los que se basan las abstracciones de hoy. El potencial social y económico de los CPS es mucho mayor de lo que hasta el momento se ha pensado; en todo el mundo se están realizando grandes inversiones para desarrollar esta tecnología, pero los retos son considerables. Para aprovechar todo el potencial de los CPS se tendrán que reconstruir los procesos de las abstracciones informáticas y de las redes, y los procesos se deberán acoger en pleno a los principios de las dinámicas físicas y de la computación

    Sorting on Clusters of SMPs

    Get PDF
    Clusters of symmetric multiprocessors (SMPs) have emerged as the primary candidates for large scale multiprocessor systems. In this paper, we introduce an efficient sorting algorithm for clusters of SMPs. This algorithm relies on a novel scheme for stably sorting on a single SMP coupled with balanced regular communication on the cluster. Our SMP algorithm seems to be asymptotically faster than any of the published algorithms we are aware of. The algorithms were implemented in C using Posix Threads and the SIMPLE library of communication primitives and run on a cluster of DEC AlphaServer 2100A systems. Our experimental results verify the scalability and efficiency of our proposed solution and illustrate the importance of considering both memory hierarchy and the overhead of shifting to multiple nodes. (Also cross-reference as UMIACS-TR-97-6
    • …
    corecore