    BarrierPoint: sampled simulation of multi-threaded applications

    Sampling is a well-known technique to speed up architectural simulation of long-running workloads while maintaining accurate performance predictions. A number of sampling techniques have recently been developed that extend well-known single-threaded techniques to allow sampled simulation of multi-threaded applications. Unfortunately, prior work is limited to non-synchronizing applications (e.g., server throughput workloads); requires the functional simulation of the entire application using a detailed cache hierarchy, which limits the overall simulation speedup potential; leads to different units of work across different processor architectures, which complicates performance analysis; or requires massive machine resources to achieve reasonable simulation speedups. In this work, we propose BarrierPoint, a sampling methodology to accelerate simulation by leveraging globally synchronizing barriers in multi-threaded applications. BarrierPoint collects microarchitecture-independent code and data signatures to determine the most representative inter-barrier regions, called barrierpoints. BarrierPoint estimates total application execution time (and other performance metrics of interest) through detailed simulation of these barrierpoints only, leading to substantial simulation speedups. Barrierpoints can be simulated in parallel, use fewer simulation resources, and define fixed units of work to be used in performance comparisons across processor architectures. Our evaluation of BarrierPoint using NPB and Parsec benchmarks reports average simulation speedups of 24.7x (and up to 866.6x) with an average simulation error of 0.9% and 2.9% at most. On average, BarrierPoint reduces the number of simulation machine resources needed by 78x.
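
The estimation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the inter-barrier regions have already been clustered by their signatures (the `labels` input), picks the region nearest each cluster centroid as the representative, and scales that region's simulated time by an instruction-count ratio. The function names, the instruction-count weighting, and the toy inputs are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import dist


def pick_representatives(signatures, labels):
    """For each cluster, return the index of the region nearest the centroid.

    Assumption: `labels` comes from some earlier clustering step that is
    not shown here.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    reps = {}
    for label, members in clusters.items():
        dims = len(signatures[members[0]])
        centroid = [
            sum(signatures[i][d] for i in members) / len(members)
            for d in range(dims)
        ]
        reps[label] = min(members, key=lambda i: dist(signatures[i], centroid))
    return reps


def estimate_total_time(labels, instr_counts, reps, simulated_time):
    """Extrapolate total time from the simulated representatives only.

    Assumption: each representative's time is scaled by the ratio of its
    cluster's total instruction count to its own instruction count.
    """
    total = 0.0
    for label, rep in reps.items():
        cluster_instrs = sum(
            n for n, l in zip(instr_counts, labels) if l == label
        )
        multiplier = cluster_instrs / instr_counts[rep]
        total += simulated_time[rep] * multiplier
    return total


# Toy data: six inter-barrier regions in two clusters (hypothetical values).
signatures = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
              (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1]
instr_counts = [100, 100, 100, 200, 200, 200]

reps = pick_representatives(signatures, labels)
# Only the representatives are "simulated in detail" (times are made up).
simulated_time = {reps[0]: 2.0, reps[1]: 5.0}
estimate = estimate_total_time(labels, instr_counts, reps, simulated_time)
```

Because only the representative regions need detailed simulation, the cost of the detailed runs scales with the number of clusters rather than the number of regions, which is the source of the speedup the abstract reports.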

    Million Atom Electronic Structure and Device Calculations on Peta-Scale Computers

    Semiconductor devices are scaled down to the level at which constituent materials are no longer considered continuous. To account for atomistic randomness, surface effects, and quantum mechanical effects, an atomistic modeling approach needs to be pursued. The Nanoelectronic Modeling Tool (NEMO 3-D) has satisfied the requirement by including empirical $sp^{3}s^{*}$ and $sp^{3}d^{5}s^{*}$ tight-binding models and considering strain to successfully simulate various semiconductor material systems. Computationally, however, NEMO 3-D needs significant improvements to utilize the increasing supply of processors. This paper introduces the new modeling tool, OMEN 3-D, and discusses the major computational improvements, the 3-D domain decomposition and the multi-level parallelism. As a featured application, a full 3-D parallelized Schrödinger-Poisson solver and its application to calculating the bandstructure of a $\delta$-doped phosphorus (P) layer in silicon is demonstrated. Impurity bands due to the donor ion potentials are computed.
    Comment: 4 pages, 6 figures, IEEE proceedings of the 13th International Workshop on Computational Electronics, Tsinghua University, Beijing, May 27-29 200

    Comparative Evaluation and Case Studies of Shared-Memory and Data-Parallel Execution Patterns

    A Linked-Cell Domain Decomposition Method for Molecular Dynamics Simulation on a Scalable Multiprocessor

    Database interfaces on NASA's heterogeneous distributed database system

    The purpose of the Distributed Access View Integrated Database (DAVID) interface module (Module 9: Resident Primitive Processing Package) is to provide data transfer between local DAVID systems and resident Data Base Management Systems (DBMSs). The result of current research is summarized. A detailed description of the interface module is provided. Several Pascal templates were constructed. The Resident Processor program was also developed. Even though it is designed for the Pascal templates, it can be modified for templates in other languages, such as C, without much difficulty. The Resident Processor itself can be written in any programming language. Since Module 5 routines are not ready yet, there is no way to test the interface module. However, simulation shows that the data base access programs produced by the Resident Processor do work according to the specifications.

    Impulse: Memory System Support for Scientific Applications

    Requirements and Problems in Parallel Model Development at DWD

    On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations
