7 research outputs found

    Measurement of the CP violation parameter Re(ε'/ε)

    Scheduling communication on an SMP node parallel machine

    Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically to protocol processing. This scheduling convention reduces communication latency and increases effective bandwidth, but it also reduces peak performance, since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: Fixed, which uses a dedicated protocol processor; and Floating, where all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols; (ii) Fixed improves performance over Floating when communication becomes the bottleneck, which is more likely when the application is very communication-intensive, overheads are very high, or there are multiple (i.e., more than two) processors per node; (iii) a system with optimal cost-effectiveness is likely to include a dedicated protocol processor, at least for light-weight protocol

    Coherent network interfaces for fine-grain communication

    Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads. This paper describes an attempt to explore network interfaces that use coherence, i.e., coherent network interfaces (CNIs), to improve communication performance. First, it reports on the development and optimization of two mechanisms that CNIs use to communicate with processors. A taxonomy and comparison of four CNIs with a more conventional NI are then presented

    Performance evaluation of scientific programs on advanced architecture computers

    Recently a number of advanced architecture machines have become commercially available. These new machines promise better cost-performance than traditional computers, and some of them have the potential of competing with current supercomputers, such as the Cray X/MP, in terms of maximum performance. This paper describes an on-going project to evaluate a broad range of advanced architecture computers using a number of complete scientific application programs. The computers to be evaluated include (1) distributed-memory machines such as the NCUBE, INTEL and Caltech/JPL hypercubes, and the MEIKO computing surface, (2) shared-memory, bus architecture machines such as the Sequent Balance and the Alliant, (3) very long instruction word machines such as the Multiflow Trace 7/200 computer, (4) “traditional” supercomputers such as the Cray X/MP and Cray-2, and (5) SIMD machines such as the Connection Machine. Currently 11 application codes from a number of scientific disciplines have been selected, although it is not intended to run all codes on all machines. Results are presented for two of the codes (QCD and missile tracking), and future work is proposed.
