A Study of Multithreaded Benchmarks on the Hewlett-Packard X- and V-Class Architectures
The Hewlett-Packard X- and V-Class ccNUMA systems appear well suited to exploiting coarse and fine-grained parallelism, using multithreading techniques. This paper briefly summarizes the multilevel memory subsystem for the X- and V-Class platforms. Typical MPP distributed memory programming concerns for the codes under investigation, such as explicit memory localization and load balancing, are compared to relevant issues when porting and tuning for the X- and V-Class.
This paper uses two small benchmarks as the basis for investigating differences in running multithreaded codes in the SPP-UX and HP-UX environments. One code is from the Command, Control, Communication and Intelligence (C3I) Parallel Benchmark suite and is shown to have the potential for large-scale parallelization with straightforward multithreading techniques. The second benchmark exhibits the computationally dynamic behavior of a thermally-driven explosion model. Both codes are shown to stress the HP systems' ability to keep memory close to processors and appropriate threads of execution.
Analysis, Tracing, Characterization and Performance Modeling of Select ASCI Applications for BlueGene/L Using Parallel Discrete Event Simulation
Caltech's Jet Propulsion Laboratory (JPL) and Center for Advanced Computer Architecture (CACR) are conducting application and simulation analyses of Blue Gene/L [1] in order to establish a range of effectiveness of the architecture in performing important classes of computations and to determine the design sensitivity of the global interconnect network in support of real-world ASCI application execution.
A Test Suite for High-Performance Parallel Java
The Java programming language has a number of features that make it attractive for writing high-quality, portable parallel programs. A pure object formulation, strong typing and the exception model make programs easier to create, debug, and maintain. The elegant threading model provides a simple route to parallelism on shared-memory machines. Anticipating great improvements in numerical performance, this paper presents a suite of simple programs that indicate how a pure Java Navier-Stokes solver might perform. The suite includes a parallel Euler solver. We present results from a 32-processor Hewlett-Packard machine and a 4-processor Sun server. While speedup is excellent on both machines, indicating a high-quality thread scheduler, the single-processor performance needs much improvement.
The US Program in Ground-Based Gravitational Wave Science: Contribution from the LIGO Laboratory
Recent gravitational-wave observations from the LIGO and Virgo observatories have brought a sense of great excitement to scientists and citizens the world over. Since September 2015, 10 binary black hole coalescences and one binary neutron star coalescence have been observed. They have provided remarkable, revolutionary insight into the "gravitational Universe" and have greatly extended the field of multi-messenger astronomy. At present, Advanced LIGO can see binary black hole coalescences out to redshift 0.6 and binary neutron star coalescences to redshift 0.05. This probes only a very small fraction of the volume of the observable Universe. However, current technologies can be extended to construct "3rd Generation" (3G) gravitational-wave observatories that would extend our reach to the very edge of the observable Universe. The event rates over such a large volume would be in the hundreds of thousands per year (i.e. tens per hour). Such 3G detectors would have a 10-fold improvement in strain sensitivity over the current generation of instruments, yielding signal-to-noise ratios of 1000 for events like those already seen. Several concepts are being studied for which engineering studies and reliable cost estimates will be developed in the next 5 years.
Tuecke: Application Experiences with the Globus Toolkit
The development of applications and tools for high-performance “computational grids” is complicated by the heterogeneity and frequently dynamic behavior of the underlying resources; by the complexity of the applications themselves, which often combine aspects of supercomputing and distributed computing; and by the need to achieve high levels of performance. The Globus toolkit has been developed with the goal of simplifying this application development task, by providing implementations of various core services deemed essential for high-performance distributed computing. In this paper, we describe two large applications developed with this toolkit: a distributed interactive simulation and a teleimmersion system. We describe the process used to develop the applications, review lessons learned, and draw conclusions regarding the effectiveness of the toolkit approach.
Abstract
The BlueGene/L supercomputer is expected to deliver new levels of application performance by providing a combination of good single-node computational performance and high scalability. To achieve good single-node performance, the BlueGene/L design includes a special dual floating-point unit on each processor and the ability to use two processors per node. BlueGene/L also includes both a torus and a tree network to achieve high scalability. We demonstrate how benchmarks and applications can take advantage of these architectural features to get the most out of BlueGene/L.