8 research outputs found

    www.redbooks.ibm.com

    No full text
    This memory advantage would allow us to solve the same problem as PESSL using a smaller number of MPI tasks, or to solve a bigger problem using the same number of tasks.

    3.3.4 Domain splitting
    Another good example for MPI is the domain splitting method. It assumes you have a domain that you split across different MPI processes. The problem with this approach is that you have to exchange border values between the different processes, as shown in Figure 19. In general, you have to send information between processes that share a common edge; in our example, this involves communication between domains A and B and between B and C. Depending on your problem, you might also need communication between domains that share only a common corner, C and A in our example. If you have to do this, you might need a total of eight communications. There is an algorithm that, in a two-dimensional domain splitting, reduces the number of communications needed from eight unordered ones to four ordered ones. In a three-dimensional problem, you would only need six communication steps instead of 26. Figure 19 illustrates domain splitting with nine domains.

    Figure 19. Domain splitting with nine domains

    Timing table fragment (column headings missing in the extracted text):
    ssend column (gen.)   3.551   1.464   .331
    All2All column        3.654   1.279   .621
    Pack standard         3.599   1.508   .335
    Pack column           3.364   1.272   .335

    Here is a code fragment that transforms MPI_COMM_WORLD into a two-dimensional Cartesian topology, defines two MPI data types on the border, and, finally, uses MPI_Sendrecv() to do the update. The trick is to define the data types to include the corners and also transfer them. In the code fragment, the update of the corners between the communication steps is missing; this can be a problem depending on how the transposition is done.
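    Below is a minimal sketch of the kind of fragment described above, not the Redbook's original listing: it creates the two-dimensional Cartesian topology with MPI_Cart_create(), defines a strided "column" type and a contiguous "row" type that spans the halo columns so the corners are transferred as well, and then performs the four ordered MPI_Sendrecv() exchanges. The local array layout, the halo width of one, and the sizes NX and NY are illustrative assumptions.

```c
/* 2-D halo exchange sketch: Cartesian topology + border data types +
 * ordered MPI_Sendrecv() calls that also carry the corner values.
 * NX, NY, the U() macro and the array layout are assumptions made
 * for illustration; they are not taken from the Redbook listing.    */
#include <mpi.h>

#define NX 4                              /* interior rows per task    */
#define NY 5                              /* interior columns per task */
#define U(i, j) u[(i) * (NY + 2) + (j)]   /* row-major, one-cell halo  */

int main(int argc, char *argv[])
{
    int nprocs, rank;
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    int north, south, west, east;
    MPI_Comm cart;
    MPI_Datatype rowtype, coltype;
    double u[(NX + 2) * (NY + 2)];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* map MPI_COMM_WORLD onto a 2-D Cartesian topology */
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
    MPI_Comm_rank(cart, &rank);                 /* rank may be reordered */
    MPI_Cart_shift(cart, 0, 1, &north, &south); /* neighbours in dim 0   */
    MPI_Cart_shift(cart, 1, 1, &west, &east);   /* neighbours in dim 1   */

    /* border types: a strided column (interior rows only) and a
       contiguous row spanning the halo columns, corners included */
    MPI_Type_vector(NX, 1, NY + 2, MPI_DOUBLE, &coltype);
    MPI_Type_commit(&coltype);
    MPI_Type_contiguous(NY + 2, MPI_DOUBLE, &rowtype);
    MPI_Type_commit(&rowtype);

    /* initialise everything, then mark the interior with the rank */
    for (int i = 0; i < (NX + 2) * (NY + 2); i++) u[i] = 0.0;
    for (int i = 1; i <= NX; i++)
        for (int j = 1; j <= NY; j++)
            U(i, j) = (double)rank;

    /* step 1: exchange east/west borders (interior rows only) */
    MPI_Sendrecv(&U(1, NY),     1, coltype, east, 0,
                 &U(1, 0),      1, coltype, west, 0, cart, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&U(1, 1),      1, coltype, west, 1,
                 &U(1, NY + 1), 1, coltype, east, 1, cart, MPI_STATUS_IGNORE);

    /* step 2: exchange north/south borders as full rows; the halo
       columns filled in step 1 ride along, so the corner values reach
       the diagonal neighbours with only four messages in total */
    MPI_Sendrecv(&U(NX, 0),     1, rowtype, south, 2,
                 &U(0, 0),      1, rowtype, north, 2, cart, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&U(1, 0),      1, rowtype, north, 3,
                 &U(NX + 1, 0), 1, rowtype, south, 3, cart, MPI_STATUS_IGNORE);

    MPI_Type_free(&coltype);
    MPI_Type_free(&rowtype);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

    Note that the ordering does the corner work for free: because the second pair of exchanges sends rows that already contain the freshly received ghost columns, only four messages are needed in 2-D (and six in 3-D) instead of eight (or 26).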

    Abstract

    No full text
    The BlueGene/L supercomputer is expected to deliver new levels of application performance by providing a combination of good single-node computational performance and high scalability. To achieve good single-node performance, the BlueGene/L design includes a special dual floating-point unit on each processor and the ability to use two processors per node. BlueGene/L also includes both a torus and a tree network to achieve high scalability. We demonstrate how benchmarks and applications can take advantage of these architectural features to get the most out of BlueGene/L.

    Early Experience with Scientific Applications on the Blue Gene/L Supercomputer

    No full text
    Abstract. Blue Gene/L uses a large number of low power processors, together with multiple integrated interconnection networks, to build a supercomputer with low cost, space and power consumption. It uses a novel system software architecture designed with application scalability in mind. However, whether real applications will scale to tens of thousands of processors has been an open question. In this paper, we describe early experience with several applications on a 16,384 node Blue Gene/L system. This study establishes that applications from a broad variety of scientific disciplines can effectively scale to thousands of processors. The results reported in this study represent the highest performance ever demonstrated for most of these applications, and in fact, show effective scaling for the first time ever on thousands of processors.

    Scaling Physics and Material Science Applications on a Massively Parallel Blue Gene/L System

    No full text
    Blue Gene/L represents a new way to build supercomputers, using a large number of low power processors, together with multiple integrated interconnection networks. Whether real applications can scale to tens of thousands of processors (on a machine like Blue Gene/L) has been an open question. In this paper, we describe early experience with several physics and material science applications on a 32,768 node Blue Gene/L system, which was installed recently at the Lawrence Livermore National Laboratory. Our study shows some problems in the applications and in the current software implementation, but, overall, excellent scaling of these applications to 32K nodes on the current Blue Gene/L system. While there is clearly room for improvement, these results represent the first proof point that MPI applications can effectively scale to over ten thousand processors. They also validate the scalability of the hardware and software architecture of Blue Gene/L. Categories and Subject Descriptors: J.2 [Computer Applications]: Physical Sciences and Engineering