45 research outputs found

    Parallel computing 2011, ParCo 2011: book of abstracts

    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgium.

    Transferring ecosystem simulation codes to supercomputers

    Many ecosystem simulation codes have been developed in the last twenty-five years. This development took place initially on mainframe computers, then on minicomputers, and more recently on microcomputers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences between the system architectures of sequential, scalar computers and those of parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectorizing and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it uses the vector and parallel processing capabilities of the Cray ineffectively. We expect that, by restructuring the code, it could execute an additional six to ten times faster.
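    The figures quoted in this abstract can be combined with a little arithmetic. The sketch below is only illustrative: the constants come from the abstract, but the function names are hypothetical, and it assumes that the 30 percent compiler gain multiplies the quoted speedups and that the fraction of peak speed scales linearly with the projected restructuring gain.

```python
# Figures quoted in the abstract. How they combine is an assumption of this sketch.
CRAY_VS_VAX = 30.0               # Cray ran the model 30 times faster than the VAX
CRAY_VS_WORKSTATION = 10.0       # and 10 times faster than a Unix workstation
COMPILER_GAIN = 0.30             # additional 30 percent from vectorizing and inlining
PEAK_FRACTION = 0.05             # code reaches about 5 percent of the Cray's peak speed
RESTRUCTURE_RANGE = (6.0, 10.0)  # expected extra gain from restructuring the code


def workstation_vs_vax() -> float:
    """Implied speed of the Unix workstation relative to the VAX."""
    return CRAY_VS_VAX / CRAY_VS_WORKSTATION


def with_compiler_gain(speedup: float) -> float:
    """Speedup after the compiler gain, ASSUMING it multiplies the quoted figure."""
    return speedup * (1.0 + COMPILER_GAIN)


def projected_peak_fraction(factor: float) -> float:
    """Fraction of peak after restructuring, ASSUMING linear scaling."""
    return PEAK_FRACTION * factor


if __name__ == "__main__":
    print(f"workstation vs VAX: {workstation_vs_vax():.1f}x")
    print(f"Cray vs VAX with compiler gain: {with_compiler_gain(CRAY_VS_VAX):.1f}x")
    low, high = (projected_peak_fraction(f) for f in RESTRUCTURE_RANGE)
    print(f"projected fraction of peak: {low:.0%} to {high:.0%}")
```

    Under these assumptions, the projected six- to ten-fold restructuring gain would bring the code to roughly 30 to 50 percent of peak speed.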

    An Application Perspective on High-Performance Computing and Communications

    We review possible and probable industrial applications of HPCC, focusing on the software and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions of five areas: computational chemistry; Monte Carlo methods from physics to economics; manufacturing and computational fluid dynamics; command and control, or crisis management; and multimedia services to client computers and settop boxes. The hardware varies from tightly-coupled parallel supercomputers to heterogeneous distributed systems. The software models span HPF and data parallelism, to distributed information systems and object/data flow parallelism on the Web. We find that in each case it is reasonably clear that HPCC works in principle, and postulate that this knowledge can be used in a new generation of software infrastructure based on the WebWindows approach, which is discussed in an accompanying paper.

    Implementation of MPICH on top of MP_Lite

    The goal of this thesis is to develop a new Channel Interface device for the MPICH implementation of the MPI (Message Passing Interface) standard using MP_Lite. MP_Lite is a lightweight message-passing library that is not a full MPI implementation, but offers high performance. MPICH (Message Passing Interface CHameleon) is a full implementation of the MPI standard that has the p4 library as the underlying communication device for TCP/IP networks. By integrating MP_Lite as a Channel Interface device in MPICH, a parallel programmer can utilize the full MPI implementation of MPICH as well as the high bandwidth offered by MP_Lite. There are several layers in the MPICH library where one can tie in a new device. The Channel Interface is the lowest layer, and it requires very few functions to add a new device. By attaching MP_Lite to MPICH at the lowest level, the Channel Interface, almost all of the performance of the MP_Lite library can be delivered to the applications using MPICH. MP_Lite can be implemented either as a blocking or a non-blocking Channel Interface device. The performance was measured on two separate test clusters, the PC and the Alpha mini-clusters, both with Gigabit Ethernet connections. The PC cluster has two 1.8 GHz Pentium 4 PCs and the Alpha cluster has two 500 MHz Compaq DS20 workstations. Netgear, TrendNet and SysKonnect Gigabit Ethernet network interface cards were used for the measurements. Both the blocking and non-blocking MPICH-MP_Lite Channel Interface devices perform close to raw TCP, whereas a performance loss of 25-30% is seen in the MPICH-p4 Channel Interface device for larger messages. The superior performance of the MPICH-MP_Lite device compared to the MPICH-p4 device can be easily seen on the SysKonnect cards using jumbo frames. The throughput curve also improves considerably when the Eager/Rendezvous threshold is increased.

    Physics-based balancing domain decomposition by constraints for multi-material problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10915-018-0870-z. In this work, we present a new variant of the balancing domain decomposition by constraints preconditioner that is robust for multi-material problems. We start with a well-balanced subdomain partition, and based on an aggregation of elements according to their physical coefficients, we end up with a finer physics-based (PB) subdomain partition. Next, we define corners, edges, and faces for this PB partition, and select some of them to enforce subdomain continuity (primal faces/edges/corners). When the physical coefficient in each PB subdomain is constant and the set of selected primal faces/edges/corners satisfies a mild condition on the existence of acceptable paths, we can show both theoretically and numerically that the condition number does not depend on the contrast of the coefficient across subdomains. An extensive set of numerical experiments in 2D and 3D for the Poisson and linear elasticity problems is provided to support our findings. In particular, we show robustness and weak scalability of the new preconditioner variant up to 8232 cores when applied to 3D multi-material problems with the contrast of the physical coefficient up to 10^8 and more than half a billion degrees of freedom. For the scalability analysis, we have exploited a highly scalable advanced inter-level overlapped implementation of the preconditioner that deals very efficiently with the coarse problem computation. The proposed preconditioner is compared against a state-of-the-art implementation of an adaptive BDDC method in PETSc for thermal and mechanical multi-material problems. Peer Reviewed. Postprint (author's final draft).
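    The aggregation step described in this abstract can be illustrated with a small sketch. It is a deliberate simplification and not the authors' algorithm: it assumes elements are grouped by exact equality of a scalar coefficient within each original subdomain, it ignores mesh connectivity, and the names `physics_based_partition` and `contrast` are hypothetical.

```python
from typing import List, Sequence


def physics_based_partition(subdomain_of: Sequence[int],
                            coefficient_of: Sequence[float]) -> List[int]:
    """Assign each element a physics-based (PB) subdomain id.

    Elements share a PB subdomain when they belong to the same original
    subdomain AND carry the same coefficient. Simplification: connectivity
    of the resulting groups is not checked.
    """
    if len(subdomain_of) != len(coefficient_of):
        raise ValueError("one subdomain id and one coefficient per element")
    ids = {}   # (original subdomain, coefficient) -> PB subdomain id
    pb = []
    for sub, coef in zip(subdomain_of, coefficient_of):
        key = (sub, coef)
        if key not in ids:
            ids[key] = len(ids)   # ids are numbered in order of first appearance
        pb.append(ids[key])
    return pb


def contrast(coefficient_of: Sequence[float]) -> float:
    """Ratio of the largest to the smallest (positive) coefficient."""
    return max(coefficient_of) / min(coefficient_of)


if __name__ == "__main__":
    # Six elements in two balanced subdomains, with two materials.
    subdomains = [0, 0, 0, 1, 1, 1]
    coefficients = [1.0, 1e8, 1.0, 1e8, 1e8, 1.0]
    print(physics_based_partition(subdomains, coefficients))
    print(contrast(coefficients))
```

    In this toy setting each PB subdomain has a constant coefficient by construction, which is the situation the abstract's contrast-independence result refers to.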


    NASA Langley Research Center's distributed mass storage system

    There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at NASA LaRC is building such a system and expects to put it into production use by the end of 1993. This paper presents the design of the DMSS, some experiences in its development and use, and a performance analysis of its capabilities. The special features of this system are: (1) workstation class file servers running UniTree software; (2) third party I/O; (3) HIPPI network; (4) HIPPI/IPI3 disk array systems; (5) Storage Technology Corporation (STK) ACS 4400 automatic cartridge system; (6) CRAY Research Incorporated (CRI) CRAY Y-MP and CRAY-2 clients; (7) file server redundancy provision; and (8) a transition mechanism from the existing mass storage system to the DMSS.

    The first ICASE/LARC industry roundtable: Session proceedings

    The first 'ICASE/LaRC Industry Roundtable' was held on October 3-4, 1994, in Williamsburg, Virginia. The main purpose of the roundtable was to draw the attention of ICASE/LaRC scientists to industrial research agendas. The roundtable was attended by about 200 scientists: 30% from NASA Langley; 20% from universities; 17% from NASA Langley contractors (including ICASE personnel); and the remainder from federal agencies other than NASA Langley. The technical areas covered reflected the major research programs in ICASE and closely associated NASA branches. About 80% of the speakers were from industry. This report is a compilation of the session summaries prepared by the session chairmen.