452 research outputs found

    Mapping unstructured grid problems to the connection machine

    Get PDF
    We present a highly parallel graph mapping technique that enables one to solve unstructured grid problems on massively parallel computers. Many implicit and explicit methods for solving discretizated partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The time spent communicating can limit the high performance promised by massively parallel computing. To eliminate this bottleneck, we map the graph of the irregular problem to the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. We show that, in comparison to a naive assignment of processors, our heuristic mapping algorithm significantly reduces the communication time on the Connection Machine, CM-2

    MPF: A portable message passing facility for shared memory multiprocessors

    Get PDF
    The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations

    Computational methods and software systems for dynamics and control of large space structures

    Get PDF
    Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers

    Instrumentation, performance visualization, and debugging tools for multiprocessors

    Get PDF
    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs

    A massively-parallel Navier-Stokes implementation

    Full text link
    Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/76255/1/AIAA-1989-1940-323.pd

    Optimal processor assignment for pipeline computations

    Get PDF
    The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered

    An Evolutionary Approach to Load Balancing Parallel Computations

    Get PDF
    We present a new approach to balancing the workload in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined in order to ameliorate the problem of premature convergence that is often encountered in the implementation of classical genetic algorithms. The algorithm is hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures

    Adaptive Parallel Iterative Deepening Search

    Full text link
    Many of the artificial intelligence techniques developed to date rely on heuristic search through large spaces. Unfortunately, the size of these spaces and the corresponding computational effort reduce the applicability of otherwise novel and effective algorithms. A number of parallel and distributed approaches to search have considerably improved the performance of the search process. Our goal is to develop an architecture that automatically selects parallel search strategies for optimal performance on a variety of search problems. In this paper we describe one such architecture realized in the Eureka system, which combines the benefits of many different approaches to parallel heuristic search. Through empirical and theoretical analyses we observe that features of the problem space directly affect the choice of optimal parallel search strategy. We then employ machine learning techniques to select the optimal parallel search strategy for a given problem space. When a new search task is input to the system, Eureka uses features describing the search space and the chosen architecture to automatically select the appropriate search strategy. Eureka has been tested on a MIMD parallel processor, a distributed network of workstations, and a single workstation using multithreading. Results generated from fifteen puzzle problems, robot arm motion problems, artificial search spaces, and planning problems indicate that Eureka outperforms any of the tested strategies used exclusively for all problem instances and is able to greatly reduce the search time for these applications
    • …
    corecore