144,070 research outputs found

    Principles for problem aggregation and assignment in medium scale multiprocessors

    Get PDF
    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior

    An occam Style Communications System for UNIX Networks

    Get PDF
    This document describes the design of a communications system which provides occam style communications primitives under a Unix environment, using TCP/IP protocols, and any number of other protocols deemed suitable as underlying transport layers. The system will integrate with a low overhead scheduler/kernel without incurring significant costs to the execution of processes within the run time environment. A survey of relevant occam and occam3 features and related research is followed by a look at the Unix and TCP/IP facilities which determine our working constraints, and a description of the T9000 transputer's Virtual Channel Processor, which was instrumental in our formulation. Drawing from the information presented here, a design for the communications system is subsequently proposed. Finally, a preliminary investigation of methods for lightweight access control to shared resources in an environment which does not provide support for critical sections, semaphores, or busy waiting, is made. This is presented with relevance to mutual exclusion problems which arise within the proposed design. Future directions for the evolution of this project are discussed in conclusion

    Control versus Data Flow in Parallel Database Machines

    Get PDF
    The execution of a query in a parallel database machine can be controlled in either a control flow way, or in a data flow way. In the former case a single system node controls the entire query execution. In the latter case the processes that execute the query, although possibly running on different nodes of the system, trigger each other. Lately, many database research projects focus on data flow control since it should enhance response times and throughput. The authors study control versus data flow with regard to controlling the execution of database queries. An analytical model is used to compare control and data flow in order to gain insights into the question which mechanism is better under which circumstances. Also, some systems using data flow techniques are described, and the authors investigate to which degree they are really data flow. The results show that for particular types of queries data flow is very attractive, since it reduces the number of control messages and balances these messages over the node

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture
    corecore