1,762 research outputs found

    Access to vectors in multi-module memories

    Get PDF
    The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we summarize a mechanism to request the elements in an out-of-order way which allows to achieve conflict-free access for a larger number of strides. We study the cases of a single vector processor and of a vector multiprocessor system. For this latter case, we propose a synchronous mode of accessing memory that can be applied in SIMD machines or in MIMD systems with decoupled access and execution.Peer ReviewedPostprint (published version

    Access to streams in multiprocessor systems

    Get PDF
    When accessing streams in vector multiprocessor machines, degradation in the interconnection network and conflicts in the memory modules are the factors that reduce the efficiency of the system. In this paper, we present a synchronous access mechanism that allows conflict-free access to streams in a SIMD vector multiprocessor system. Each processor accesses the corresponding elements out of order, in such a way that in each cycle the requested elements do not collide in the interconnection network. Moreover, memory modules are accessed so that conflicts are avoided. The use of the proposed mechanism in present-day architectures would allow conflict-free access to streams with the most common strides that appear in real applications. The additional hardware is described and is shown to be of a similar complexity as that required for access in order.Postprint (published version

    The force on the flex: Global parallelism and portability

    Get PDF
    A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment

    Multiprocessor/Multicomputer Systems and Optimal Loading Techniques

    Get PDF
    This report reviews the subject of multiprocessor/multicomputer systems and optimal loading techniques. This report covers: 1. The interrelationship of Multiprocessor/Multicomputer (Multiple Instruction Stream Multiple Data Stream, MIMD) systems and other architectures by presenting a categorization of computer architectures. 2. Comparison of Multiprocessor/Multicomputer (MIMD), versus Parallel Processor (Single Instruction stream Multiple Data stream, SIMD) systems. 3. Multiprocessor/Multicomputer problems, pitfalls and new goals. 4. Investigation of loading techniques by reviewing particular MIMD executive designs

    Context flow architecture

    Get PDF

    A multiprocessor system using a switch matrix configuration

    Get PDF
    This thesis describes a class of interconnection networks based on the use of a switch matrix to provide processor to memory communication. This switch allows a direct link between any processor to any memory module. The cost and performance of this network are analytically examined. The results are compared with those of a multiprocessor system using a time-shared bus configuration and it is shown that for the two extreme cases of maximum and minimum throughput, the two approaches are equivalent from a performance point of view. However, in the general case, even with a higher cost, the switch matrix provides a much better performance than the time-shared bus configuration. Furthermore, the architecture of a multiprocessor MIMD type computer using a switch matrix is investigated and Petri net techniques are used to model process coordination among processors --Abstract, page ii

    Fast, Accurate and Detailed NoC Simulations

    Get PDF
    Network-on-Chip (NoC) architectures have a wide variety of parameters that can be adapted to the designer's requirements. Fast exploration of this parameter space is only possible at a high-level and several methods have been proposed. Cycle and bit accurate simulation is necessary when the actual router's RTL description needs to be evaluated and verified. However, extensive simulation of the NoC architecture with cycle and bit accuracy is prohibitively time consuming. In this paper we describe a simulation method to simulate large parallel homogeneous and heterogeneous network-on-chips on a single FPGA. The method is especially suitable for parallel systems where lengthy cycle and bit accurate simulations are required. As a case study, we use a NoC that was modelled and simulated in SystemC. We simulate the same NoC on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation we achieved a speed-up of 80-300, without compromising the cycle and bit level accuracy

    Optimistic Parallel State-Machine Replication

    Full text link
    State-machine replication, a fundamental approach to fault tolerance, requires replicas to execute commands deterministically, which usually results in sequential execution of commands. Sequential execution limits performance and underuses servers, which are increasingly parallel (i.e., multicore). To narrow the gap between state-machine replication requirements and the characteristics of modern servers, researchers have recently come up with alternative execution models. This paper surveys existing approaches to parallel state-machine replication and proposes a novel optimistic protocol that inherits the scalable features of previous techniques. Using a replicated B+-tree service, we demonstrate in the paper that our protocol outperforms the most efficient techniques by a factor of 2.4 times

    Architecture independent environment for developing engineering software on MIMD computers

    Get PDF
    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management
    corecore