6 research outputs found

    Cache coherence requirements for interprocess rendezvous

    Multiprocessors in which a shared bus is used by the processors to communicate with common memory are an emerging class of machines on which there is a need to support parallel programming languages. A language construct found in a number of parallel programming languages to support synchronization and communication is the interprocess rendezvous. Shared-bus multiprocessors require a protocol to keep the data in their caches coherent. There are two major categories of these protocols: invalidation and write-broadcast. This paper examines the requirements for cache coherence protocols to support efficient interprocess rendezvous. The approach taken is to examine the memory referencing patterns to the run-time data structures during rendezvous execution. The appropriate coherence protocol is shown to be a function of the processor scheduling strategy used by the run-time system at synchronization points during the rendezvous. When processes migrate freely as a result of the scheduling strategy, invalidation protocols are found to be more efficient. When migration is restricted by the scheduler, write-broadcast protocols are more efficient.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/44571/1/10766_2005_Article_BF01407863.pd
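    To make the trade-off concrete, here is a minimal toy model (not the paper's own; the traces, the eight-CPU round-robin migration, and the unit bus costs are all assumptions made here) that counts bus transactions on one shared cache line under each protocol family:

```python
# Toy bus-traffic model contrasting invalidation and write-broadcast
# (write-update) coherence on a single shared cache line.  The access
# patterns and unit costs are hypothetical simplifications, not the
# paper's actual workload or protocol definitions.

def bus_transactions(trace, protocol):
    """Count bus transactions for a trace of (processor, 'r'/'w')
    accesses to one line under a simplified protocol model."""
    sharers = set()                    # caches holding a valid copy
    bus = 0
    for p, op in trace:
        if protocol == "invalidate":   # MSI-like invalidation
            if op == "r":
                if p not in sharers:
                    bus += 1           # read miss served over the bus
                    sharers.add(p)
            else:
                if sharers != {p}:
                    bus += 1           # invalidate others, take ownership
                    sharers = {p}
        else:                          # write-broadcast (update)
            if op == "r":
                if p not in sharers:
                    bus += 1           # read miss
                    sharers.add(p)
            else:
                bus += 1               # every write is broadcast
                sharers.add(p)
    return bus

# Restricted migration: a producer pinned on CPU 0 writes the rendezvous
# structure, a consumer pinned on CPU 1 reads it.
pinned = [(0, "w"), (1, "r")] * 1000

# Free migration: each rendezvous runs wholly on a different CPU and
# touches the run-time structures several times locally.
migrating = [(i % 8, op) for i in range(1000) for op in "wwr"]

for name, trace in (("pinned", pinned), ("migrating", migrating)):
    for proto in ("invalidate", "update"):
        print(name, proto, bus_transactions(trace, proto))
```

    In this toy model the pinned trace incurs roughly half the bus traffic under write-broadcast, and the migrating trace roughly half under invalidation, which is the qualitative behaviour the abstract reports.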

    SIMD based multicore processor for image and video processing

    Degree system: new; Report number: Kou 3602; Type of degree: Doctor of Engineering; Date conferred: 2012/3/15; Waseda University diploma number: Shin 595

    Simulation models of shared-memory multiprocessor systems


    Combinatorial Design and Analysis of Optimal Multiple Bus Systems for Parallel Algorithms.

    This dissertation develops a formal and systematic methodology for designing optimal, synchronous multiple bus systems (MBSs) realizing given (classes of) parallel algorithms. Our approach utilizes graph- and group-theoretic concepts to develop the necessary model and procedural tools. By partitioning the vertex set of the graphical representation (CFG) of the algorithm, we extract a set of interconnection functions that represents the interprocessor communication requirement of the algorithm. We prove that the optimal partitioning problem is NP-Hard. However, we show how to obtain polynomial-time solutions by exploiting certain regularities present in many well-behaved parallel algorithms. The extracted set of interconnection functions is represented by an edge-colored, directed graph called the interconnection function graph (IFG). We show that the problem of constructing an optimal MBS to realize an IFG is NP-Hard. We show important special cases where polynomial-time solutions exist. In particular, we prove that polynomial-time solutions exist when the IFG is vertex symmetric. This is the case of interest for the vast majority of important interconnection function sets, whether extracted from algorithms or corresponding to existing interconnection networks. We show that an IFG is vertex symmetric if and only if it is the Cayley color graph of a finite group Γ and its generating set Δ. Using this property, we present a particular scheme to construct a symmetric MBS M(Γ,Δ) with the minimum number of buses as well as the minimum number of interfaces realizing a vertex-symmetric IFG. We demonstrate several advantages of the optimal MBS M(Γ,Δ) in terms of its symmetry, number of ports per processor, number of neighbors per processor, and diameter. We also investigate the fault-tolerant capabilities and performance degradation of M(Γ,Δ) in the case of a single bus failure, single driver failure, single receiver failure, and single processor failure. Further, we address the problem of designing an optimal MBS realizing a class of algorithms when the number of buses and/or processors in the target MBS is specified. The optimality criteria are maximizing the speed and minimizing the number of interfaces.
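    As a small illustration of the vertex-symmetry condition, the sketch below builds the Cayley color graph of the cyclic group Z_8 with an assumed generating set Δ = {1, 2} (an example chosen here, not one from the dissertation) and checks that every group translation is a color-preserving automorphism:

```python
# Minimal sketch of the vertex-symmetry criterion: the interconnection
# function graph (IFG) of the cyclic group Z_8 with the hypothetical
# generating set Delta = {1, 2}.  Each color d corresponds to one
# interconnection function g -> (g + d) mod n.

n = 8                                  # the group Z_n under addition mod n
delta = (1, 2)                         # assumed generating set

# Cayley color graph: a d-colored edge from g to (g + d) mod n.
edges = {(g, (g + d) % n, d) for g in range(n) for d in delta}

def translate(h):
    """Image of the edge set under the translation g -> (h + g) mod n."""
    return {((h + g) % n, (h + t) % n, d) for g, t, d in edges}

# Vertex symmetry: every group translation is a color-preserving
# automorphism, so the IFG looks the same from every vertex.
assert all(translate(h) == edges for h in range(n))
print(f"The IFG of Z_{n} with Delta = {list(delta)} is vertex symmetric.")
```

    Since each color class here is a single interconnection function, the vertex-symmetric structure is exactly what the dissertation exploits to realize all functions with a minimum of buses and interfaces in M(Γ,Δ).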

    Mixed integer programming on transputers

    Mixed Integer Programming (MIP) problems occur in many industries, and their practical solution can be challenging in terms of both time and effort. Although faster computer hardware has allowed the solution of more MIP problems in reasonable times, there will come a point when the hardware cannot be speeded up any more. One way of improving the solution times of MIP problems without further speeding up the hardware is to improve the effectiveness of the solution algorithm used. The advent of accessible parallel processing technology and techniques provides the opportunity to exploit any parallelism within MIP solving algorithms in order to accelerate the solution of MIP problems. Many of the MIP problem solving algorithms in the literature contain a degree of exploitable parallelism. Several algorithms were considered as candidates for parallelisation within the constraints imposed by the currently available parallel hardware and techniques. A parallel Branch and Bound algorithm was designed for and implemented on an array of transputers hosted by a PC. The parallel algorithm was designed to operate as a process farm, with a master passing work to various slave processors; a message-passing harness was developed to allow full control of the slaves and the work sent to them. The effects of using various node selection techniques were studied, and a default node selection strategy was decided upon for the parallel algorithm. The parallel algorithm was also designed to take full advantage of the structure of MIP problems formulated using global entities such as general integers and special ordered sets. The presence of parallel processors makes practicable the idea of performing more than two branches on an unsatisfied global entity. Experiments were carried out using multiway branching strategies, and a default branching strategy was decided upon for appropriate types of MIP problem.
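    As a concrete sketch of the serial pattern being parallelised, the toy Branch and Bound below (hypothetical 0/1 knapsack instance; the `bound` function and all data are illustrative, not from the thesis) maintains the pool of open nodes under a best-bound node selection rule; in the process farm, the master would hold this pool and hand the expansion step to slave transputers:

```python
# Sequential toy of the Branch and Bound pattern the thesis parallelises:
# a pool of open nodes with best-bound node selection.  The knapsack
# instance and all numbers are illustrative, not taken from the thesis.
import heapq

values   = [60, 100, 120, 40]          # items pre-sorted by value density
weights  = [10, 20, 30, 15]
capacity = 40

def bound(level, value, weight):
    """LP-relaxation-style upper bound: fill the remaining capacity
    greedily, taking a fraction of the first item that does not fit."""
    b = value
    for i in range(level, len(values)):
        if weight + weights[i] <= capacity:
            weight += weights[i]
            b += values[i]
        else:
            b += values[i] * (capacity - weight) / weights[i]
            break
    return b

best = 0
pool = [(-bound(0, 0, 0), 0, 0, 0)]    # (-bound, level, value, weight)
while pool:
    neg_b, level, value, weight = heapq.heappop(pool)
    if -neg_b <= best or level == len(values):
        continue                       # prune: cannot beat the incumbent
    # Branch on item `level`: include it (if feasible), then exclude it.
    for take in (True, False):
        v = value + values[level] if take else value
        w = weight + weights[level] if take else weight
        if w > capacity:
            continue
        best = max(best, v)
        heapq.heappush(pool, (-bound(level + 1, v, w), level + 1, v, w))
print("optimum:", best)                # 180 for this instance
```

    Swapping in depth-first or other node selection rules changes only which node is popped from the pool, which is essentially the experimental knob the thesis studies; multiway branching on a general integer or special ordered set would push several children per node instead of the two-way branch shown here.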