579 research outputs found
Extending Static Synchronization Beyond SIMD and VLIW
A key advantage of SIMD (Single Instruction stream, Multiple Data stream) architectures is that synchronization is effected statically at compile-time, hence the execution-time cost of synchronization between âprocessesâ is essentially zero. VLIW (Very Long Instruction Word) machines are successful in large part because they preserve this property while providing more flexibility in terms of what kinds of operations can be parallelized. In this paper, we propose a new kind of architecture â- the âstatic barrier MIMDâ or SBM â which can be viewed as a further generalization of the parallel execution abilities of static synchronization machines. Barrier MIMDs are asynchronous Multiple Instruction stream Multiple Data stream architectures capable of parallel execution of loops, subprogram calls, and variable execution- time instructions; however, little or no run-time synchronization is needed. When a group of processors within a barrier MIMD has just encountered a barrier, any conceptual synchronizations between the processors are statically accomplished with zero cost â as in a SIMD or VLIW and using similar compiler technology. Unlike these machines, however, as execution continues the relative timing of processors may become less precisely knowable as a static, compile-time, quantity. Where this imprecision becomes too large, the compiler simply inserts a synchronization barrier to insure that timing imprecision at that point is zero, and again employs purely static, implicit, synchronization. Both the architecture and the supporting compiler technology are discussed in detail
Recommended from our members
Percolation scheduling for non-VLIW machines
Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only to VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps
Static Scheduling for Barrier MIMD Architectures
Barrier MIMDs are asynchronous Multiple Instruction stream Multiple Data stream architectures capable of parallel execution of variable-execution-time instructions and arbitrary control flow (e.g., w h ile loops and calls); however, they differ from conventional MIMDs in that the need for run-time synchronization is significantly reduced. Whenever a group of processors within a barrier MIMD encounters a synchronization point (barrier), static timing constraints become precise, hence, conceptual synchronizations between the processors often can be statically resolved with zero cost â as in a SIMD or VLIW and using similar compiler technology. Unlike these machines, however, as execution continues past the synchronization point the accuracy within which the compiler can track the relative timing between processors is reduced. Where this imprecision becomes too large, the compiler simply inserts a synchronization barrier to insure that timing imprecision at that point is zero, and again employs static, implicit synchronization. This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Elli85]. In addition, the experimental results from scheduling more than 3500 synthetic benchmark programs for a parameterized barrier MIMD machine are presented
Hardware Barrier Synchronization: Static Barrier MIMD (SBM)
In this paper, we give the design, and performance analysis, of a new, highly efficient, synchronization mechanism called âStatic Barrier MIMDâ or âSBM.â Unlike traditional barrier synchronization, the proposed barriers are designed to facilitate the use of static (compile-time) code scheduling for eliminating some synchronizations. For this reason, our barrier hardware is more general than most hardware barrier mechanisms, allowing any subset of the processors to participate in each barrier. Since code scheduling typically operates on fine-grain parallelism, it is also vital that barriers be able to execute in a small number of clock ticks. The SBM is actually only one of two new classes of barrier machines proposed to facilitate static code scheduling; the other architecture is the âDynamic Barrier MIMD,â or âDBM,â which is described in a companion paper1. The DBM differs from the SBM in that the DBM employs more complex hardware to make the system less dependent on the precision of the static analysis and code scheduling; for example, an SBM cannot efficiently manage simultaneous execution of independent parallel programs, whereas a DBM can
Recommended from our members
Fine-grain loop scheduling for MIMD machines
Previous algorithms for parallelizing loops on MIMD machines have been based on assigning one or more loop iterations to each processor, introducing synchronization as required. These methods exploit only iteration level parallelism, and ignore the parallelism that may exist at a lower level.In order to exploit parallelism both within and across iterations, our algorithm analyzes and schedules the loop at the statement level. The loop schedule reflects the expected communication and synchronization costs of the target machine. We provide test results that show that this algorithm can produce good speedup of loops on an MIMD machine
Compiler Techniques for Loosely-Coupled Multi-Cluster Architectures
No abstract available
- âŠ