Instruction replication for clustered microarchitectures

Author: Aleta Ortega Alexandre
Codina Viñas Josep M.
David Kaeli
González Colás Antonio María
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

This work presents a new compilation technique that uses instruction replication to reduce the number of communications executed on a clustered microarchitecture. For such architectures, the need to communicate values between clusters can result in a significant performance loss. Inter-cluster communications can be reduced by selectively replicating an appropriate set of instructions. However, instruction replication must be done carefully, since it may also degrade performance due to the increased contention it can place on processor resources. The proposed scheme is built on top of a previously proposed state-of-the-art modulo scheduling algorithm that effectively reduces communications. Results show that replication can decrease the number of communications, which results in significant speed-ups. IPC is increased by 25% on average for a 4-cluster microarchitecture and by as much as 70% for selected programs. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Scheduling with Communication Delays

Author: J.C. Koenig
R. Giroudeau
Publication venue: 'IntechOpen'
Publication date: 01/12/2007

International audience. Handbook on scheduling

IntechOpen

Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

Author: Baker R.
Frank G.
Gray G.
Scheper C.
Yalamanchili S.

Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified.

NASA Technical Reports Server

A communication-ordered task graph allocation algorithm

Author: Evans John
Publication venue: University of Utah
Publication date: 01/01/1992

Technical report. The inherently asynchronous nature of the data flow computation model allows the exploitation of maximum parallelism in program execution. While this computational model holds great promise, several problems must be solved in order to achieve a high degree of program performance. The allocation and scheduling of programs on MIMD distributed memory parallel hardware is necessary for the implementation of efficient parallel systems. Finding optimal solutions requires that maximum parallelism be achieved consistent with resource limits and minimizing communication costs, and has been proven to be in the class of NP-complete problems. This paper addresses the problem of static allocation of tasks to distributed memory MIMD systems where simultaneous computation and communication is a factor. This paper discusses similarities and differences between several recent heuristic allocation approaches and identifies common problems inherent in these approaches. This paper presents a new algorithm scheme and heuristics that resolve the identified problems and show significant performance benefits.

The University of Utah: J. Willard Marriott Digital Library

A communication-ordered task graph allocation algorithm

Author: Evans John D.
Kessler Robert R.
Publication venue: University of Utah
Publication date: 01/01/1992

Technical report. The inherently asynchronous nature of the data flow computation model allows the exploitation of maximum parallelism in program execution. While this computational model holds great promise, several problems must be solved in order to achieve a high degree of program performance. The allocation and scheduling of programs on MIMD distributed memory parallel hardware is necessary for the implementation of efficient parallel systems. Finding optimal solutions requires that maximum parallelism be achieved consistent with resource limits and minimizing communication costs, and has been proven to be in the class of NP-complete problems. This paper addresses the problem of static allocation of tasks to distributed memory MIMD systems where simultaneous computation and communication is a factor. This paper discusses similarities and differences between several recent heuristic allocation approaches and identifies common problems inherent in these approaches. This paper presents a new algorithm scheme and heuristics that resolve the identified problems and show significant performance benefits.

The University of Utah: J. Willard Marriott Digital Library