6 research outputs found

Limits of a decoupled out-of-order superscalar architecture

Author: Jones Graham P.
Publication venue: The University of Edinburgh
Publication date: 01/01/1999

Exploiting cache locality at run-time

Author: Yan Yong
Publication venue: W&M ScholarWorks
Publication date: 01/01/1998

With the increasing gap between processor and memory-system speeds, memory access has become a major performance bottleneck in modern computer systems. Recently, Symmetric Multi-Processor (SMP) systems have emerged as a major class of high-performance platforms. Improving the memory performance of parallel applications with dynamic memory-access patterns on SMPs is a hard problem. Solving it is critical to the successful use of SMP systems, because dynamic memory-access patterns occur in many real-world applications. This dissertation aims to solve this problem. Based on a rigorous analysis of cache-locality optimization, we propose a memory-layout-oriented run-time technique to exploit the cache locality of parallel loops. Our technique has been implemented in a run-time system. Using simulation and measurement, we show that our run-time approach achieves performance comparable to compiler optimizations for regular applications, whose load balance and cache locality can be well optimized by tiling and other program transformations. For applications with dynamic memory-access patterns, which are usually hard to optimize with static compiler optimizations, our approach significantly improves memory performance. Several contributions are made in this dissertation. We present models that characterize the complexity of optimizing cache locality, together with a solution framework. We present an effective estimation technique for memory-access patterns to support efficient locality optimizations and information integration. We present a memory-layout-oriented run-time technique for locality optimization. We present efficient scheduling algorithms to trade off locality and load imbalance. We provide a detailed performance evaluation of the run-time technique.

College of William & Mary: W&M Publish

Infrastructures et stratégies de compilation pour parallélisme à grain fin (Infrastructures and compilation strategies for fine-grained parallelism)

Author: Rohou Erven
Publication venue: HAL CCSD
Publication date: 17/11/1998

The increasing complexity of processors has led to the development of a large number of code transformations that adapt computations to the hardware architecture. The major difficulty faced by a compiler is determining the sequence of transformations that will provide the best performance. This sequence depends on both the application and the processor considered. The deep interaction between the various code transformations rules out a static solution. We propose an iterative approach to compilation to solve this problem: each optimization module can revisit the decisions made by another module. These modules can communicate information about the properties of the code they have produced. This approach requires a complete redesign of the structure of current compilers. The implementation was made possible only by the software infrastructures that we developed: Salto and SEA. Thanks to these environments, we were able to quickly develop prototypes of compilation strategies. We also show that analysis and optimization should not be limited to the local behavior of a code fragment. On the contrary, the global behavior of the application must be considered, especially for embedded systems.

INRIA a CCSD electronic archive server

Changing interaction of compiler and architecture

Authors: Adve Sarita V.; Burger Doug; Choudhary Alok N.; Eigenmann Rudolf; Fang Jesse Z.; Gebotys Catherine H.; Kandemir Mahmut T.; Lilja David J.; Rawsthorne Alasdair; Smith Michael D.; Yew Pen Chung
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/1997

The University of Manchester - Institutional Repository