6 research outputs found

    Limits of a decoupled out-of-order superscalar architecture

    Get PDF

    Exploiting cache locality at run-time

    Get PDF
    With the increasing gap between the speeds of the processor and memory system, memory access has become a major performance bottleneck in modern computer systems. Recently, Symmetric Multi-Processor (SMP) systems have emerged as a major class of high-performance platforms. Improving the memory performance of Parallel applications with dynamic memory-access patterns on Symmetric Multi-Processors (SMP) is a hard problem. The solution to this problem is critical to the successful use of the SMP systems because dynamic memory-access patterns occur in many real-world applications. This dissertation is aimed at solving this problem.;Based on a rigorous analysis of cache-locality optimization, we propose a memory-layout oriented run-time technique to exploit the cache locality of parallel loops. Our technique have been implemented in a run-time system. Using simulation and measurement, we have shown our run-time approach can achieve comparable performance with compiler optimizations for those regular applications, whose load balance and cache locality can be well optimized by tiling and other program transformations. However, our approach was shown to improve significantly the memory performance for applications with dynamic memory-access patterns. Such applications are usually hard to optimize with static compiler optimizations.;Several contributions are made in this dissertation. We present models to characterize the complexity and present a solution framework for optimizing cache locality. We present an effective estimation technique for memory-access patterns to support efficient locality optimizations and information integration. We present a memory-layout oriented run-time technique for locality optimization. We present efficient scheduling algorithms to trade off locality and load imbalance. We provide a detailed performance evaluation of the run-time technique

    Infrastructures et stratégies de compilation pour parallélisme à grain fin

    Get PDF
    The increasing complexity of processors has led to the development of a large number of code transformations to adapt computations to the hardware architecture. The major difficulty faced by a compiler is to determine the sequence of transformations that will provide the best performance. This sequence depends on the application and the processor considered. The deep interaction between the various code transformations does not allow to find a static solution.We propose an iterative approach to compilation to solve this problem: each optimization module can revisit the decisions made by another module. These modules can communicate information about the properties of the code they have produced. This approach requires a complete redesign of the structure of current compilers.The realization was only made possible thanks to the software infrastructures that we developed: Salto and SEA. Thanks to these environments, we were able to quickly develop prototypes of compilation strategies.We also show that analysis and optimization should not be limited to the local behavior of a code fragment. On the contrary, the global behavior of the application must be considered, especially for embedded systems.La complexité croissante des processeurs a conduit au développement d'un grand nombre de transformations de code pour adapter l'organisation des calculs à l'architecture matérielle. La difficulté majeure à laquelle est confronté un compilateur consiste à déterminer la séquence de transformations qui va fournir la meilleure performance. Cette séquence dépend de l'application et du processeur considérés. L'interaction profonde entre les diverses transformations de code ne permet pas de trouver une solution statique.Nous proposons une approche itérative de la compilation pour résoudre ce problème : chaque module d'optimisation peut remettre en cause les décisions prises par un autre module. Ces modules peuvent se communiquer des informations sur les propriétés du code qu'ils ont produit. Cette approche nécessite une refonte complète de la structure des compilateurs actuels.La réalisation n'a été rendue possible que grâce aux infrastructures logicielles que nous avons développées : Salto et SEA. Grâce à ces environnements nous avons pu développer rapidement des prototypes de stratégies de compilation.Nous montrons aussi que l'analyse et l'optimisation ne doivent pas se contenter d'un comportement local à un fragment de code. Au contraire, le comportement global de l'application doit être considéré, en particulier pour les systèmes enfouis
    corecore