3 research outputs found
Techniques d'exploration architecturale de design à usage spécifique pour l'accélération de boucles
RÉSUMÉ De nos jours, les industriels privilégient les architectures flexibles afin de réduire le temps et les coûts de conception d’un système. Les processeurs à usage spécifique (ASIP) fournissent beaucoup de flexibilité, tout en atteignant des performances élevées. Une tendance qui a de plus en plus de succès dans le processus de conception d’un système sur puce consiste à spécifier le comportement du système en langage évolué tel que le C, SystemC, etc. La spécification est ensuite utilisée durant le partitionement pour déterminer les composantes logicielles et matérielles du système. Avec la maturité des générateurs automatiques de ASIP, les concepteurs peuvent rajouter dans leurs boîtes à outils un nouveau type d’architecture, à savoir les ASIP, en sachant que ces derniers sont conçus à partir d’une spécification décrite en langage évolué. D’un autre côté, dans le monde matériel, et cela depuis très longtemps, les chercheurs ont vu l’avantage de baser le processus de conception sur un langage évolué. Cette recherche a abouti à l’avénement de générateurs automatiques de matériel sur le marché qui sont des outils d’aide à la conception comme CapatultC, Forte’s Cynthetizer, etc. Ainsi, avec tous ces outils basés sur le langage C, les concepteurs ont un choix de types de design élargi mais, d’un autre côté, les options de designs possibles explosent, ce qui peut allonger au lieu de réduire le temps de conception. C’est dans ce cadre que notre thèse doctorale s’inscrit, puisqu’elle présente des méthodologies d’exploration architecturale de design à usage spécifique pour l’accélération de boucles afin de réduire le temps de conception, entre autres.
Cette thèse a débuté par l’exploration de designs de ASIP. Les boucles de traitement sont de bonnes candidates à l’accélération, si elles comportent de bonnes possibilités de parallélisme et si ces dernières sont bien exploitées. Le matériel est très efficace à profiter des possibilités de parallélisme au niveau instruction, donc, une méthode de conception a été proposée. Cette dernière extrait le parallélisme d’une boucle afin d’exécuter plus d’opérations concurrentes dans des instructions spécialisées. Notre méthode se base aussi sur l’optimisation des données dans l’architecture du processeur.---------- ABSTRACT Time to market is a very important concern in industry. That is why the industry always looks for new CAD tools that contribute to reducing design time. Application-specific instruction-set processors (ASIPs) provide flexibility and they allow reaching good performance if they are well designed. One trend that gains more and more success is C-based design that uses a high level language such as C, SystemC, etc. The C-based specification is used during the partitionning phase to determine the software and hardware components of the system. Since automatic processor generators are mature now, designers have a new type of tool they can rely on during architecture design. In the hardware world, high level synthesis was and is still a hot research topic. The advances in ESL lead to commercial high-level synthesis tools such as CapatultC, Forte’s Cynthetizer, etc. The designers have more tools in their box but they have more solutions to explore, thus their use can have a reverse effect since the design time can increase instead of being reduced. Our doctoral research tackles this issue by proposing new methodologies for design space exploration of application specific architecture for loop acceleration in order to reduce the design time while reaching some targeted performances.
Our thesis starts with the exploration of ASIP design. We propose a method that targets loop acceleration with highly coupled specialized-instructions executing loop operations. Loops are good candidates for acceleration when the parallelism they offer is well exploited (if they have any parallelization opportunities). Hardware components such as specialized-instructions can leverage parallelization opportunities at low level. Thus, we propose to extract loop parallelization opportunities and to execute more concurrent operations in specialized-instructions. The main contribution of this method is a new approach to specialized-instruction (SI) design based on loop acceleration where loop optimization and transformation are done in SIs directly, instead of optimizing the software code. Another contribution is the design of tightly-coupled specialized-instructions associated with loops based on a 5-pattern representation
Optimizing whole programs for code size
Reducing code size has benefits at every scale. It can help fit embedded software into strictly limited storage space, reduce mobile app download time, and improve the cache usage of supercomputer software. There are many optimizations available that reduce code size, but research has often neglected this goal in favor of speed, and some recently developed compiler techniques have not yet been applied for size reduction.
My work shows that newly practical compiler techniques can be used to develop novel code size optimizations. These optimizations complement each other, and other existing methods, in minimizing code size. I introduce two new optimizations, Guided Linking and Semantic Outlining, and also present a comparison framework for code size reduction methods that explains how and when my new optimizations work well with other, existing optimizations.
Guided Linking builds on recent work that optimizes multiple programs and shared libraries together. It links an arbitrary set of programs and libraries into a single module. The module can then be optimized with arbitrary existing link-time optimizations, without changes to the optimization code, allowing them to work across program and library boundaries; for example, a library function can be inlined into a plugin module. I also demonstrate that deduplicating functions in the merged module can significantly reduce code size in some cases. Guided Linking ensures that all necessary dynamic linker behavior, such as plugin loading, still works correctly; it relies on developer-provided constraints to indicate which behavior must be preserved. Guided Linking can achieve a 13% to 57% size reduction in some scenarios, and can speed up the Python interpreter by 9%.
Semantic Outlining relies on the use of automated theorem provers to check semantic equivalence of pieces of code, which has only recently become feasible to perform at scale. It extends outlining, an established technique for deduplicating structurally equivalent pieces of code, to work on code pieces that are semantically equivalent even if their structure is completely different.
My comparison framework covers a large number of different code size reduction methods from the literature, in addition to my new methods. It describes several different aspects by which each method can be compared; in particular, there are multiple types of redundancy in program code that can be exploited to reduce code size, and methods that exploit different types of redundancy are likely to work well in combination with each other. This explains why Guided Linking and Semantic Outlining can be effective when used together, along with some kinds of existing optimizations