5 research outputs found

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Full text link
    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

    Fast and Clean: Auditable high-performance assembly via constraint solving

    Get PDF
    Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice. In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture. We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72 microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability

    A Framework for Resource-Constrained Rate-Optimal Software Pipelining

    No full text
    The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (`a la rate-optimal) while minimizing the number of buffers --- a close approximation to minimizing the number of registers. The main contributions of this paper are: ffl First, we demonstrate that such problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. ffl Secondly, we show that a precise mathematical formulation..

    Programmation linéaire en nombres entiers pour l'ordonnancement cyclique sous contraintes de ressources

    Get PDF
    Un problème d'ordonnancement cyclique consiste à ordonner dans le temps l'exécution répétitive d'un ensemble d'opérations liées par des contraintes de précédence, en utilisant un nombre limité de ressources. Ces problèmes ont des applications immédiates dans les systèmes de production ou en informatique parallèle. Particulièrement, ils permettent de modéliser l'ensemble des contraintes de précédence et de ressource à prendre en compte pour l'ordonnancement d'instructions dans les processeurs de type VLIW (Very Long Instruction Word). Dans ce cas, une opération représente une instance d'une instruction dans un programme. L'ordonnancement d'instructions de boucles internes est connu sous le nom de pipeline logiciel. Le pipeline logiciel désigne une méthode efficace pour l'optimisation de boucles qui permet la réalisation en parallèle des opérations des différentes itérations de la boucle. Dans cette thèse, nous nous intéressons principalement au problème d'ordonnancement périodique qui est un cas particulier de l'ordonnancement cyclique et qui est également la base du pipeline logiciel. Le terme ordonnancement modulo désigne un ordonnancement périodique tel que l'allocation de ressources pour une opération donnée n'est pas modifiée d'une itération sur l'autre. Pour résoudre le problème, nous nous intéressons aux formulations de programmation linéaire en nombres entiers, et notamment à la résolution du problème par des techniques de séparation, évaluation, génération de colonnes, relaxation lagrangienne et des méthodes hybrides. En particulier, nous proposons des nouvelles formulations basées sur des variables binaires représentant l'exécution d'ensembles d'instructions en parallèle. Enfin, les méthodes développées ont été validées sur des jeux d'instances industrielles pour des processeurs de type VLIW.The resource-constrained modulo scheduling problem (RCMSP) is a general periodic cyclic scheduling problem, abstracted from the problem solved by compilers when optimizing inner loops at instruction level for very long instruction word parallel processors. Since solving the instruction scheduling problem at compilation phase in less time critical than for real time scheduling, integer linear programming (ILP) is a relevant technique for the RCMSP. In this work, we are interested in the methods based on the integer linear programming for the RCMSP. At first, we present a study of the two classic integer linear formulation for the RCMSP. A theoretical evidence of the equivalence between the classic formulations is shown in terms of linear programming (LP) relaxation. Secondly, based on the ILP formulations for the RCMSP, stronger formulations for the RCMSP derived from Dantzig-Wolfe decomposition are presented. In these formulations, the number of variables can be huge, for this reason, we proposed a column generation scheme to solve their LP relaxations. We propose also the heuristics methods based on the Lagragian relaxation and decomposed software pipelining. The heuristic methods search the transformation of the classic integer linear programming for the RCMSP for the performance improvement in the time for the search of solutions. All formulations are compared experimentally on problem instances generated from real data issued from the STMicroelectronics ST200 VLIW processor family
    corecore