50 research outputs found

    Lost in translation: Exposing hidden compiler optimization opportunities

    Get PDF
    Existing iterative compilation and machine-learning-based optimization techniques have proven very successful in achieving better optimizations than the standard optimization levels of a compiler. However, they were not engineered to support the tuning of a compiler's optimizer as part of the compiler's daily development cycle. In this paper, we first establish the properties a technique must exhibit to enable such tuning. We then introduce an enhancement to the classic nightly routine testing of compilers which exhibits all the required properties and is thus capable of driving the improvement and tuning of the compiler's common optimizer. This is achieved by leveraging resource usage and compilation information collected while systematically exploiting prefixes of the transformations applied at standard optimization levels. Experimental evaluation using the LLVM v6.0.1 compiler demonstrated that the new approach was able to reveal hidden cross-architecture and architecture-dependent potential optimizations on two popular processors: the Intel i5-6300U and the Arm Cortex-A53-based Broadcom BCM2837 used in the Raspberry Pi 3B+. As a case study, we demonstrate how the insights from our approach enabled us to identify and remove a significant shortcoming of the CFG simplification pass of the LLVM v6.0.1 compiler. Comment: 31 pages, 7 figures, 2 tables. arXiv admin note: text overlap with arXiv:1802.0984
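    A minimal sketch of the prefix-exploration idea described above; the pass list, file names, and timing harness are illustrative assumptions rather than the paper's actual tooling:

```python
# Illustrative sketch: apply only a prefix of an optimization pass sequence and
# compare the result against the full pipeline. Pass names, benchmark files and
# the timing harness are assumptions for the example, not the paper's tooling.
import subprocess, time

PASSES = ["-mem2reg", "-sroa", "-instcombine", "-simplifycfg",
          "-gvn", "-licm", "-loop-unroll"]          # stand-in for an -O2/-O3 sequence

def build_with_prefix(src_ll, length):
    """Apply only the first `length` passes, then produce a host binary."""
    opt_ll = f"prefix_{length}.ll"
    exe    = f"prefix_{length}.out"
    subprocess.run(["opt", "-S", *PASSES[:length], src_ll, "-o", opt_ll], check=True)
    subprocess.run(["clang", opt_ll, "-o", exe], check=True)
    return exe

def measure(exe, runs=5):
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([f"./{exe}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    baseline = measure(build_with_prefix("benchmark.ll", len(PASSES)))
    for n in range(len(PASSES) + 1):
        t = measure(build_with_prefix("benchmark.ll", n))
        if t < baseline:   # a shorter prefix beats the full pipeline: a hidden opportunity
            print(f"prefix of {n} passes is {baseline / t:.2f}x faster than the full sequence")
```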

    OpenISA, a Hybrid Instruction Set

    Get PDF
    Advisor: Edson Borin. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: OpenISA is designed as the interface of processors that aim to be highly flexible. This is achieved by means of three strategies: first, the ISA is empirically chosen to be easily translated to others, providing software flexibility in case a physical OpenISA processor is not available; in that case, there is no need to implement an OpenISA virtual processor in software, since the ISA is prepared to be statically translated to other ISAs. Second, the ISA is neither a concrete ISA nor a virtual ISA, but a hybrid one with the capability of admitting modifications to opcodes without impacting backwards compatibility. This mechanism allows future versions of the ISA to make real changes instead of simple extensions of previous versions, a common problem with concrete ISAs such as x86. Third, the use of a permissive license allows the ISA to be freely used by any party interested in the project. In this PhD thesis, we focus on the user-level instructions of OpenISA. The thesis discusses (1) ISA alternatives, program-distribution alternatives, and the impact of each choice, (2) important features of OpenISA for achieving its goals, and (3) a thorough evaluation of the chosen ISA with respect to emulation performance on two popular host CPUs, one from Intel and another from ARM. We conclude that the version of OpenISA presented here can preserve close-to-native performance when translated to other hosts, working as a promising model for next-generation, flexible ISAs that can be easily extended while preserving backwards compatibility. Furthermore, we show how it can also serve as a user-level program distribution format. Doctorate in Computer Science; grant 2011/09630-1, FAPESP.
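    To illustrate how a hybrid ISA might admit opcode changes without breaking old binaries, the sketch below assumes a hypothetical per-version opcode remapping applied when a binary is loaded or translated; this mechanism is an illustration only, not OpenISA's actual design:

```python
# Hypothetical illustration: binaries carry the ISA version they were encoded
# for, and older opcodes are remapped on load, so a later ISA revision can
# reassign opcodes without breaking old binaries. This mechanism is an
# assumption made for the example, not OpenISA's documented design.

CURRENT_SEMANTICS = {0x01: "add", 0x02: "sub", 0x03: "mul.wide"}  # ISA v2 meaning of each opcode

# For each older ISA version, how its opcodes map onto the current encoding.
REMAP = {
    1: {0x03: 0x02},   # in v1, opcode 0x03 meant "sub"; v2 moved "sub" to 0x02
}

def decode(opcode, binary_isa_version, current_version=2):
    for v in range(binary_isa_version, current_version):
        opcode = REMAP.get(v, {}).get(opcode, opcode)
    return CURRENT_SEMANTICS[opcode]

print(decode(0x03, binary_isa_version=1))  # "sub": the old binary keeps its meaning
print(decode(0x03, binary_isa_version=2))  # "mul.wide": the new meaning of 0x03
```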

    High-Performance RISC-V Emulation

    Get PDF
    Advisor: Edson Borin. Master's thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: RISC-V is an open ISA which has been attracting attention worldwide due to its fast growth and adoption. It is already supported by GCC, Clang and the Linux kernel. Moreover, several emulators and simulators for RISC-V have arisen recently, but none of them with near-native performance. In this work, we investigate whether faster emulators for RISC-V can be created. As the most common and also the fastest technique for implementing an emulator, Dynamic Binary Translation (DBT), depends directly on good translation quality to achieve good performance, we investigate whether a high-quality translation of RISC-V binaries is feasible. Thus, in this work we implemented and evaluated an LLVM-based Static Binary Translation (SBT) engine to investigate whether or not it is possible to produce high-quality translations from RISC-V to x86 and ARM. Our experimental results indicate that our SBT engine is able to produce high-quality code when translating RISC-V binaries to x86 and ARM, with average overheads of around 1.2x/1.3x compared to native x86/ARM code, a better result than well-known RISC-V DBT engines such as RV8 and QEMU. Moreover, since the performance of DBT engines is strongly tied to translation quality, our SBT engine evidences the opportunity to create RISC-V DBT emulators with higher performance than the current ones. Master's degree in Computer Science.
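    A minimal sketch of the static-binary-translation idea: lift a couple of RISC-V instructions to LLVM IR (here via llvmlite) and let LLVM's optimizer and backend produce host code. The instruction list, register mapping and helper names are assumptions for illustration, not the thesis's engine:

```python
# Lift a tiny RISC-V block to LLVM IR. Guest registers are modeled as a global
# array; x0 is hardwired to zero as in the RISC-V specification.
from llvmlite import ir

module = ir.Module(name="rv_sbt_sketch")
i32 = ir.IntType(32)

# Guest register file: 32 RISC-V integer registers as an LLVM global array.
regfile = ir.GlobalVariable(module, ir.ArrayType(i32, 32), name="x")
regfile.initializer = ir.Constant(ir.ArrayType(i32, 32), None)   # zero-initialized

func = ir.Function(module, ir.FunctionType(ir.VoidType(), []), name="translated_block")
builder = ir.IRBuilder(func.append_basic_block("entry"))

def reg_ptr(n):
    return builder.gep(regfile, [ir.Constant(i32, 0), ir.Constant(i32, n)])

def emit(insn):
    op, rd, rs1, rs2_or_imm = insn
    a = ir.Constant(i32, 0) if rs1 == 0 else builder.load(reg_ptr(rs1))  # x0 reads as 0
    if op == "addi":
        res = builder.add(a, ir.Constant(i32, rs2_or_imm))
    elif op == "add":
        res = builder.add(a, builder.load(reg_ptr(rs2_or_imm)))
    builder.store(res, reg_ptr(rd))

for insn in [("addi", 1, 0, 5), ("add", 2, 1, 1)]:   # x1 = 5 ; x2 = x1 + x1
    emit(insn)
builder.ret_void()
print(module)   # textual LLVM IR, ready to be optimized and compiled to host code
```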

    Infrastructures and Compilation Strategies for the Performance of Computing Systems

    Get PDF
    This document presents our main contributions to the field of compilation, and more generally to the quest for performance of computing systems. It is structured by type of execution environment, from static compilation (execution of native code), to JIT compilation, and purely dynamic optimization. We also consider interpreters. In each chapter, we focus on the most relevant contributions. Chapter 2 describes our work on static compilation. It covers a long time frame (from PhD work in 1995--1998 to recent work on real-time systems and worst-case execution times at Inria in 2015) and various positions, both in academia and in industry. My research on JIT compilers started in the mid-2000s at STMicroelectronics, and is still ongoing. Chapter 3 covers the results we obtained on various aspects of JIT compilers: split-compilation, interaction with real-time systems, and obfuscation. Chapter 4 reports on dynamic binary optimization, a research effort started more recently, in 2012. This considers the optimization of a native binary (without source code) while it runs. It incurs significant challenges but also opportunities. Interpreters represent an alternative way to execute code. Instead of generating native code, an interpreter executes an infinite loop that continuously reads an instruction, decodes it and executes its semantics. Interpreters are much easier to develop than compilers, and they are also much more portable, often requiring only a simple recompilation. The price to pay is reduced performance. Chapter 5 presents some of our work related to interpreters. All this research often required significant software infrastructures for validation, from early prototypes to robust quasi-products, and from open-source to proprietary. We detail them in Chapter 6. The last chapter concludes and gives some perspectives.
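    The fetch-decode-execute loop mentioned above can be sketched in a few lines; the bytecode below is invented for the example:

```python
# Tiny bytecode interpreter illustrating the fetch-decode-execute loop: read an
# instruction, decode it, execute its semantics, repeat until a halt instruction.
def interpret(code):
    stack, pc = [], 0
    while True:                       # the "infinite" dispatch loop: fetch...
        op, arg = code[pc]
        pc += 1
        if op == "push":              # ...decode and execute the semantics
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "print":
            print(stack.pop())
        elif op == "halt":
            return

interpret([("push", 2), ("push", 3), ("add", None), ("print", None), ("halt", None)])  # prints 5
```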

    Native Simulation of Multiprocessor Systems-on-Chip Using Hardware-Assisted Virtualization

    Get PDF
    Integration of multiple heterogeneous processors into a single System-on-Chip (SoC) is a clear trend in embedded systems. Designing and verifying these systems requires high-speed and easy-to-build simulation platforms. Among software simulation approaches, native simulation is a good candidate, since the embedded software is executed natively on the host machine, resulting in high-speed simulation without requiring the development of an instruction set simulator. However, existing native simulation techniques execute the simulated software in a memory space shared between the modeled hardware and the host operating system. This results in many problems, including address space conflicts and overlaps, as well as the use of host machine addresses instead of those of the target hardware platform. This makes it practically impossible to natively simulate legacy code running on the target platform. To overcome these issues, we propose the addition of a transparent address space translation layer to separate the target address space from that of the host simulator. We exploit Hardware-Assisted Virtualization (HAV) technology for this purpose, which is now readily available on almost all general-purpose processors. Experiments show that this solution does not degrade native simulation speed, while keeping the ability to perform software performance evaluation. The proposed solution is scalable as well as flexible, and we provide the necessary evidence to support our claims with multiprocessor and hybrid simulation solutions. We also address the simulation of cross-compiled Very Long Instruction Word (VLIW) executables, using a Static Binary Translation (SBT) technique to generate native code that does not require run-time translation or interpretation support. This approach is interesting in situations where either the source code is not available or the target platform is not supported by any retargetable compilation framework, which is usually the case for VLIW processors. The generated simulators execute on top of our HAV-based platform and model the Texas Instruments (TI) C6x series processors. Simulation results for VLIW binaries show a speed-up of around two orders of magnitude compared to cycle-accurate simulators.
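    A conceptual sketch of the address-space separation the translation layer provides: simulated software uses target addresses, which are mapped page by page to host storage. With HAV the second translation stage is performed in hardware, so this pure-Python page map is only an illustration of the mapping the layer maintains:

```python
# Conceptual model of a guest (target) address space kept separate from the host
# simulator's own addresses. Target addresses such as 0x8000_0000 remain valid
# regardless of where the host allocates the backing memory.
PAGE = 4096

class GuestAddressSpace:
    def __init__(self):
        self.pages = {}                      # guest page number -> host byte buffer

    def _page(self, gaddr):
        return self.pages.setdefault(gaddr // PAGE, bytearray(PAGE))

    def store(self, gaddr, data: bytes):
        for i, b in enumerate(data):
            self._page(gaddr + i)[(gaddr + i) % PAGE] = b

    def load(self, gaddr, size):
        return bytes(self._page(gaddr + i)[(gaddr + i) % PAGE] for i in range(size))

ram = GuestAddressSpace()
ram.store(0x8000_0000, b"boot")       # target firmware address, independent of host layout
print(ram.load(0x8000_0000, 4))       # b'boot'
```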

    Survey on Instruction Selection: An Extensive and Modern Literature Review

    Full text link
    Instruction selection is one of the three optimisation problems involved in the code generator backend of a compiler. The instruction selector is responsible for transforming an input program from its target-independent representation into a target-specific form by making the best use of the available machine instructions. Hence instruction selection is a crucial part of efficient code generation. Despite ongoing research since the late 1960s, the last comprehensive survey of the field was written more than 30 years ago. As new approaches and techniques have appeared since its publication, there is a need for a new, up-to-date review of the current body of literature. This report addresses that need by performing an extensive review and categorisation of existing research. The report therefore supersedes and extends the previous surveys, and also attempts to identify where future research should be directed. Comment: Major changes: merged simulation chapter with macro expansion chapter; addressed misunderstandings of several approaches; completely rewrote many parts of the chapters and strengthened the discussion of many approaches; revised the drawing of all trees and graphs to put the root at the top instead of at the bottom; added an appendix listing the approaches in a table. See doc for more inf
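    A toy macro-expansion-style selector illustrating the problem the survey covers; the IR shape and target mnemonics are assumptions for the example, and real selectors use tree or DAG pattern matching with cost models:

```python
# Macro-expansion-flavored instruction selection: each IR node expands into a
# fixed target-instruction template, recursing over operands first.
def select(node, dest, fresh):
    op, *args = node
    if op == "const":
        return [f"mov {dest}, #{args[0]}"]
    if op == "add":
        lhs, rhs = next(fresh), next(fresh)
        return (select(args[0], lhs, fresh) + select(args[1], rhs, fresh) +
                [f"add {dest}, {lhs}, {rhs}"])

expr = ("add", ("const", 38), ("const", 4))          # IR tree for 38 + 4
print("\n".join(select(expr, "r0", iter(f"r{i}" for i in range(1, 32)))))
# mov r1, #38
# mov r2, #4
# add r0, r1, r2
```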

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Full text link
    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible, at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization.
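    A toy illustration of the "optimal according to a model" flavor of these approaches: exhaustively enumerating register assignments for a three-variable program and keeping the cheapest feasible one. Real techniques use ILP, CP, or PBQP solvers; the program, interference graph, and costs below are made up:

```python
# Enumerate every assignment of variables to two registers or memory, reject
# assignments where interfering variables share a register, keep the cheapest.
from itertools import product

variables    = ["a", "b", "c"]
interference = {("a", "b"), ("b", "c")}              # pairs live at the same time
spill_cost   = {"a": 10, "b": 6, "c": 4}             # cost of keeping a variable in memory
locations    = ["r0", "r1", "mem"]                   # "mem" means spilled

def cost(assign):
    return sum(spill_cost[v] for v in variables if assign[v] == "mem")

def feasible(assign):
    return all(assign[u] == "mem" or assign[v] == "mem" or assign[u] != assign[v]
               for u, v in interference)

best = min((dict(zip(variables, choice))
            for choice in product(locations, repeat=len(variables))
            if feasible(dict(zip(variables, choice)))), key=cost)
print(best)        # e.g. {'a': 'r0', 'b': 'r1', 'c': 'r0'} with cost 0
```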

    Timing Side-Channel Analysis in Dynamic Binary Translators

    Get PDF
    Advisors: Edson Borin and Diego de Freitas Aranha. Master's thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Timing side-channel attacks are an important issue for cryptographic algorithms. If the execution time of an implementation depends on secret information, an adversary may recover the latter by measuring the former. Different approaches have recently emerged to exploit information leakage in cryptographic implementations and to protect them against these attacks. Constant-time cryptography is therefore a widely adopted practice that aims to decorrelate secret data from its timing samples. Although such countermeasures are effective at avoiding timing side channels when algorithms run directly on a system, emulators can modify the code and reintroduce leakage points during execution. Recent works discuss the impact of high-level-language Just-In-Time (JIT) compilers on leakage through execution time; however, little has been said about cross-ISA emulation through DBT and its impact on timing leakage. In this work, we investigate the impact of emulators (dynamic binary translators) on the constant-time property of cryptographic implementations. Using statistical methods and cryptographic routines, we confirmed the feasibility of timing leaks in code generated by a dynamic binary translator, even when using different region formation techniques. We show that emulation may have a significant impact, inserting non-constant-time constructions during translation and leading to significant timing leakage. This leakage is observed in dynamic binary translation systems such as QEMU and HQEMU when emulating routines from known cryptographic libraries, such as mbedTLS, and can be quickly verified. Finally, to guarantee the constant-time property, we implemented a compiler transformation based on if-conversion in the dynamic binary translators, mitigating the inserted timing side channels. Master's degree in Computer Science; grant 2014/50704-7, FAPESP.
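    The if-conversion idea behind the mitigation can be illustrated with a branchless select; this generic sketch is not the thesis's actual transformation, and Python itself gives no constant-time guarantees:

```python
# Replace a secret-dependent branch with branchless arithmetic so both outcomes
# take the same path. (Python only illustrates the shape of the transformation;
# a real mitigation operates on the translated machine code or IR.)
def select_branchy(secret_bit, a, b):
    # leaks: the two paths may take different time or be translated differently
    if secret_bit:
        return a
    return b

def select_if_converted(secret_bit, a, b):
    mask = -(secret_bit & 1) & 0xFFFFFFFF            # 0x00000000 or 0xFFFFFFFF
    return ((a & mask) | (b & ~mask)) & 0xFFFFFFFF   # picks a or b without branching

assert select_if_converted(1, 0xAAAA, 0x5555) == 0xAAAA
assert select_if_converted(0, 0xAAAA, 0x5555) == 0x5555
```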

    Generic Reverse Compilation to Recognize Specific Behavior

    Get PDF
    The thesis focuses on recognising specific behavior through generic reverse compilation. Generic reverse compilation is a process that transforms executables from different architectures and object file formats into the same high-level language. This process is handled by the Lissom Decompiler tool. For the purpose of behavior recognition, the thesis introduces the Language for Decompilation (LfD), a simple imperative language suitable for comparison. The specific behavior is given by a known executable (e.g. malware), and recognition is performed by computing a similarity ratio against another, unknown executable. This similarity ratio is calculated by the LfDComparator tool, which processes two sources in LfD and decides their similarity.
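    A sketch of the similarity-ratio idea using a generic sequence matcher over normalized source lines; the line-based comparison below is an assumption for illustration and may differ from what LfDComparator actually computes over LfD programs:

```python
# Compare two decompiled listings by normalizing their lines and computing a
# sequence-similarity ratio in [0, 1].
from difflib import SequenceMatcher

def similarity(src_a: str, src_b: str) -> float:
    norm = lambda src: [line.strip() for line in src.splitlines() if line.strip()]
    return SequenceMatcher(None, norm(src_a), norm(src_b)).ratio()

known   = "a = input()\nb = a + 1\nsend(b)\n"           # known (e.g. malicious) behavior
unknown = "a = input()\nb = a + 1\nlog(b)\nsend(b)\n"   # unknown executable's listing
print(f"similarity: {similarity(known, unknown):.2f}")   # ~0.86 on these toy listings
```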