9 research outputs found

    Compiler optimization and ordering effects on VLIW code compression

    Code size has always been an important issue for embedded applications as well as larger systems. Code compression techniques have been devised as a way of battling bloated code; however, the impact of VLIW compiler methods and outputs on these compression schemes has not been thoroughly investigated. This paper describes the application of single- and multiple-instruction dictionary methods for code compression to decrease overall code size for the TI TMS320C6xxx DSP family. The compression scheme is applied to benchmarks taken from the Mediabench benchmark suite built with differing compiler optimization parameters. In the single-instruction encoding scheme, it was found that compression ratios were not a useful indicator of the best overall code size – the best results (smallest overall code size) were obtained when the compression scheme was applied to size-optimized code. In the multiple-instruction encoding scheme, changing parallel instruction order was found to only slightly improve compression in unoptimized code and not to affect compression when applied to builds already optimized for size.
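    As an illustration only, the single-instruction dictionary idea mentioned in this abstract can be sketched as follows. This is a minimal sketch, not the paper's encoding: the function names, the 8-bit index width, the 1-bit "index or raw word" flag, and the toy program are all assumptions. The compression ratio is taken here as compressed size divided by original size (lower is better), and the dictionary itself is counted in the compressed size.

```python
from collections import Counter


def dictionary_compress(instructions, index_bits=8, word_bits=32):
    """Replace frequently repeated instruction words with dictionary indices.

    Hypothetical scheme: every emitted item carries a 1-bit flag followed by
    either an index (index_bits) or the raw instruction word (word_bits).
    """
    max_entries = 2 ** index_bits
    counts = Counter(instructions)
    # Only words that occur more than once earn a dictionary slot.
    dictionary = [w for w, c in counts.most_common(max_entries) if c > 1]
    lookup = {w: i for i, w in enumerate(dictionary)}
    stream = [("idx", lookup[w]) if w in lookup else ("raw", w)
              for w in instructions]
    stream_bits = sum(1 + (index_bits if kind == "idx" else word_bits)
                      for kind, _ in stream)
    dict_bits = len(dictionary) * word_bits  # the dictionary must be stored too
    return dictionary, stream, stream_bits + dict_bits


def decompress(dictionary, stream):
    """Expand indices back into the original instruction words."""
    return [dictionary[v] if kind == "idx" else v for kind, v in stream]


# Toy "program" of 32-bit instruction words (made-up values).
program = [0x01, 0x02, 0x01, 0x03, 0x01, 0x02, 0x04, 0x01]
dictionary, stream, compressed_bits = dictionary_compress(program)
original_bits = len(program) * 32
ratio = compressed_bits / original_bits
print(f"compression ratio: {ratio:.2%}")
```

    Under these assumptions, a build that starts out smaller can end up with the smallest compressed size even when its ratio is worse, which is consistent with the abstract's remark that the ratio alone is not a reliable guide to final code size.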

    Expression-tree-based algorithms for code compression on embedded RISC architectures

    Reducing program size has become an important goal in the design of modern embedded systems targeted to mass production. This problem has driven efforts aimed at designing processors with shorter instruction formats (e.g., ARM Thumb and MIPS16) or able to execute compressed code (e.g., IBM PowerPC 405). This paper proposes three code compression algorithms for embedded RISC architectures. In all algorithms, the encoded symbols are extracted from program expression trees. The algorithms differ in the granularity of the encoded symbols, which are selected from whole trees, parts of trees, or single instructions. Dictionary-based decompression engines are proposed for each compression algorithm. Experimental results, based on SPEC CINT95 programs running on the MIPS R4000 processor, reveal an average compression ratio of 53.6% (31.545) if the area of the decompression engine is (not) considered.

    SPARC16: a new compression approach for SPARC processors

    Advisors: Rodolfo Jardim de Azevedo, Paulo César Centoducatte. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: RISC processors can be used to meet the ever-increasing demand for performance required by embedded systems. Nevertheless, these architectures have the drawback of poor code density. Alternate encodings for instruction sets, such as MIPS16 and Thumb, represent an effective approach to deal with this problem. This work proposes an alternate encoding for the SPARCv8 architecture. The new encoding, called SPARC16, was designed with the aid of an integer linear programming model. The new instructions are 16 bits wide and are easily translated to their 32-bit counterparts at execution time, making it possible to place a decompressor engine before the decode stage of a SPARC processor and use the remainder of the pipeline transparently. The decompressor engine was designed and integrated into the Leon 3 processor (SPARCv8) and caused an increase of 24% in area and no timing overhead. Only an assembler was implemented for the SPARC16 extension. The decompressor engine was validated using programs, written directly in assembly language, that cover all the SPARC16 instructions. The compression ratios for the programs belonging to the Mediabench and Mibench benchmarks were obtained by inferring how SPARCv8 code would be represented with SPARC16 instructions. Through this method, compression ratios as low as 58% were achieved (for the cjpeg program), with an average of 61.27% for the Mediabench programs and 60.77% for the Mibench programs. Using the same approach, an evaluation of the change brought by the use of SPARC16 in the instruction cache access patterns was performed and showed reductions in the number of misses even greater than 50%.
    Degree: Master's in Computer Science (Mestre em Ciência da Computação).

    Study and evaluation of compact instruction sets (Estudo e avaliação de conjuntos de instruções compactos)

    Advisor: Rodolfo Jardim de Azevedo. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Modern embedded devices are composed of heterogeneous SoC systems ranging from low-end to high-end processor chips. Although RISC has been the traditional processor for these devices, the situation changed recently; manufacturers are building embedded systems using both RISC (ARM and MIPS) and CISC (x86) processors. New functionalities in embedded software require more memory space, an expensive and scarce resource in SoCs. Hence, executable code size is critical, since performance is directly affected by instruction cache misses. CISC processors used to have a higher code density than RISC, since variable-length encoding favors the most frequently used instructions, yielding smaller programs. However, with the addition of new extensions and longer instructions, CISC density in recent applications became similar to RISC. In this thesis, we investigate the compressibility of RISC and CISC processors, namely SPARC and x86. We propose a 16-bit extension to the SPARC processor, the SPARC16. Additionally, we provide the first methodology for generating 16-bit ISAs and evaluate compression among different 16-bit extensions. SPARC16 programs can achieve better compression ratios than other ISAs, attaining results as low as 67%. SPARC16 also reduces cache miss rates by up to 9%, requiring smaller caches than SPARC processors to achieve the same performance; the cache size reduction can reach a factor of 16.
    Furthermore, we study how new extensions are constantly introducing new functionalities to x86, leading to ISA bloat at the cost of a complex microprocessor front-end design, area, and energy consumption; the x86 ISA reached over 1300 different instructions in 2013. Moreover, an analysis of x86 code from 5 Windows versions and 7 Linux distributions dating from 1995 to 2012 shows that up to 57 instructions fall out of use over time. To solve this problem, we propose a mechanism to recycle instruction opcodes through legacy instruction emulation without breaking backward software compatibility. We present a case study in which AVX x86 SIMD instructions are re-encoded with shorter encodings taken from unused instructions, yielding up to 14% code size reduction and 53% instruction cache miss reduction in SPEC CPU2006 floating-point programs. Finally, our results show that up to 40% of the x86 instructions can be removed with less than 5% overhead through our technique without breaking any legacy code.
    Degree: Doctorate in Computer Science (Doutor em Ciência da Computação).

    Exploiting The Area X Performance Trade-off With Code Compression

    Code compression has been shown to be efficient in code size reduction and, recently, in performance improvement. In this paper we use a compression method, ComPacket, which has a very fast hardware decompressor, to compress selected regions of the code (the inner loops) to improve performance; in the complementary regions we use the Instruction Based Compression (IBC) method to sustain the code size reduction at the same time. Using the Leon (SPARC v8) platform and benchmarks from the Mediabench and MiBench suites, we reached a 29% memory area reduction, on average, and a speed-up of 1.8 simultaneously. © 2005 IEEE.

    Multi-profile Based Code Compression

    Code compression has been shown to be an effective technique to reduce code size in memory-constrained embedded systems. It has also been used as a way to increase cache hit ratio, thus reducing power consumption and improving performance. This paper proposes an approach that mixes static and dynamic instruction profiling in dictionary construction, so as to best exploit trade-offs between compression ratio and performance. Compressed instructions are stored as variable-size indices into fixed-size codewords, eliminating compressed code misalignments. Experimental results, using the Leon (SPARCv8) processor and a program mix from MiBench and Mediabench, show that our approach halves the number of cache accesses and power consumption while producing compression ratios as low as 56%.

    Multi-profile Instruction Based Compression

    Code compression has been used to minimize the memory area requirement of embedded systems. Recently, performance improvement and energy consumption reduction have been observed as by-products of compression. In this paper we propose a novel technique for efficiently exploring the trade-offs involved in code compression. Our Multi-Profile approach to building dictionaries combines the best features of both static and dynamic program behaviors. Experiments with the Mediabench and MiBench suites and the Leon (SPARCv8) processor reveal a compression ratio as low as 71%, while performance speed-up reaches 1.5. © 2004 IEEE.