3 research outputs found

    A hybrid code compression technique using bitmask and prefix encoding with enhanced dictionary selection

    No full text

    Estudo e avaliação de conjuntos de instruções compactos (Study and evaluation of compact instruction sets)

    Advisor: Rodolfo Jardim de Azevedo
    Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
    Abstract: Modern embedded devices are composed of heterogeneous SoC systems ranging from low-end to high-end processor chips. Although RISC has been the traditional processor for these devices, the situation changed recently; manufacturers are building embedded systems using both RISC (ARM and MIPS) and CISC (x86) processors. New functionalities in embedded software require more memory space, an expensive and scarce resource in SoCs. Hence, executable code size is critical, since performance is directly affected by instruction cache misses. CISC processors used to have higher code density than RISC, since variable-length encoding benefits the most used instructions, yielding smaller programs. However, with the addition of new extensions and longer instructions, CISC density in recent applications became similar to RISC. In this thesis, we investigate the compressibility of RISC and CISC processors, namely SPARC and x86. We propose a 16-bit extension to the SPARC processor, the SPARC16. Additionally, we provide the first methodology for generating 16-bit ISAs and evaluate compression among different 16-bit extensions. SPARC16 programs can achieve better compression ratios than other ISAs, attaining ratios as low as 67%. SPARC16 also reduces cache miss rates by up to 9%, requiring smaller caches than SPARC processors to achieve the same performance; the cache size reduction can reach a factor of 16. Furthermore, we study how new extensions are constantly introducing new functionalities to x86, leading to ISA bloat at the cost of a complex microprocessor front-end design, area and energy consumption - the x86 ISA reached over 1300 different instructions in 2013. Moreover, an analysis of x86 code from 5 Windows versions and 7 Linux distributions, ranging from 1995 to 2012, shows that up to 57 instructions fall out of use over time. To solve this problem, we propose a mechanism to recycle instruction opcodes through legacy instruction emulation without breaking backward software compatibility. We present a case study in which AVX x86 SIMD instructions are re-encoded with shorter instruction encodings taken from unused instructions, yielding up to 14% code size reduction and 53% instruction cache miss reduction in SPEC CPU2006 floating-point programs. Finally, our results show that up to 40% of the x86 instructions can be removed with less than 5% overhead through our technique, without breaking any legacy code.
    Doctorate. Computer Science. Doctor of Computer Science.

    Sparc16: A New Compression Approach For The Sparc Architecture

    No full text
    RISC processors can be used to meet the ever-increasing demand for performance required by embedded systems. Nevertheless, this solution comes at the cost of poor code density. Alternative encodings for instruction sets, such as MIPS16 and Thumb, represent an effective approach to deal with this drawback. This article proposes to apply a new encoding to the SPARCv8 architecture. Through extensive analysis of a program mix from the MiBench and MediaBench benchmark suites, we suggest a new 16-bit instruction set, easily translated to its 32-bit counterpart at execution time. Using the aforementioned program mix to infer how code could be represented in the proposed 16-bit ISA, compression ratios as low as 56% can be obtained. We also evaluated the cache behavior and showed reductions of 42% in cache misses, which can increase performance by up to 28% (for the patricia program with a 2 KB cache). © 2009 IEEE. pp. 169-176.
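    The compression ratios quoted in these abstracts are most naturally read as compressed size divided by original size, so lower is better. The sketch below illustrates that arithmetic under a deliberately simplified model that is an assumption of this note, not the authors' methodology: every original instruction is 32 bits, a given fraction is re-encoded one-to-one in 16 bits, and no extra instructions (mode switches, longer sequences) are introduced. The function names are hypothetical.

```python
def compression_ratio(n_instructions: int, frac_16bit: float) -> float:
    """Return compressed size / original size (lower is better).

    Simplifying assumptions (not from the source): all original
    instructions are 32 bits wide, and a fraction `frac_16bit` of them
    is re-encoded one-to-one as 16-bit instructions.
    """
    if n_instructions <= 0:
        raise ValueError("n_instructions must be positive")
    if not 0.0 <= frac_16bit <= 1.0:
        raise ValueError("frac_16bit must be in [0, 1]")
    original_bits = 32 * n_instructions
    # Each re-encoded instruction saves 16 bits.
    compressed_bits = original_bits - 16 * frac_16bit * n_instructions
    return compressed_bits / original_bits


def fraction_needed(target_ratio: float) -> float:
    """Fraction of instructions that must become 16-bit to reach
    `target_ratio`, under the same assumptions (ratio = 1 - frac / 2)."""
    if not 0.5 <= target_ratio <= 1.0:
        raise ValueError("target_ratio must be in [0.5, 1] in this model")
    return 2.0 * (1.0 - target_ratio)


if __name__ == "__main__":
    # Under this idealized model, a ratio of 56% would need 88% of the
    # instructions to fit the 16-bit encoding.
    print(f"{fraction_needed(0.56):.2f}")
    print(f"{compression_ratio(1000, 0.88):.2f}")
```

    In this model the ratio can never drop below 50%, which gives a sense of scale for the reported figures; real results also depend on extra instructions and alignment, which the sketch ignores.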