7 research outputs found

    Microarchitectural design-space exploration of an in-order RISC-V processor in a 22nm CMOS technology

    The purpose of this paper is to explore the trade-offs between IPC and maximum clock frequency in an in-order processor design. This work evaluates the impact of different pipeline optimizations on performance and frequency. We target ASIC implementation using an advanced synthesis tool-flow with modern technology libraries. As a result, we can analyze the processor's critical paths in a representative environment. In this paper, we analyze and modify Riscy, an in-order processor, taking into account the consequences of targeting ASIC for this design. We achieve a frequency of 1.3 GHz and 2.03 CoreMark/MHz on the EEMBC CoreMark benchmark.
    Peer Reviewed. Postprint (published version)
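As a rough illustration of the IPC-versus-frequency trade-off described in this abstract, the absolute CoreMark score is the product of the per-MHz score and the clock frequency in MHz. The sketch below uses the two figures reported in the abstract; the function name and the second design point are hypothetical, chosen only to show how a design with a better per-cycle score can still lose overall at a lower clock.

```python
def coremark_score(coremark_per_mhz: float, freq_mhz: float) -> float:
    """Absolute CoreMark score = (CoreMark/MHz) * (clock frequency in MHz)."""
    return coremark_per_mhz * freq_mhz


# Figures reported in the abstract: 2.03 CoreMark/MHz at 1.3 GHz.
reported = coremark_score(2.03, 1300.0)

# Hypothetical design point (NOT from the paper): better per-cycle score,
# but a longer critical path that limits the clock to 1.1 GHz.
hypothetical = coremark_score(2.20, 1100.0)

print(f"reported design:     {reported:.0f} CoreMark")
print(f"hypothetical design: {hypothetical:.0f} CoreMark")
```

Under these assumed numbers the reported design comes out ahead, which is the kind of balance a design-space exploration is meant to expose.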

    Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices

    Deep Neural Network (DNN) inference based on quantized narrow-precision integer data represents a promising research direction toward efficient deep learning computations on edge and mobile devices. On the one hand, recent progress in Quantization-Aware Training (QAT) frameworks aimed at improving the accuracy of extremely quantized DNNs allows achieving results close to Floating-Point 32 (FP32), and provides high flexibility in the selection of data sizes. Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) targeting resource-constrained devices present limitations on the range of data sizes supported to compute DNN kernels. This paper presents Mix-GEMM, a hardware-software co-designed architecture capable of efficiently computing quantized DNN convolutional kernels based on byte and sub-byte data sizes. Mix-GEMM accelerates General Matrix Multiplication (GEMM), the core kernel of DNNs, supporting all data size combinations from 8- to 2-bit, including mixed-precision computations, and featuring performance that scales as the computational data sizes decrease. Our experimental evaluation, performed on representative quantized Convolutional Neural Networks (CNNs), shows that a RISC-V based edge System-on-Chip (SoC) integrating Mix-GEMM achieves up to 1.3 TOPS/W in energy efficiency, and up to 13.6 GOPS in throughput, gaining from 5.3× to 15.1× in performance over the OpenBLAS GEMM framework running on a commercial RISC-V based edge processor. By performing synthesis and Place and Route (PnR) of the enhanced SoC in GlobalFoundries 22nm FDX technology, we show that Mix-GEMM only accounts for 1% of the overall area.
    This research was supported by the ERDF Operational Program of Catalonia 2014-2020, with a grant from the Spanish State Research Agency [PID2019-107255GB] and with the DRAC project [001-P-001723], by the grant [PID2019-107255G-C21] funded by MCIN/AEI/10.13039/501100011033, by the Generalitat de Catalunya [2017-SGR-1328], and by Lenovo-BSC Contract-Framework (2020). The Spanish Ministry of Economy, Industry and Competitiveness has partially supported M. Doblas through an FPU fellowship [FPU20-04076] and M. Moretó through a Ramon y Cajal fellowship [RYC-2016-21104].
    Peer Reviewed. Postprint (author's final draft)
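To make the idea of sub-byte, mixed-precision GEMM concrete, here is a minimal software sketch. It is not the Mix-GEMM hardware or its instructions: the function names, the packing layout (least significant element first), and the use of unsigned values are all assumptions made for illustration. It stores each row of A and each column of B as one packed integer and multiplies 8-bit operands by 2-bit operands.

```python
def pack(values, bits):
    """Pack unsigned `bits`-wide integers into one int, first value in the lowest bits."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} unsigned bits")
        word |= v << (i * bits)
    return word


def unpack(word, bits, count):
    """Inverse of pack(): extract `count` unsigned `bits`-wide integers."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]


def mixed_gemm(a, a_bits, b, b_bits):
    """Compute C = A x B with A stored packed per row and B packed per column."""
    k = len(b)          # shared inner dimension
    n = len(b[0])       # number of columns of B
    a_rows = [pack(row, a_bits) for row in a]
    b_cols = [pack([b[r][j] for r in range(k)], b_bits) for j in range(n)]
    result = []
    for a_word in a_rows:
        a_vals = unpack(a_word, a_bits, k)
        out_row = []
        for b_word in b_cols:
            b_vals = unpack(b_word, b_bits, k)
            out_row.append(sum(x * y for x, y in zip(a_vals, b_vals)))
        result.append(out_row)
    return result


# 8-bit operands (A) multiplied by 2-bit operands (B); values are illustrative.
a = [[200, 3, 17, 255], [0, 1, 2, 3]]
b = [[3, 0], [1, 2], [0, 3], [2, 1]]
c = mixed_gemm(a, 8, b, 2)
print(c)
```

With 2-bit operands, four values fit in a single byte, which is why narrower data sizes allow more multiply-accumulate operations per fetched word.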

    Sargantana: A 1 GHz+ in-order RISC-V processor with SIMD vector extensions in 22nm FD-SOI

    The RISC-V open Instruction Set Architecture (ISA) has proven to be a solid alternative to licensed ISAs. In the past five years, a plethora of industrial and academic cores and accelerators have been developed implementing this open ISA. In this paper, we present Sargantana, a 64-bit processor based on RISC-V that implements the RV64G ISA, a subset of the vector instructions extension (RVV 0.7.1), and custom application-specific instructions. Sargantana features a highly optimized 7-stage pipeline implementing out-of-order write-back, register renaming, and a non-blocking memory pipeline. Moreover, Sargantana features a Single Instruction Multiple Data (SIMD) unit that accelerates domain-specific applications. Sargantana achieves a 1.26 GHz frequency in the typical corner, and up to 1.69 GHz in the fast corner using 22nm FD-SOI commercial technology. As a result, Sargantana delivers a 1.77× higher Instructions Per Cycle (IPC) than our previous 5-stage in-order DVINO core, reaching 2.44 CoreMark/MHz. Our core design delivers comparable or even higher performance than other state-of-the-art academic cores under the Autobench EEMBC benchmark suite. This way, Sargantana lays the foundations for future RISC-V based core designs able to meet industrial-class performance requirements for scientific, real-time, and high-performance computing applications.
    This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (contract PID2019-107255GB-C21), by the Generalitat de Catalunya (contract 2017-SGR-1328), by the European Union within the framework of the ERDF of Catalonia 2014-2020 under the DRAC project [001-P-001723], and by Lenovo-BSC Contract-Framework (2020). The Spanish Ministry of Economy, Industry and Competitiveness has partially supported M. Doblas and V. Soria-Pardos through FPU fellowships no. FPU20-04076 and FPU20-02132, respectively. G. Lopez-Paradis has been supported by the Generalitat de Catalunya through a FI fellowship 2021FI-B00994. S. Marco-Sola was supported by Juan de la Cierva fellowship grant IJC2020-045916-I funded by MCIN/AEI/10.13039/501100011033 and by "European Union NextGenerationEU/PRTR", and M. Moretó through a Ramon y Cajal fellowship no. RYC-2016-21104.
    Peer Reviewed. Postprint (author's final draft)

    DVINO: A RISC-V vector processor implemented in 65nm technology

    This paper describes the design, verification, implementation and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processor unit as well as a Phase Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from the architectural level through logic synthesis and physical design in CMOS 65nm technology.
    The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN).
    Peer Reviewed. Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa†
    ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain; §Universitat Politecnica de Catalunya (UPC), Barcelona, Spain.
    Postprint (author's final draft)

    Optimització de un processador amb una tecnologia FD-SOI de 22 nm (Optimization of a processor in a 22 nm FD-SOI technology)

    No full text
    The thesis aims to design and implement an in-order core named Sargantana using the open RISC-V Instruction Set Architecture (ISA). Sargantana aims to improve on the in-order 5-stage Lagarto-Hun core used in the early iterations of the DRAC project. Sargantana has a more mature 7-stage pipeline with out-of-order write-back, register renaming, and a non-blocking memory pipeline. This thesis also applies microarchitectural design space exploration to Sargantana to balance per-cycle performance and clock frequency. 22nm FD-SOI commercial technology libraries are used to take physical effects into account in the register-transfer level (RTL) design. In this way, the processor's bottlenecks are analyzed to reach the maximum clock frequency using realistic technology. With this exploration, Sargantana achieves 2 CoreMark/MHz and 1 GHz in the worst corner (and up to 1.88 GHz in the fast corner) using 22nm FD-SOI commercial technology libraries. Sargantana achieves an IPC speed-up of 1.37× over its predecessor, the 5-stage Lagarto-Hun.

    Microarchitectural design-space exploration of an in-order RISC-V processor in a 22nm CMOS technology

    The purpose of this thesis is to apply microarchitectural design space exploration to an in-order processor to achieve a balance between per-cycle performance and maximum clock frequency. The work shows the impact on per-cycle performance and maximum clock frequency of different pipeline optimizations applied to the processor. We target ASIC implementation using an advanced synthesis tool flow with modern technology libraries to better analyze the processor's bottlenecks in terms of maximum clock frequency in a realistic environment.

    An academic RISC-V silicon implementation based on open-source components

    ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    The design presented in this paper, called preDRAC, is a RISC-V general purpose processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The preDRAC processor is the first RISC-V processor designed and fabricated by a Spanish or Mexican academic institution, and will be the basis of future RISC-V designs jointly developed by these institutions. This paper summarizes the design tasks, for FPGA first and for SoC later, from high architectural level descriptions down to RTL and then going through logic synthesis and physical design to get the layout ready for its final tapeout in CMOS 65nm technology.
    The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN).
    Peer Reviewed. Postprint (author's final draft)