Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification

Author: Gupta Aarti
Huang Bo-Yuan
Malik Sharad
Subramanyan Pramod
Vizel Yakir
Zhang Hongce
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2018

Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and contain specialized semi-programmable accelerators in addition to programmable processors. In contrast to the pre-accelerator era, when the ISA played an important role in verification by enabling a clean separation of concerns between software and hardware, verification of these "accelerator-rich" SoCs presents new challenges. From the perspective of hardware designers, there is a lack of a common framework for the formal functional specification of accelerator behavior. From the perspective of software developers, there exists no unified framework for reasoning about software/hardware interactions of programs that interact with accelerators. This paper addresses these challenges by providing a formal specification and high-level abstraction for accelerator functional behavior. It formalizes the concept of an Instruction Level Abstraction (ILA), developed informally in our previous work, and shows its application in modeling and verification of accelerators. This formal ILA extends the familiar notion of instructions to accelerators and provides a uniform, modular, and hierarchical abstraction for modeling software-visible behavior of both accelerators and programmable processors. We demonstrate the applicability of the ILA through several case studies of accelerators (for image processing, machine learning, and cryptography), and a general-purpose processor (RISC-V). We show how the ILA model facilitates equivalence checking between two ILAs, and between an ILA and its hardware finite-state machine (FSM) implementation. Further, this equivalence checking supports accelerator upgrades using the notion of ILA compatibility, similar to processor upgrades using ISA compatibility.

Comment: 24 pages, 3 figures, 3 tables

arXiv.org e-Print Archive

Princeton University Open Access Repository
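
The equivalence check between an ILA and its FSM implementation described in the abstract above can be illustrated with a toy model. The sketch below is not the authors' tooling or notation: the two instructions (`LOAD`, `MAC`), the 4-bit state, and all function names are hypothetical, and exhaustive enumeration stands in for the formal methods a real flow would use. It only shows the shape of the idea, with each instruction given as a decode condition plus a state update, and the implementation compared against it at instruction boundaries.

```python
from itertools import product

WIDTH = 4                       # toy bit-width so exhaustive checking stays small
MASK = (1 << WIDTH) - 1

# Hypothetical ILA: each instruction is a (decode, update) pair over the
# software-visible state (acc, operand) and an interface command (op, arg).
def decode_load(cmd):
    return cmd[0] == "LOAD"

def update_load(state, cmd):
    acc, _operand = state
    return (acc, cmd[1] & MASK)

def decode_mac(cmd):
    return cmd[0] == "MAC"

def update_mac(state, cmd):
    acc, operand = state
    return ((acc + operand * cmd[1]) & MASK, operand)

ILA = [(decode_load, update_load), (decode_mac, update_mac)]

def ila_step(state, cmd):
    """Apply the single instruction whose decode condition matches cmd."""
    matches = [update for decode, update in ILA if decode(cmd)]
    assert len(matches) == 1, "decode conditions must be mutually exclusive"
    return matches[0](state, cmd)

def fsm_run(state, cmd):
    """Hypothetical multi-cycle implementation: MAC uses shift-and-add."""
    acc, operand = state
    if cmd[0] == "LOAD":
        return (acc, cmd[1] & MASK)
    multiplier, addend = cmd[1] & MASK, operand
    for _ in range(WIDTH):      # one loop iteration models one clock cycle
        if multiplier & 1:
            acc = (acc + addend) & MASK
        addend = (addend << 1) & MASK
        multiplier >>= 1
    return (acc, operand)

def check_equivalence():
    """Compare ILA and FSM on every start state and command; return a
    counterexample (state, cmd) or None if they always agree."""
    values = range(1 << WIDTH)
    for acc, operand, arg, op in product(values, values, values, ("LOAD", "MAC")):
        state, cmd = (acc, operand), (op, arg)
        if ila_step(state, cmd) != fsm_run(state, cmd):
            return (state, cmd)
    return None
```

Calling `check_equivalence()` compares the single-step specification with the multi-cycle implementation only at the point where an instruction completes, which mirrors the instruction-boundary view that the abstract attributes to the ILA.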

The Development of TIGRA: A Zero Latency Interface For Accelerator Communication in RISC-V Processors

Author: Green Wesley Brad
Publication venue: Clemson University Libraries
Publication date: 01/05/2022

Field programmable gate arrays (FPGA) give developers the ability to design application-specific hardware by means of software, providing a method of accelerating algorithms with higher power efficiency when compared to CPU or GPU accelerated applications. FPGA-accelerated applications tend to follow either a loosely coupled or tightly coupled design. Loosely coupled designs often use OpenCL to utilize the FPGA as an accelerator much like a GPU, which provides a simplified design flow with the trade-off of increased overhead and latency due to bus communication. Tightly coupled designs modify an existing CPU to introduce instruction set extensions to provide a minimal latency accelerator at the cost of higher programming effort to include the custom design. This dissertation details the design of the Tightly Integrated, Generic RISC-V Accelerator (TIGRA) interface, which provides the benefits of both loosely and tightly coupled accelerator designs. TIGRA-enabled designs incur zero latency with a simple-to-use interface that reduces programming effort when implementing custom logic within a processor. This dissertation shows the incorporation of TIGRA into the simple PicoRV32 processor, the highly customizable Rocket Chip generator, and the FPGA-optimized Taiga processor. Each processor design is tested with AES 128-bit encryption and posit arithmetic to demonstrate TIGRA functionality. After a one-time programming cost to incorporate a TIGRA interface into an existing processor, new functional units can be added with up to a 75% reduction in the lines of code required when compared to non-TIGRA-enabled designs. Additionally, each functional unit created is compatible with each processor, as the TIGRA interface remains constant between designs. The results show that using the TIGRA interface introduces no latency and is capable of incorporating existing custom logic designs without modification for all three processors tested.
When compared to the PicoRV32 coprocessor interface (PCPI), TIGRA-coupled designs complete one clock cycle faster. Similarly, TIGRA outperforms the Rocket Chip custom coprocessor (RoCC) interface by an average of 6.875 clock cycles per instruction. The Taiga processor's decoupled execution units allow instructions to execute concurrently, and the processor uses a tag management system that is similar to those of out-of-order processors. The inclusion of the TIGRA interface within this processor abstracts the tag management from the user and demonstrates that the TIGRA interface can be applied to out-of-order processors. When coupled with partial reconfiguration, the flexibility and modularity of TIGRA increase drastically. By creating a reprogrammable region for the custom logic connected via TIGRA, users can swap out the connected design at runtime to customize the processor for a given application. Further, partial reconfiguration allows users to compile only the custom logic design as opposed to the entire CPU, resulting in an 18.1% average reduction in compilation time during the design process in the case studies. Paired with the programming effort saved by using TIGRA, partial reconfiguration shortens the time needed to design and test new functionality for a processor.

Clemson University: TigerPrints

Enhancing an Embedded Processor Core with a Cryptographic Unit for Performance and Security

Author: Kocabaş Övünç
Savaş Erkay
Publication venue: IEEE Computer Society
Publication date: 18/09/2008

We present a set of low-cost architectural enhancements to accelerate the execution of certain arithmetic operations common in cryptographic applications on an extensible embedded processor core. The proposed enhancements are generic in the sense that they can be beneficially applied in almost any RISC processor. We implemented the enhancements in the form of a cryptographic unit (CU) that offers the programmer an extended instruction set. The CU features a 128-bit wide register file and datapath, which enables it to process 128-bit words and perform 128-bit loads/stores. We analyze the speed-up factors for some arithmetic operations and public-key cryptographic algorithms obtained through these enhancements. In addition, we evaluate the hardware overhead (i.e., silicon area) of integrating the CU into an embedded RISC processor. Our experimental results show that the proposed architectural enhancements allow for a significant performance gain for both RSA and ECC at the expense of an acceptable increase in silicon area. We also demonstrate that the proposed enhancements facilitate the protection of cryptographic algorithms against certain types of side-channel attacks and present an AES implementation hardened against cache-based attacks as a case study.

Sabanci University Research Database
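
One way to see why the 128-bit datapath mentioned in the abstract above helps public-key arithmetic is to count the word-level steps in a multi-precision addition. The sketch below is only an illustration under stated assumptions: the 32-bit baseline word size, the 1024-bit operand size, and all function names are hypothetical and are not taken from the paper, and it models the arithmetic rather than the CU's actual instructions.

```python
def to_limbs(value, limb_bits, total_bits):
    """Split an integer into little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask
            for i in range(total_bits // limb_bits)]

def from_limbs(limbs, limb_bits):
    """Recombine little-endian limbs into a single integer."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))

def add_limbs(a_limbs, b_limbs, limb_bits):
    """Limb-wise addition with carry propagation.

    Each loop iteration stands for one add-with-carry on a word of
    limb_bits bits. Returns the result limbs and the final carry-out.
    """
    mask = (1 << limb_bits) - 1
    carry, out = 0, []
    for x, y in zip(a_limbs, b_limbs):
        s = x + y + carry
        out.append(s & mask)
        carry = s >> limb_bits
    return out, carry

def limb_additions(total_bits, limb_bits):
    """Number of word-level additions needed for one operand-wide add."""
    return total_bits // limb_bits
```

Under these assumptions, a 1024-bit addition takes `limb_additions(1024, 32)` word operations on a 32-bit datapath and `limb_additions(1024, 128)` on a 128-bit one, a fourfold reduction in word-level steps. Actual speed-ups depend on the instruction set and memory behavior that the paper evaluates.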

Transparent control flow transfer between CPU and Intel FPGAs

Author: Daniel Miranda Silva Malafaia Granhão
Publication date: 09/07/2019

The possibility of accelerating software using dedicated hardware is one of the advantages that heterogeneous computing platforms provide, and can greatly increase computing efficiency. Despite the efficiency gains, applications need to be rewritten to effectively exploit the dedicated accelerators, which is a complex and tedious process. The goal of this work is to research transparent mechanisms that would allow the transfer of the control flow between a CPU and an FPGA, which houses an accelerator. Such a mechanism could be coupled with transparent software profiling and translation to hardware, which would allow regular software to take advantage of hardware acceleration in a transparent manner. Implementation is done on Intel's new Xeon+FPGA hybrid platform, which combines a Xeon processor and an Arria 10 FPGA in the same package while sharing the main system memory. A prototype was built and tested on real hardware, in which the Linux ptrace system call is used to control the process to be accelerated and transfer its execution between the CPU and the FPGA. An AES encryption kernel was accelerated, and speedups of 8x were recorded when large data sizes were used.

Repositório Aberto da Universidade do Porto

Cryptoraptor: high throughput reconfigurable cryptographic processor for symmetric key encryption and cryptographic hash functions

Author: Sayilar Gokhan
Publication date: 03/02/2015

In cryptographic processor design, the selection of functional primitives and connection structures between these primitives are extremely crucial to maximize throughput and flexibility. Hence, detailed analysis on the specifications and requirements of existing crypto-systems plays a crucial role in cryptographic processor design. This thesis provides the most comprehensive literature review that we are aware of on the widest range of existing cryptographic algorithms, their specifications, requirements, and hardware structures. In the light of this analysis, it also describes a high performance, low power, and highly flexible cryptographic processor, Cryptoraptor, that is designed to support both today's and tomorrow's encryption standards. To the best of our knowledge, the proposed cryptographic processor supports the widest range of cryptographic algorithms compared to other solutions in the literature and is the only crypto-specific processor targeting the future standards as well. Unlike previous work, we aim for maximum throughput for all known encryption standards, and to support future standards as well. Our 1GHz design achieves a peak throughput of 128Gbps for AES-128, which is competitive with ASIC designs and has 25X and 160X higher throughput per area than CPU and GPU solutions, respectively.

Electrical and Computer Engineering

Texas ScholarWorks
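
The peak figures quoted in the Cryptoraptor abstract above imply a simple per-cycle rate, which the short calculation below works out. It assumes that Gbps means 10^9 bits per second and that the 128Gbps peak is sustained at the stated 1GHz clock. The constant and function names are illustrative only, and the 128-bit block size is the standard AES block size rather than a figure from the abstract.

```python
CLOCK_HZ = 1_000_000_000        # 1 GHz clock, as stated in the abstract
PEAK_BPS = 128_000_000_000      # 128 Gbps peak AES-128 throughput, as stated
AES_BLOCK_BITS = 128            # AES block size (fixed by the AES standard)

def bits_per_cycle(throughput_bps, clock_hz):
    """Average number of bits processed per clock cycle."""
    return throughput_bps / clock_hz

def blocks_per_cycle(throughput_bps, clock_hz, block_bits):
    """Average number of cipher blocks completed per clock cycle."""
    return bits_per_cycle(throughput_bps, clock_hz) / block_bits
```

With these numbers, `bits_per_cycle(PEAK_BPS, CLOCK_HZ)` is 128 and `blocks_per_cycle(PEAK_BPS, CLOCK_HZ, AES_BLOCK_BITS)` is 1, so the peak rate corresponds to completing one AES block per clock cycle on average.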