Search CORE

Automated Debugging Methodology for FPGA-based Systems

Author: Khan Habib ul Hasan
Publication date: 30/12/2019

Electronic devices make up a vital part of our lives: mobile phones, laptops, computers and home automation systems, to name a few. Modern designs contain billions of transistors. With this evolution, however, ensuring that devices meet the designer’s expectations under varying conditions has become a major challenge that demands considerable design time and effort. Whenever an error is encountered, the process is restarted. Hence, it is desirable to minimize the number of spins required to achieve an error-free product, as each spin costs time and effort. Software-based simulation is the main technique for verifying a design before fabrication. However, a few design errors (bugs) are likely to escape simulation and subsequently appear during the post-silicon phase, where finding them is time-consuming because of the limited visibility inherent in hardware. Instead of simulating the design in software in the pre-silicon phase, post-silicon techniques let designers verify functionality on the physical implementation of the design. The main benefit of this methodology is that the implemented design runs many orders of magnitude faster than its pre-silicon counterpart, which allows designers to validate the design more exhaustively. This thesis presents five main contributions that enable a fast and automated debugging solution for reconfigurable hardware. During the research, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system that provides a lossless trace of debugging data, permitting cycle-accurate replay. This methodology ensures that permanent as well as intermittent errors in the implemented design are captured.
The contribution also describes a solution to enhance hardware observability: it proposes using processor-configurable concentration networks, compressing debug data so that it can be transmitted more efficiently, and partially reconfiguring the debugging system at run time, which saves design recompilation time and preserves timing closure. The second contribution presents a solution for communication-centric designs; solutions for designs with multiple clock domains are also discussed. The third contribution presents a priority-based signal selection methodology that identifies the signals most helpful during debugging, together with a connectivity generation tool that maps the identified signals to the debugging system. The fourth contribution presents an automated error detection solution that helps capture permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works even in the absence of a golden reference. The fifth contribution proposes using artificial intelligence for post-silicon debugging: we present a novel idea of using a recurrent neural network for debugging when a golden reference is available to train the network, and we extend the idea to designs where no golden reference is available.
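The abstract pairs a lossless debug trace (for cycle-accurate replay) with debug data compression, but it does not say which compression scheme is used. As a minimal sketch, assuming run-length encoding as a stand-in, the Python below shows why a lossless encoding preserves cycle accuracy: expanding the runs yields exactly one sample per captured clock cycle. `compress_trace` and `replay` are hypothetical names, not the thesis's tooling.

```python
from typing import List, Tuple

def compress_trace(samples: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a per-cycle trace into (value, repeat_count) pairs."""
    encoded: List[Tuple[int, int]] = []
    for value in samples:
        if encoded and encoded[-1][0] == value:
            # Same value as the previous cycle: extend the current run.
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            # Value changed (or first cycle): start a new run.
            encoded.append((value, 1))
    return encoded

def replay(encoded: List[Tuple[int, int]]) -> List[int]:
    """Expand runs back into one sample per clock cycle."""
    trace: List[int] = []
    for value, count in encoded:
        trace.extend([value] * count)
    return trace

# A probed bus that is idle most of the time compresses well.
samples = [0, 0, 0, 1, 1, 0, 0, 0, 0, 3]
encoded = compress_trace(samples)  # [(0, 3), (1, 2), (0, 4), (3, 1)]
assert replay(encoded) == samples  # lossless: cycle i of the replay is cycle i of the capture
```

Run-length encoding pays off for signals that change rarely; a hardware implementation would also have to bound the run counters and handle trace-buffer overflow, which this sketch ignores.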

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Cycle-accurate modeling of multicore processors on FPGAs

Author: Khan Asif I. (Asif Imtiaz)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis (Ph.D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 169-176). We present a novel modeling methodology that enables the generation of a high-performance, cycle-accurate simulator from a cycle-level specification of the target design. We describe Arete, a full-system multicore processor simulator developed using this methodology. We provide details on Arete's resource-efficient, high-performance implementation on multiple FPGA platforms and on the architectural experiments performed with it. We present clear evidence that the use of simplified models in architectural studies can lead to wrong conclusions. Through two experiments performed with both cycle-accurate and simplified models, we show that in one experiment there are substantial quantitative and qualitative differences in the results, while in the other the results match quite well. By Asif Imtiaz Khan. Ph.D.
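A cycle-level specification implies that every register's next value is computed from the current state and that all registers then update together on the clock edge. The Python below is a minimal sketch of that two-phase semantics, assuming a hypothetical `CycleModel` class; it illustrates what a cycle-accurate simulator must preserve and is not Arete's FPGA implementation or its specification language.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Regs = Dict[str, int]

@dataclass
class CycleModel:
    """Hypothetical cycle-level model: all registers update together on each clock edge."""
    regs: Regs
    next_state: Callable[[Regs], Regs]
    cycle: int = 0

    def step(self) -> None:
        # Phase 1: compute every next value from a snapshot of the current state only.
        updates = self.next_state(dict(self.regs))
        # Phase 2: commit all updates at once, as a clock edge would.
        self.regs.update(updates)
        self.cycle += 1

# Two registers that swap every cycle, plus a free-running 4-bit counter.
def swap_and_count(r: Regs) -> Regs:
    return {"a": r["b"], "b": r["a"], "count": (r["count"] + 1) % 16}

model = CycleModel({"a": 1, "b": 2, "count": 0}, swap_and_count)
model.step()
# A naive in-place update (a = b; b = a) would leave both registers at 2.
assert model.regs == {"a": 2, "b": 1, "count": 1}
```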

DSpace@MIT

In-FPGA instrumentation framework for openCL-based designs

Author: Bensalem Hachem
Blaquiere Yves
Savaria Yvon
Publication venue: IEEE
Publication date: 01/01/2020

ABSTRACT: Using the Open Computing Language (OpenCL) increases productivity when developing applications on high-performance reconfigurable heterogeneous computing (HPRHC) systems. However, the hardware that OpenCL compilers produce in field-programmable gate arrays (FPGAs) can contain severe performance bottlenecks that are challenging to resolve. The problem is compounded by the fact that the generated netlist is disorganized, making it mostly unreadable and only partially visible to designers. This paper proposes an in-FPGA instrumentation method and a new framework for extracting the FPGA-cycle-accurate timing performance of OpenCL-based designs. The results clearly show that the execution model chosen for an OpenCL-based design strongly affects timing performance when it is not properly implemented. The framework is implemented on an HPRHC platform that contains a CPU and two Arria 10 FPGAs, and it is evaluated with a wide variety of benchmarks of differing complexity. On the reported benchmarks, the average logic overhead of one inserted instrument is 0.2% of the total adaptive look-up tables (ALUTs) and 0.1% of the total registers in an FPGA. This resource utilization is 1.5 to 6 times lower than that reported in the best previously published works. The scalability of the framework is also evaluated by inserting up to 50 instruments: the experimental results show that the average logic utilization per instrument is 0.19% of the ALUTs and 0.17% of the registers in the FPGA when 50 instruments are inserted.
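The abstract does not describe the instrument's internals. As a hedged illustration, assuming an instrument that timestamps start and stop events with a free-running cycle counter, the Python model below shows how cycle counts turn into wall-clock latency. `CycleInstrument`, its methods and the 250 MHz kernel clock are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CycleInstrument:
    """Hypothetical software model of an in-FPGA timing instrument (start/stop cycle counter)."""
    clock_mhz: float
    start_cycle: Optional[int] = None
    durations: List[int] = field(default_factory=list)

    def on_start(self, cycle: int) -> None:
        self.start_cycle = cycle

    def on_stop(self, cycle: int) -> None:
        if self.start_cycle is None:
            raise RuntimeError("stop event without a matching start")
        self.durations.append(cycle - self.start_cycle)
        self.start_cycle = None

    def mean_latency_us(self) -> float:
        # 1 MHz = 1 cycle per microsecond, so cycles / MHz = microseconds.
        return sum(self.durations) / len(self.durations) / self.clock_mhz

inst = CycleInstrument(clock_mhz=250.0)  # assumed kernel clock
inst.on_start(100)
inst.on_stop(350)    # 250 cycles
inst.on_start(1000)
inst.on_stop(1500)   # 500 cycles
print(inst.mean_latency_us())  # 1.5 (mean of 375 cycles at 250 MHz, in microseconds)
```

In hardware, the counter and captured timestamps would sit in FPGA registers and be read back by the host; the model captures only the arithmetic.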

PolyPublie

3D Execution Monitor (3D-EM): Using 3D Circuits to Detect Hardware Malicious Inclusions in General Purpose Processors

Author: Bilzor Michael
Publication date: 01/01/2011

Best PhD Paper, Proceedings of the International Conference on Information Warfare and Security (ICIW), Washington, DC, USA, March 2011, pages 289-298.

Calhoun, Institutional Archive of the Naval Postgraduate School

Logical partitioning of parallel system simulations

Author: Angepat Hari
Publication date: 10/10/2019

Simulation has been a fundamental tool to prototype, hypothesize, and evaluate new ideas for continuing to improve system performance. However, increasing levels of processor parallelism and heterogeneity have introduced additional constraints when evaluating new designs. This dissertation explores how novel ideas in simulator partitioning can improve simulator speed and flexibility for these new types of systems. Its contributions include optimistic partitioned simulation, which improves parallelization, and warped partitioned simulation, which improves flexibility. These ideas are refined and evaluated with prototypes that demonstrate their benefits compared to state-of-the-art approaches. By leveraging partitioning in a structured manner, it is possible to design simulators that better address the open challenges of parallel and heterogeneous systems design. Electrical and Computer Engineering
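The abstract names optimistic partitioned simulation without describing its mechanics. The general idea behind optimistic (Time Warp-style) parallel simulation, which may or may not match the dissertation's scheme, is that a partition executes events speculatively and rolls back when a late ("straggler") event arrives. The minimal Python sketch below, built around a hypothetical `OptimisticPartition` class, shows only the checkpoint-and-rollback step for a single partition; anti-messages and global virtual time, which a complete implementation needs, are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[int, int]  # (timestamp, payload)

@dataclass
class OptimisticPartition:
    """Single partition that executes events speculatively and rolls back on stragglers."""
    state: int = 0
    now: int = 0  # local virtual time
    processed: List[Event] = field(default_factory=list)            # applied events, timestamp order
    snapshots: List[Tuple[int, int]] = field(default_factory=list)  # (now, state) before each event
    rollbacks: int = 0

    def _apply(self, event: Event) -> None:
        ts, payload = event
        self.snapshots.append((self.now, self.state))  # checkpoint for a possible rollback
        self.state = self.state * 31 + payload         # order-sensitive toy state update
        self.now = ts
        self.processed.append(event)

    def receive(self, event: Event) -> None:
        if event[0] >= self.now:
            self._apply(event)  # in order: execute optimistically
            return
        # Straggler: undo every event with a later timestamp, then re-execute in order.
        self.rollbacks += 1
        undone: List[Event] = []
        while self.processed and self.processed[-1][0] > event[0]:
            undone.append(self.processed.pop())
            self.now, self.state = self.snapshots.pop()
        self._apply(event)
        for ev in reversed(undone):
            self._apply(ev)

def sequential_reference(events: List[Event]) -> int:
    """Ground truth: apply all events in timestamp order."""
    state = 0
    for _, payload in sorted(events):
        state = state * 31 + payload
    return state

events = [(5, 1), (10, 2), (7, 3)]  # (7, 3) arrives late
part = OptimisticPartition()
for ev in events:
    part.receive(ev)
assert part.state == sequential_reference(events)  # 1056 either way
assert part.rollbacks == 1
```

The state update is deliberately order-sensitive, so matching the sequential reference shows that rollback restores timestamp order rather than arrival order.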

Texas ScholarWorks

Fuzz, Penetration, and AI Testing for SoC Security Verification: Challenges and Solutions

Author: Arash Vafaei
Fahim Rahman
Farimah Farahmandi
Hasan Al Shaikh
Kimia Zamiri Azar
Mark Tehranipoor
Muhammad Monir Hossain
Nurun N. Mondol
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 28/03/2022

The ever-increasing use and application of systems-on-chip (SoCs) has driven a tremendous modernization of these architectures. A modern SoC design, with its numerous complex and heterogeneous intellectual properties (IPs) and its declared privacy-preserving properties, contains a wide variety of highly sensitive assets. These assets must be protected from unauthorized access and against a diverse set of attacks. Such attacks can originate from different sources, including malicious IPs, malicious or vulnerable firmware/software, unreliable and insecure interconnects and communication protocols, and side-channel vulnerabilities exposed through power/performance profiles. Any unauthorized access to these assets may result in either a breach of company secrets for original equipment manufacturers (OEMs) or identity theft for the end user. In contrast to the enormous advances in functional testing and verification of SoC architectures, security verification is still emerging, and academia and industry have devoted little effort to it. Unfortunately, there is a huge gap between the modernization of SoC architectures and their security verification approaches. Given the lack of automated SoC security verification in modern electronic design automation (EDA) tools, this paper provides a comprehensive overview of the requirements that must be realized as the fundamentals of the SoC security verification process. By reviewing these requirements, including the creation of a unified language for SoC security verification, the definition of security policies, and the formulation of security verification, we propose using self-refinement techniques, such as fuzz, penetration, and AI testing, for security verification.
We evaluate the challenges and possible resolutions, and we outline potential approaches for realizing SoC security verification via these self-refinement techniques.
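Fuzzing is one of the self-refinement techniques the paper surveys. As a minimal sketch only, the Python below fuzzes a hypothetical toy key-vault model with an injected access-control bug and checks a simple security policy ("the key is never returned while the vault is locked"). The device model, opcodes, addresses and policy are all illustrative assumptions, and the fuzzer is purely random rather than coverage-guided, so it is not the approach the paper proposes.

```python
import random
from typing import Optional, Tuple

SECRET_ADDR = 0x7   # hypothetical address of a protected key register
READ_OP = 0x1       # hypothetical read opcode (low nibble of the command byte)
DEBUG_FLAG = 0x80   # hypothetical debug bit that, due to a bug, bypasses the lock

class ToyKeyVault:
    """Toy device-under-test with an injected access-control bug."""

    def __init__(self) -> None:
        self.locked = True
        self.mem = [0] * 16
        self.mem[SECRET_ADDR] = 0xDEADBEEF

    def access(self, cmd: int, addr: int) -> Optional[int]:
        if (cmd & 0x0F) != READ_OP:
            return None  # only reads are modeled
        if addr == SECRET_ADDR and self.locked and not (cmd & DEBUG_FLAG):
            return None  # lock enforced on the normal read path
        return self.mem[addr]  # bug: the debug path skips the lock check

def violates_policy(dut: ToyKeyVault, addr: int, result: Optional[int]) -> bool:
    """Security policy: the secret must never be returned while the vault is locked."""
    return dut.locked and addr == SECRET_ADDR and result is not None

def fuzz(trials: int, seed: int = 0) -> Optional[Tuple[int, int]]:
    """Random black-box fuzzing; returns the first (cmd, addr) that breaks the policy."""
    rng = random.Random(seed)  # seeded so a campaign is reproducible
    for _ in range(trials):
        dut = ToyKeyVault()  # fresh device for every test case
        cmd, addr = rng.randrange(256), rng.randrange(16)
        if violates_policy(dut, addr, dut.access(cmd, addr)):
            return cmd, addr
    return None

print(fuzz(trials=20_000))
```

Roughly 1 in 512 random inputs hits the buggy path (read opcode, debug bit set, secret address), so 20,000 trials find the violation with near certainty; real SoC bugs are far rarer, which is why practical hardware fuzzers rely on coverage feedback.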

Cryptology ePrint Archive

Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

Author: Hempel Gerald
Publication date: 11/11/2019

Architectures combining a field-programmable gate array (FPGA) and a general-purpose processor on a single chip have become increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application-specific hardware accelerators that improve the performance of software on the host processor. On the other hand, they oblige system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons that hinder the widespread use of hybrid architectures. Thus, an automated process that aids programmers with hardware/software partitioning and the generation of application-specific accelerators is an important goal. The method presented in this thesis requires neither restrictions on the high-level language used nor special source-code annotations, which usually form an entry barrier for programmers without a deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that generates hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin generates the host-processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach but also reveals limiting factors that hamper possible performance improvements.
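The abstract says the plugin identifies application hot-spots automatically but not how. A common heuristic, assumed here and not taken from PIRANHA (which operates inside GCC on the compiler's intermediate representation), is to rank functions by their share of profiled execution time and keep those above a threshold. The Python below sketches only that selection step; `select_hotspots`, the profile format and the 10% threshold are hypothetical.

```python
from typing import Dict, List

def select_hotspots(profile: Dict[str, int], threshold: float = 0.10) -> List[str]:
    """Return functions whose share of profile samples is >= threshold, hottest first."""
    total = sum(profile.values())
    if total == 0:
        return []
    hot = [(name, count) for name, count in profile.items() if count / total >= threshold]
    hot.sort(key=lambda item: item[1], reverse=True)  # hottest candidates first
    return [name for name, _ in hot]

# Hypothetical sample counts from a profiler run of an embedded application.
profile = {"fir_filter": 600, "checksum": 340, "main": 50, "parse_args": 10}
print(select_hotspots(profile))  # ['fir_filter', 'checksum']
```

A real flow would likely filter the candidates further, for example rejecting functions that cannot be synthesized or whose data transfers to the accelerator would outweigh the speedup.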

Technische Universität Dresden: Qucosa

High-level synthesis for FPGAs: From prototyping to deployment

Author: Bin Liu
Jason Cong
Kees Vissers
Juanjo Noguera
Zhiru Zhang
Stephen Neuendorffer
Publication date: 06/03/2020

Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond RTL. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for FPGA designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advances in core HLS algorithms, and a domain-specific approach. In this paper we use AutoESL's AutoPilot HLS tool, coupled with domain-specific system-level implementation platforms developed by Xilinx, as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparisons of HLS solutions with optimized manual designs. Index Terms: domain-specific design, field-programmable gate array (FPGA), high-level synthesis (HLS), quality of results (QoR).
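Scheduling, which assigns operations to clock cycles, is one of the core HLS algorithms the abstract refers to. As a textbook baseline only (production tools such as AutoPilot use more elaborate, resource-constrained schedulers that this sketch does not reproduce), the Python below computes an as-soon-as-possible (ASAP) schedule for a small dataflow graph. The function name, operation names and latencies are hypothetical.

```python
from typing import Dict, List

def asap_schedule(deps: Dict[str, List[str]], latency: Dict[str, int]) -> Dict[str, int]:
    """Earliest start cycle of each operation in an acyclic dataflow graph."""
    start: Dict[str, int] = {}

    def visit(op: str) -> int:
        if op not in start:
            # An operation can start once all of its predecessors have finished.
            start[op] = max((visit(p) + latency[p] for p in deps[op]), default=0)
        return start[op]

    for op in deps:
        visit(op)
    return start

# z = (a*b + c*d) * e, with assumed latencies: multiply = 2 cycles, add = 1 cycle.
deps = {"mul1": [], "mul2": [], "add": ["mul1", "mul2"], "mul3": ["add"]}
latency = {"mul1": 2, "mul2": 2, "add": 1, "mul3": 2}
sched = asap_schedule(deps, latency)
print(sched)  # {'mul1': 0, 'mul2': 0, 'add': 2, 'mul3': 3}
print(max(sched[op] + latency[op] for op in sched))  # 5 (total latency in cycles)
```

ASAP assumes unlimited resources: here both first-level multiplications start in cycle 0 and therefore need two multipliers. A resource-constrained scheduler would trade that area for extra latency.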

CiteSeerX

High-Level Synthesis Based VLSI Architectures for Video Coding

Author: Ahmad Waqar
Publication venue: Politecnico di Torino
Publication date: 01/01/2017

High Efficiency Video Coding (HEVC) is the state-of-the-art video coding standard. Emerging applications such as free-viewpoint video, 360-degree video, augmented reality and 3D movies require standardized extensions of HEVC. These extensions include HEVC Scalable Video Coding (SHVC), HEVC Multiview Video Coding (MV-HEVC), MV-HEVC + Depth (3D-HEVC) and HEVC Screen Content Coding. 3D-HEVC is used for applications such as view synthesis and free-viewpoint video: depth maps are coded and transmitted so that virtual views can be synthesized by algorithms such as Depth Image Based Rendering (DIBR). As a first step, we profiled the 3D-HEVC standard to identify its computationally intensive parts for efficient hardware implementation. One of the computationally intensive parts of 3D-HEVC, HEVC and H.264/AVC is the interpolation filtering used for Fractional Motion Estimation (FME). The interpolation filtering is implemented in hardware using High-Level Synthesis (HLS) tools; specifically, the Xilinx Vivado Design Suite is used for the HLS implementation of the HEVC and H.264/AVC interpolation filters. As the complexity of digital systems has greatly increased, HLS offers significant benefits: it allows late architectural or functional changes without time-consuming rewriting of RTL code, lets algorithms be tested and evaluated early in the design cycle, and supports the development of accurate models against which the final hardware can be verified.
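The interpolation filters mentioned above are FIR filters. As a simplified sketch, the Python below applies HEVC's 8-tap luma half-sample filter, whose coefficients (-1, 4, -11, 40, 40, -11, 4, -1) sum to 64, horizontally along one row of 8-bit samples. It assumes edge replication at the row borders and rounds and clips in a single pass, whereas the standard keeps higher intermediate precision for two-dimensional and bi-predicted cases; `interpolate_half_pel` is a hypothetical behavioral reference, not the thesis's HLS architecture.

```python
from typing import List

# HEVC luma half-sample interpolation filter taps (they sum to 64, a gain of 2**6).
HEVC_LUMA_HALF_PEL = (-1, 4, -11, 40, 40, -11, 4, -1)

def interpolate_half_pel(row: List[int]) -> List[int]:
    """Half-sample values between row[i] and row[i + 1] for an 8-bit row (simplified)."""
    # Replicate edge samples: 3 extra on the left and 4 on the right cover the 8-tap window.
    padded = [row[0]] * 3 + list(row) + [row[-1]] * 4
    out = []
    for i in range(len(row)):
        # Window padded[i .. i + 7] corresponds to row[i - 3 .. i + 4].
        acc = sum(c * padded[i + k] for k, c in enumerate(HEVC_LUMA_HALF_PEL))
        # Round, divide by 64, and clip to 8 bits (the negative taps can undershoot).
        out.append(min(255, max(0, (acc + 32) >> 6)))
    return out

edge = [0, 0, 0, 0, 100, 100, 100, 100]
print(interpolate_half_pel(edge)[3])  # 50: halfway between the 0 and 100 samples
```

In an HLS implementation the inner multiply-accumulate over fixed coefficients is the part that would typically be unrolled into a shift-and-add datapath.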

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Online Timing Slack Measurement and its Application in Field-Programmable Gate Arrays

Author: Levine Joshua M.
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/02/2015

Reliability, power consumption and timing performance are key concerns for today's integrated circuits. Measurement techniques that can quantify the timing characteristics of a circuit while it is operating offer a range of benefits. Delay variation due to environmental and operational conditions, as well as degradation, can be monitored by tracking changes in timing performance. Using the measurements in a closed loop to control the power supply voltage or clock frequency allows timing safety margins to be reduced, improving power consumption or throughput by exploiting better-than-worst-case operation. This thesis describes a novel online timing slack measurement method that directly measures the timing performance of a circuit accurately and with minimal overhead. Enhancements improve absolute accuracy and resolution. A compilation flow is reported that automatically instruments arbitrary circuits on FPGAs with the measurement circuitry. On its own, this measurement method can track the "health" of an integrated circuit from commissioning through its lifetime, warning of impending failure or instigating pre-emptive degradation mitigation techniques. The use of the measurement method in a closed-loop dynamic voltage and frequency scaling scheme has been demonstrated, achieving significant improvements in power consumption and throughput performance. Open Access
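The abstract describes measuring timing slack online and feeding it into closed-loop voltage and frequency scaling, without giving circuit details. The Python below is an idealized behavioral model under stated assumptions: a single critical path with a known delay, a shadow register whose capture edge is moved earlier in fixed steps, and a controller that adjusts only the clock period (voltage scaling is omitted). `measure_slack_ps`, `adjust_period_ps` and all numbers are hypothetical and do not reproduce the thesis's method.

```python
def measure_slack_ps(path_delay_ps: int, period_ps: int, step_ps: int) -> int:
    """Largest tested early-capture offset at which a shadow register still captures correctly."""
    if path_delay_ps > period_ps:
        raise ValueError("path already violates timing at this clock period")
    slack = 0
    offset = step_ps
    # A shadow register clocked `offset` earlier sees the correct value only if the
    # path settles within the shortened window (period - offset).
    while offset < period_ps and path_delay_ps <= period_ps - offset:
        slack = offset
        offset += step_ps
    return slack  # rounded down to step_ps, so it never overestimates the true slack

def adjust_period_ps(period_ps: int, slack_ps: int, guard_ps: int, step_ps: int) -> int:
    """One step of a toy closed-loop frequency controller."""
    if slack_ps - step_ps >= guard_ps:
        return period_ps - step_ps  # margin to spare: raise the clock frequency
    if slack_ps < guard_ps:
        return period_ps + step_ps  # guard band eroded (e.g. ageing, heat): back off
    return period_ps

period, delay = 5000, 3300  # assumed: 200 MHz start, 3.3 ns critical path
for _ in range(30):
    period = adjust_period_ps(period, measure_slack_ps(delay, period, 50), guard_ps=200, step_ps=100)
print(period)  # 3500
```

The loop settles at a 3500 ps period (about 286 MHz) with 200 ps of measured margin. Because the measured slack is rounded down to the offset step, the controller errs on the safe side; if the path later slows down, the next measurement falls below the guard band and the loop lengthens the period again.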

Spiral - Imperial College Digital Repository