Search CORE

7 research outputs found

Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis

Author: Ferrandi Fabrizio
Fezzardi Pietro
Lattuada Marco
Publication venue
Publication date: 01/01/2017
Field of study

High-Level Synthesis (HLS) for FPGAs is attracting popularity and is increasingly used to handle complex systems with multiple integrated components. To increase performance and efficiency, HLS flows now adopt several advanced optimization techniques. Aggressive optimizations and system level integration can cause the introduction of bugs that are only observable on-chip. Debugging support for circuits generated with HLS is receiving a considerable attention. Among the data that can be collected on chip for debugging, one of the most important is the state of the Finite State Machines (FSM) controlling the components of the circuit. However, this usually requires a large amount of memory to trace the behavior during the execution. This work proposes an approach that takes advantage of the HLS information and of the structure of the FSM to compress control flow traces and to integrate optimized components for on-chip debugging. The generated checkers analyze the FSM execution on-fly, automatically notifying when a bug is detected, localizing it and providing data about its cause. The traces are compressed using a software profiling technique, called Efficient Path Profiling (EPP), adapted for the debugging of hardware accelerators generated with HLS. With this technique, the size of the memory used to store control flow traces can be reduced up to 2 orders of magnitude, compared to state-of-the-art

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

FPGAs for the Masses: Affordable Hardware Synthesis from Domain-Specific Languages

Author: George Nithin
Publication venue: Lausanne, EPFL
Publication date: 25/05/2016
Field of study

Field Programmable Gate Arrays (FPGAs) have the unique ability to be configured into application-specific architectures that are well suited to specific computing problems. This enables them to achieve performances and energy efficiencies that outclass other processor-based architectures, such as Chip Multiprocessors (CMPs), Graphic Processing Units (GPUs) and Digital Signal Processors (DSPs). Despite this, FPGAs are yet to gain widespread adoption, especially among application and software developers, because of their laborious application development process that requires hardware design expertise. In some application areas, domain-specific hardware synthesis tools alleviate this problem by using a Domain-Specific Language (DSL) to hide the low-level hardware details and also improve productivity of the developer. Additionally, these tools leverage domain knowledge to perform optimizations and produce high-quality hardware designs. While this approach holds great promise, the significant effort and cost of developing such domain-specific tools make it unaffordable in many application areas. In this thesis, we develop techniques to reduce the effort and cost of developing domain-specific hardware synthesis tools. To demonstrate our approach, we develop a toolchain to generate complete hardware systems from high-level functional specifications written in a DSL. Firstly, our approach uses language embedding and type-directed staging to develop a DSL and compiler in a cost-effective manner. To further reduce effort, we develop this compiler by composing reusable optimization modules, and integrate it with existing hardware synthesis tools. However, most synthesis tools require users to have hardware design knowledge to produce high-quality results. Therefore, secondly, to facilitate people without hardware design skills to develop domain-specific tools, we develop a methodology to generate high-quality hardware designs from well known computational patterns, such as map, zipWith, reduce and foreach; computational patterns are algorithmic methods that capture the nature of computation and communication and can be easily understood and used without expert knowledge. In our approach, we decompose the DSL specifications into constituent computational patterns and exploit the properties of these patterns, such as degree of parallelism, interdependence between operations and data-access characteristics, to generate high-quality hardware modules to implement them, and compose them into a complete system design. Lastly, we extended our methodology to automatically parallelize computations across multiple hardware modules to benefit from the spatial parallelism of the FPGA as well as overcome performance problems caused by non-sequential data access patterns and long access latency to external memory. To achieve this, we utilize the data-access properties of the computational patterns to automatically identify synchronization requirements and generate such multi-module designs from the same high-level functional specifications. Driven by power and performance constraints, today the world is turning to reconfigurable technology (i.e., FPGAs) to meet the computational needs of tomorrow. In this light, this work addresses the cardinal problem of making tomorrow's computing infrastructure programmable to application developers

Infoscience - École polytechnique fédérale de Lausanne

Recommended from our members

Performance Debugging Frameworks for FPGA High-Level Synthesis

Author: Choi Young-kyu
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Using high-level synthesis (HLS) tools for field-programmable gate array (FPGA) design is becoming an increasingly popular choice because HLS tools can generate a high-quality design in a short development time. However, current HLS tools still cannot adequately support users in understanding and fixing the performance issues of the current design. That is, current HLS tools lack in performance debugging capability. Previous work on performance debugging automates the process of inserting hardware monitors in low-level register-transfer level (RTL) languages which limits the comprehensibility of the obtained result. Instead, our HLS-based flows offer analysis on a function or loop level and provide more intuitive feedback that can be used to pinpoint the performance bottleneck of a design. In this dissertation, we present a collection of HLS-based debugging frameworks for various purposes and characteristics of the design. First, we address the problem in the HLS synthesis step, where an inaccurate cycle estimation is provided if the program has input-dependent behavior. We propose a new performance estimator that automatically instruments code that models the hardware execution behavior and interprets the information from the HLS software simulation. However, the performance estimation result of this flow may not be accurate for a type of designs that cannot be simulated correctly by existing HLS software simulators. To handle such cases, we propose a new software simulator that provides cycle-accurate result based on the HLS scheduling information. If the input dataset is not available for software simulation or high-level models do not exist for all components of the FPGA design, we also present an on-board monitoring flow for automated cycle extraction and stall analysis. Finally, we address the needs of HLS programmers to automatically find the best set of directives for FPGA designs. We propose a design space exploration (DSE) framework to optimize applications with variable loop bounds in Polybench benchmark. A quantitative comparison among the proposed frameworks is shown using the sparse matrix-vector multiplication benchmark

eScholarship - University of California

Automated Debugging Methodology for FPGA-based Systems

Author: Khan Habib ul Hasan
Publication venue
Publication date: 30/12/2019
Field of study

Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa