899 research outputs found

    In-FPGA instrumentation framework for openCL-based designs

    Get PDF
    ABSTRACT: The productivity achieved when developing applications on high-performance reconfigurable heterogeneous computing (HPRHC) systems is increased by using the Open Computing Language (OpenCL). However, the hardware produced by OpenCL compilers in field-programmable gate arrays (FPGAs) can result in severe performance bottlenecks that are challenging to solve. The problem is compounded by the fact that the generated netlist details are disorganized, making them mostly unreadable and only partially visible to designers. This paper proposes an in-FPGA instrumentation method and a new framework for extracting the FPGA-cycle-accurate timing performances of OpenCL-based designs. The results clearly show that the chosen execution model for OpenCL-based designs strongly affects the timing performance when it is not properly implemented. Our framework is implemented on an HPRHC platform that contains a CPU and two Arria10 FPGAs, and it is evaluated with a wide variety of benchmarks with different complexities. After testing on the reported benchmarks, the average logic overhead for one inserted instrument is 0.2 % of the total amount of adaptive look-up tables (ALUTs) and 0.1 % of the total registers in an FPGA. This resource utilization is between 1.5 and six times lower than those reported in the best previously published works. The scalability of the framework is also evaluated by inserting up to 50 instruments. The experimental results show that the average logic utilization per instrument is 0.19 % of the ALUTs and 0.17 % of the registers in the FPGA when 50 instruments are inserted

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

    KAPow: A System Identification Approach to Online Per-Module Power Estimation in FPGA Designs

    Get PDF
    In a modern FPGA system-on-chip design, it is often insufficient to simply assess the total power consumption of the entire circuit by design-time estimation or runtime power rail measurement. Instead, to make better runtime decisions, it is desirable to understand the power consumed by each individual module in the system. In this work, we combine boardlevel power measurements with register-level activity counting to build an online model that produces a breakdown of power consumption within the design. Online model refinement avoids the need for a time-consuming characterisation stage and also allows the model to track long-term changes to operating conditions. Our flow is named KAPow, a (loose) acronym for ‘K’ounting Activity for Power estimation, which we show to be accurate, with per-module power estimates as close to ±5mW of true measurements, and to have low overheads. We also demonstrate an application example in which a permodule power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by over 8%

    Energy Transparency for Deeply Embedded Programs

    Get PDF
    Energy transparency is a concept that makes a program's energy consumption visible, from hardware up to software, through the different system layers. Such transparency can enable energy optimizations at each layer and between layers, and help both programmers and operating systems make energy-aware decisions. In this paper, we focus on deeply embedded devices, typically used for Internet of Things (IoT) applications, and demonstrate how to enable energy transparency through existing Static Resource Analysis (SRA) techniques and a new target-agnostic profiling technique, without hardware energy measurements. Our novel mapping technique enables software energy consumption estimations at a higher level than the Instruction Set Architecture (ISA), namely the LLVM Intermediate Representation (IR) level, and therefore introduces energy transparency directly to the LLVM optimizer. We apply our energy estimation techniques to a comprehensive set of benchmarks, including single- and also multi-threaded embedded programs from two commonly used concurrency patterns, task farms and pipelines. Using SRA, our LLVM IR results demonstrate a high accuracy with a deviation in the range of 1% from the ISA SRA. Our profiling technique captures the actual energy consumption at the LLVM IR level with an average error of 3%.Comment: 33 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1510.0709

    Validation and verification of the interconnection of hardware intellectual property blocks for FPGA-based packet processing systems

    Get PDF
    As networks become more versatile, the computational requirement for supporting additional functionality increases. The increasing demands of these networks can be met by Field Programmable Gate Arrays (FPGA), which are an increasingly popular technology for implementing packet processing systems. The fine-grained parallelism and density of these devices can be exploited to meet the computational requirements and implement complex systems on a single chip. However, the increasing complexity of FPGA-based systems makes them susceptible to errors and difficult to test and debug. To tackle the complexity of modern designs, system-level languages have been developed to provide abstractions suited to the domain of the target system. Unfortunately, the lack of formality in these languages can give rise to errors that are not caught until late in the design cycle. This thesis presents three techniques for verifying and validating FPGA-based packet processing systems described in a system-level description language. First, a type system is applied to the system description language to detect errors before implementation. Second, system-level transaction monitoring is used to observe high-level events on-chip following implementation. Third, the high-level information embodied in the system description language is exploited to allow the system to be automatically instrumented for on-chip monitoring. This thesis demonstrates that these techniques catch errors which are undetected by traditional verification and validation tools. The locations of faults are specified and errors are caught earlier in the design flow, which saves time by reducing synthesis iterations

    FAUST Domain Specific Audio DSP Language Compiled to WebAssembly

    Get PDF
    International audienceThis paper demonstrates how FAUST, a functional programming language for sound synthesis and audio processing, can be used to develop efficient audio code for the Web. After a brief overview of the language, its compiler and the architecture system allowing to deploy the same program as a variety of targets, the generation of WebAssembly code and the deployment of specialized WebAudio nodes will be explained. Several use cases will be presented. Extensive benchmarks to compare the performance of native and WebAssembly versions of the same set of DSP have be done and will be commente

    Towards an embedded board-level tester: study of a configurable test processor

    Get PDF
    The demand for electronic systems with more features, higher performance, and less power consumption increases continuously. This is a real challenge for design and test engineers because they have to deal with electronic systems with ever-increasing complexity maintaining production and test costs low and meeting critical time to market deadlines. For a test engineer working at the board-level, this means that manufacturing defects must be detected as soon as possible and at a low cost. However, the use of classical test techniques for testing modern printed circuit boards is not sufficient, and in the worst case these techniques cannot be used at all. This is mainly due to modern packaging technologies, a high device density, and high operation frequencies of modern printed circuit boards. This leads to very long test times, low fault coverage, and high test costs. This dissertation addresses these issues and proposes an FPGA-based test approach for printed circuit boards. The concept is based on a configurable test processor that is temporarily implemented in the on-board FPGA and provides the corresponding mechanisms to communicate to external test equipment and co-processors implemented in the FPGA. This embedded test approach provides the flexibility to implement test functions either in the external test equipment or in the FPGA. In this manner, tests are executed at-speed increasing the fault coverage, test times are reduced, and the test system can be adapted automatically to the properties of the FPGA and devices located on the board. An essential part of the FPGA-based test approach deals with the development of a test processor. In this dissertation the required properties of the processor are discussed, and it is shown that the adaptation to the specific test scenario plays a very important role for the optimization. For this purpose, the test processor is equipped with configuration parameters at the instruction set architecture and microarchitecture level. Additionally, an automatic generation process for the test system and for the computation of some of the processor’s configuration parameters is proposed. The automatic generation process uses as input a model known as the device under test model (DUT-M). In order to evaluate the entire FPGA-based test approach and the viability of a processor for testing printed circuit boards, the developed test system is used to test interconnections to two different devices: a static random memory (SRAM) and a liquid crystal display (LCD). Experiments were conducted in order to determine the resource utilization of the processor and FPGA-based test system and to measure test time when different test functions are implemented in the external test equipment or the FPGA. It has been shown that the introduced approach is suitable to test printed circuit boards and that the test processor represents a realistic alternative for testing at board-level.Der Bedarf an elektronischen Systemen mit zusätzlichen Merkmalen, höherer Leistung und geringerem Energieverbrauch nimmt ständig zu. Dies stellt eine erhebliche Herausforderung für Entwicklungs- und Testingenieure dar, weil sie sich mit elektronischen Systemen mit einer steigenden Komplexität zu befassen haben. Außerdem müssen die Herstellungs- und Testkosten gering bleiben und die Produkteinführungsfristen so kurz wie möglich gehalten werden. Daraus folgt, dass ein Testingenieur, der auf Leiterplatten-Ebene arbeitet, die Herstellungsfehler so früh wie möglich entdecken und dabei möglichst niedrige Kosten verursachen soll. Allerdings sind die klassischen Testmethoden nicht in der Lage, die Anforderungen von modernen Leiterplatten zu erfüllen und im schlimmsten Fall können diese Testmethoden überhaupt nicht verwendet werden. Dies liegt vor allem an modernen Gehäuse-Technologien, der hohen Bauteildichte und den hohen Arbeitsfrequenzen von modernen Leiterplatten. Das führt zu sehr langen Testzeiten, geringer Testabdeckung und hohen Testkosten. Die Dissertation greift diese Problematik auf und liefert einen FPGA-basierten Testansatz für Leiterplatten. Das Konzept beruht auf einem konfigurierbaren Testprozessor, welcher im On-Board-FPGA temporär implementiert wird und die entsprechenden Mechanismen für die Kommunikation mit der externen Testeinrichtung und Co-Prozessoren im FPGA bereitstellt. Dadurch ist es möglich Testfunktionen flexibel entweder auf der externen Testeinrichtung oder auf dem FPGA zu implementieren. Auf diese Weise werden Tests at-speed ausgeführt, um die Testabdeckung zu erhöhen. Außerdem wird die Testzeit verkürzt und das Testsystem automatisch an die Eigenschaften des FPGAs und anderer Bauteile auf der Leiterplatte angepasst. Ein wesentlicher Teil des FPGA-basierten Testansatzes umfasst die Entwicklung eines Testprozessors. In dieser Dissertation wird über die benötigten Eigenschaften des Prozessors diskutiert und es wird gezeigt, dass die Anpassung des Prozessors an den spezifischen Testfall von großer Bedeutung für die Optimierung ist. Zu diesem Zweck wird der Prozessor mit Konfigurationsparametern auf der Befehlssatzarchitektur-Ebene und Mikroarchitektur-Ebene ausgerüstet. Außerdem wird ein automatischer Generierungsprozess für die Realisierung des Testsystems und für die Berechnung einer Untergruppe von Konfigurationsparametern des Prozessors vorgestellt. Der automatische Generierungsprozess benutzt als Eingangsinformation ein Modell des Prüflings (device under test model, DUT-M). Das entwickelte Testsystem wurde zum Testen von Leiterplatten für Verbindungen zwischen dem FPGA und zwei Bauteilen verwendet, um den FPGA-basierten Testansatz und die Durchführbarkeit des Testprozessors für das Testen auf Leiterplatte-Ebene zu evaluieren. Die zwei Bauteile sind ein Speicher mit direktem Zugriff (static random-access memory, SRAM) und eine Flüssigkristallanzeige (liquid crystal display, LCD). Die Experimente wurden durchgeführt, um den Ressourcenverbrauch des Prozessors und Testsystems festzustellen und um die Testzeit zu messen. Dies geschah durch die Implementierung von unterschiedlichen Testfunktionen auf der externen Testeinrichtung und dem FPGA. Dadurch konnte gezeigt werden, dass der FPGA-basierte Ansatz für das Testen von Leiterplatten geeignet ist und dass der Testprozessor eine realistische Alternative für das Testen auf Leiterplatten-Ebene ist
    corecore