Search CORE

330 research outputs found

Automated Debugging Methodology for FPGA-based Systems

Author: Khan Habib ul Hasan
Publication venue
Publication date: 30/12/2019
Field of study

Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Non-Intrusive Online Timing Analysis of Large Embedded Applications

Author: Dreyer Boris
Hochberger Christian
Publication venue: OASIcs - OpenAccess Series in Informatics. 19th International Workshop on Worst-Case Execution Time Analysis (WCET 2019)
Publication date: 01/01/2019
Field of study

A thorough understanding of the timing behavior of embedded systems software has become very important. With the advent of ever more complex embedded software e.g. in autonomous driving, the size of this software is growing at a fast pace. Execution time profiles (ETP) have proven to be a useful way to understand the timing behavior of embedded software. Collecting these ETPs was either limited to small applications or required multiple runs of the same software for calibration processes. In this contribution, we present a novel method for collecting ETPs in a single shot of the software at very high quality even for large applications

TUbiblio

Dagstuhl Research Online Publication Server

FPGA ARCHITECTURE AND VERIFICATION OF BUILT IN SELF-TEST (BIST) FOR 32-BIT ADDER/SUBTRACTER USING DE0-NANO FPGA AND ANALOG DISCOVERY 2 HARDWARE

Author: Hamzic Nedim
Publication venue: RIT Scholar Works
Publication date: 01/08/2021
Field of study

The integrated circuit (IC) is an integral part of everyday modern technology, and its application is very attractive to hardware and software design engineers because of its versatility, integration, power consumption, cost, and board area reduction. IC is available in various types such as Field Programming Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), System on Chip (SoC) architecture, Digital Signal Processing (DSP), microcontrollers (μC), and many more. With technology demand focused on faster, low power consumption, efficient IC application, design engineers are facing tremendous challenges in developing and testing integrated circuits that guaranty functionality, high fault coverage, and reliability as the transistor technology is shrinking to the point where manufacturing defects of ICs are affecting yield which associates with the increased cost of the part. The competitive IC market is pressuring manufactures of ICs to develop and market IC in a relatively quick turnaround which in return requires design and verification engineers to develop an integrated self-test structure that would ensure fault-free and the quality product is delivered on the market. 70-80% of IC design is spent on verification and testing to ensure high quality and reliability for the enduser. To test complex and sophisticated IC designs, the verification engineers must produce laborious and costly test fixtures which affect the cost of the part on the competitive market. To avoid increasing the part cost due to yield and test time to the end-user and to keep up with the competitive market many IC design engineers are deviating from complex external test fixture approach and are focusing on integrating Built-in Self-Test (BIST) or Design for Test (DFT) techniques onto IC’s which would reduce time to market but still guarantee high coverage for the product. Understanding the BIST, the architecture, as well as the application of IC, must be understood before developing IC. The architecture of FPGA is elaborated in this paper followed by several BIST techniques and applications of those BIST relative to FPGA, SoC, analog to digital (ADC), or digital to analog converters (DAC) that are integrated on IC. Paper is concluded with verification of BIST for the 32-bit adder/subtracter designed in Quartus II software using the Analog Discovery 2 module as stimulus and DE0-NANO FPGA board for verification

RIT Scholar Works

Using SRAM Based FPGAs for Power-Aware High Performance Wireless Sensor Networks

Author: Akyildiz
Andres Otero
Bellis
Cantoni
Corke
Dilmaghani
Eduardo de la Torre
Fornaciari
He
Jorge Portilla
Juan Valverde
Krasteva
Miguel Lopez
Portilla
Teresa Riesgo
Tseng
Wu
Yong
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/02/2012
Field of study

While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today’s applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved by either more powerful microcontrollers, though more power consumption or, in general, any solution capable of accelerating task execution. At this point, the use of hardware based, and in particular FPGA solutions, might appear as a candidate technology, since though power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance high capacity state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, as well as a careful power-aware management system, to show that energy savings for certain higher-end applications can be achieved. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to proof that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements

Multidisciplinary Digital Publishing Institute

Crossref

Directory of Open Access Journals

PubMed Central

Real-Time Lossless Compression of SoC Trace Data

Author: Zhang Jing
Publication venue: Lunds universitet/Institutionen för elektro- och informationsteknik
Publication date: 01/01/2015
Field of study

Nowadays, with the increasing complexity of System-on-Chip (SoC), traditional debugging approaches are not enough in multi-core architecture systems. Hardware tracing becomes necessary for performance analysis in these systems. The problem is that the size of collected trace data through hardware-based tracing techniques is usually extremely large due to the increasing complexity of System-on-Chips. Hence on-chip trace compression performed in hardware is needed to reduce the amount of transferred or stored data. In this dissertation, the feasibility of different types of lossless data compression algorithms in hardware implementation are investigated and examined. A lossless data compression algorithm LZ77 is selected, analyzed, and optimized to Nexus traces data. In order to meet the hardware cost and compression performances requirements for the real-time compression, an optimized LZ77 compression algorithm is proposed based on the characteristics of Nexus trace data. This thesis presents a hardware implementation of LZ77 encoder described in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Test results demonstrate that the compression speed can achieve16 bits/clock cycle and the average compression ratio is 1.35 for the minimal hardware cost case, which is a suitable trade-off between the hardware cost and the compression performances effectively

Continuous Non-Intrusive Hybrid WCET Estimation Using Waypoint Graphs

Author: Dreyer Boris
Hochberger Christian
Lange Alexander
Wegener Simon
Weiss Alexander
Publication venue: OASIcs - OpenAccess Series in Informatics. 16th International Workshop on Worst-Case Execution Time Analysis (WCET 2016)
Publication date: 01/01/2016
Field of study

Traditionally, the Worst-Case Execution Time (WCET) of Embedded Software has been estimated using analytical approaches. This is effective, if good models of the processor/System-on-Chip (SoC) architecture exist. Unfortunately, modern high performance SoCs often contain unpredictable and/or undocumented components that influence the timing behaviour. Thus, analytical results for such processors are unrealistically pessimistic. One possible alternative approach seems to be hybrid WCET analysis, where measurement data together with an analytical approach is used to estimate worst-case behaviour. Previously, we demonstrated how continuous evaluation of basic block trace data can be used to produce detailed statistics of basic blocks in embedded software. In the meantime it has become clear that the trace data provided by modern SoCs delivers a different type of information. In this contribution, we show that even under realistic conditions, a meaningful analysis can be conducted with the trace data

TUbiblio

Dagstuhl Research Online Publication Server

Virtual Timing Isolation Safety-Net for Multicore Processors

Author: Freitag Johannes
Publication venue
Publication date: 06/08/2020
Field of study

Multicore processors promise to offer the performance as well as the reduced space, weight and power needed by future aircrafts. However, commercial off-the-shelf multicore processors suffer from timing interferences between cores which complicates applying them in hard real-time systems like avionic applications. In this thesis, a safety-net system is proposed which enables a virtual timing isolation of applications running on one core from all other cores. The technique is based on hardware external to the multicore processor and completely transparent to the applications, i.e. no modification of the observed software is necessary. The basic idea is to apply a single-core execution based worst-case execution time analysis and to accept a predefined slowdown during multicore execution. If the slowdown exceeds the acceptable bounds, interferences will be reduced by controlling the behavior of low-critical cores to keep the main application’s progress inside the given bounds. Measuring the progress of the applications running on the main core is performed by tracking the application’s fingerprint. A fingerprint is created by extraction of the performance counters of the critical core in very small timesteps which results in a characteristic curve for every execution of a periodic program. In standalone mode, without any running applications on the other cores, a model of an application is created by clustering and combining the extracted curves. During runtime, the extracted performance counter values are compared to the model to determine the progress of the critical application. In case the progress of an application is unacceptably delayed, the cores creating the interferences are throttled. The interference creating cores are determined by the accesses of the respective cores to the shared resources. A controller that takes the progress of a critical application as well as the time until the final deadline into account throttles the low priority cores. Throttling is either performed by frequency scaling of the interfering cores or by halt and continue with a pulse width modulation scheme. The complete safety-net system was evaluated on a TACLeBench benchmark running on an NXP P4080 multicore processor observed by a Xilinx FPGA implementing a MicroBlaze soft-core microcontroller. The results show that the progress can be measured by the fingerprinting with a final deviation of less than 1% for a TACLeBench execution with running opponent cores and indicate the non-intrusiveness of the approach. Several experiments are conducted to demonstrate the effectiveness of the different throttling mechanisms. Evaluations using a real-world avionic application show that the approach can be applied to integrated modular avionic applications. The safety-net does not ensure robust partitioning in the conventional meaning. The applications on the different cores can influence each other in the timing domain, but the external safety-net ensures that the interference on the high critical application is low enough to keep the timing. This allows for an efficient utilization of the multicore processor. Every critical application is treated individually, and by relying on individual models recorded in standalone mode, the critical as well as the non-critical applications running on the other cores can be exchanged without recreating a fingerprint model. This eases the porting of legacy applications to the multicore processor and allows the exchange of applications without recertification.Der Einsatz von Multicore Prozessoren in Avioniksystemen verspricht sowohl die Performancesteigerung als auch den reduzierten Platz-, Gewichts- und Energieverbrauch, der zur Realisierung von zukünftigen Flugzeugen benötigt wird. Die Verwendung von seriengefertigten (COTS) Multicore Prozessoren in sicherheitskritischen Echtzeitsystemen ist jedoch sehr komplex, da eine gegenseitige zeitliche Beeinflussung der Anwendungen auf den unterschiedlichen Kernen nicht ausgeschlossen werden kann. In dieser Arbeit wird ein Konzept vorgestellt, das eine virtuelle zeitliche Trennung der Anwendungen, die auf einem Prozessorkern ausgeführt werden, von denen der übrigen Kerne ermöglicht. Die Grundidee besteht darin, eine auf einer Single-Core-Ausführung basierende Laufzeitanalyse (WCET) durchzuführen und eine vordefinierte Verlangsamung während der Multicore-Ausführung zu akzeptieren. Wenn die Verlangsamung die zulässige Grenze überschreitet, wird das Verhalten niedrigkritischer Kerne so gesteuert, dass der Fortschritt der Hauptanwendung innerhalb der Deadlines bleibt. Die Bestimmung des Fortschritts der kritischen Anwendungen erfolgt durch das Verfolgen eines sogenannten Fingerprints. Ein Fingerprint wird durch Auslesen der Performance Counter des kritischen Kerns in sehr kleinen Zeitschritten erzeugt, was zu einer charakteristischen Kurve für jede Ausführung eines periodischen Programms führt. Ein Modell einer Anwendung wird erstellt, indem die extrahierten Kurven gruppiert und kombiniert werden. Während der Laufzeit werden die ausgelesenen Werte mit dem Modell verglichen, um den Fortschritt zu bestimmen. Falls die zeitliche Ausführung einer ktitischen Anwendung zu stark verzögert wird, werden die Kerne gedrosselt, welche die Störungen verursachen. Das Konzept wurde mit einem TACLeBench-Benchmark evaluiert, der auf einem NXP P4080 Multicore Prozessor ausgefüht, und von einem Xilinx-FPGA beobachtet wurde. Es konnte gezeigt werden, dass der Fortschritt durch den Fingerprint mit einer endgültigen Abweichung von weniger als 1% für eine TACLeBench-Ausführung mit laufenden konkurrierenden Kernen gemessen werden kann. Die Evaluation mit einer realen Avionik-Anwendung zeigte, dass das Konzept für integrierte modulare Avionik-Anwendungen (IMA) genutzt werden kann. Der Ansatz gewährleistet keine robuste Partitionierung im herkömmlichen Sinne. Die Anwendungen auf den verschiedenen Kernen können sich zeitlich gegenseitig beeinflussen, aber ein externes Sicherheitsnetz stellt sicher, dass die Verlangsamung der hochkritischen Anwendung niedrig genug ist, um die Deadlines zu halten. Dies ermöglicht eine effiziente Auslastung des Multicore Prozessors. Außerdem wird jede kritische Anwendung einzeln behandelt und verfügt über ein individuelles Modell. Somit können die kritischen und nicht kritischen Anwendungen, die auf den anderen Kernen ausgeführt werden, ausgetauscht werden, ohne ein Modell neu zu erstellen. Dies vereinfacht die Portierung von bestehenden Anwendungen auf Multicore Prozessoren und ermöglicht den Austausch von Anwendungen ohne eine erneute Zertifizierung

OPUS Augsburg

Using embedded hardware monitor cores in critical computer systems

Author: Nikolaos Bartzoudis (7201928)
Publication venue
Publication date: 01/01/2006
Field of study

The integration of FPGA devices in many different architectures and services makes monitoring and real time detection of errors an important concern in FPGA system design. A monitor is a tool, or a set of tools, that facilitate analytic measurements in observing a given system. The goal of these observations is usually the performance analysis and optimisation, or the surveillance of the system. However, System-on-Chip (SoC) based designs leave few points to attach external tools such as logic analysers. Thus, an embedded error detection core that allows observation of critical system nodes (such as processor cores and buses) should enforce the operation of the FPGA-based system, in order to prevent system failures. The core should not interfere with system performance and must ensure timely detection of errors. This thesis is an investigation onto how a robust hardware-monitoring module can be efficiently integrated in a target PCI board (with FPGA-based application processing features) which is part of a critical computing system. [Continues.

Loughborough University Institutional Repository

IMPLEMENTASI HEVC CODEC PADA PLATFORM BERBASIS FPGA

Author: PERMATA OKTAVIA AYU
Publication venue
Publication date: 01/03/2016
Field of study

High Efficiency Video Coding (HEVC) telah di desain sebagai standar baru untuk beberapa aplikasi video dan memiliki peningkatan performa dibanding dengan standar sebelumnya. Meskipun HEVC mencapai efisiensi coding yang tinggi, namun HEVC memiliki kekurangan pada beban pemrosesan tinggi dan loading yang berat ketika melakukan proses encoding video. Untuk meningkatkan performa encoder, kami bertujuan untuk mengimplementasikan HEVC codec pada Zynq 7000 AP SoC. Kami mencoba mengimplementasikan HEVC menggunakan tiga desain sistem. Pertama, HEVC codec di implementasikan pada Zynq PS. Kedua, encoder HEVC di implementasikan dengan hardware/software co-design. Ketiga, mengimplementasikan sebagian dari encoder HEVC pada Zynq PL. Pada implementasi kami menggunakan Xilinx Vivado HLS untuk mengembangkan codec. Hasil menunjukkan bahwa HEVC codec dapat di implementasikan pada Zynq PS. Codec dapat mengurangi ukuran video dibanding ukuran asli video pada format H.264. Kualitas video hampir sama dengan format H.264. Sayangnya, kami tidak dapat menyelesaikan desain dengan hardware/software co-design karena kompleksitas coding untuk validasi kode C pada Vivado HLS. Hasil lain, sebagian dari encoder HEVC dapat di implementasikan pada Zynq PL, yaitu HEVC 2D IDCT. Dari implementasi kami dapat mengoptimalkan fungsi loop pada HEVC 2D dan 1D IDCT menggunakan pipelining. Perbandingan hasil antara pipelining inner-loop dan outer-loop menunjukkan bahwa pipelining di outer-loop dapat meningkatkan performa dilihat dari nilai latency

ITS Repository