24 research outputs found

    In-FPGA instrumentation framework for openCL-based designs

    Get PDF
    ABSTRACT: The productivity achieved when developing applications on high-performance reconfigurable heterogeneous computing (HPRHC) systems is increased by using the Open Computing Language (OpenCL). However, the hardware produced by OpenCL compilers in field-programmable gate arrays (FPGAs) can result in severe performance bottlenecks that are challenging to solve. The problem is compounded by the fact that the generated netlist details are disorganized, making them mostly unreadable and only partially visible to designers. This paper proposes an in-FPGA instrumentation method and a new framework for extracting the FPGA-cycle-accurate timing performances of OpenCL-based designs. The results clearly show that the chosen execution model for OpenCL-based designs strongly affects the timing performance when it is not properly implemented. Our framework is implemented on an HPRHC platform that contains a CPU and two Arria10 FPGAs, and it is evaluated with a wide variety of benchmarks with different complexities. After testing on the reported benchmarks, the average logic overhead for one inserted instrument is 0.2 % of the total amount of adaptive look-up tables (ALUTs) and 0.1 % of the total registers in an FPGA. This resource utilization is between 1.5 and six times lower than those reported in the best previously published works. The scalability of the framework is also evaluated by inserting up to 50 instruments. The experimental results show that the average logic utilization per instrument is 0.19 % of the ALUTs and 0.17 % of the registers in the FPGA when 50 instruments are inserted

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

    Robust and reliable hardware accelerator design through high-level synthesis

    Get PDF
    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling-driven complexity has resulted in a variety of reliability and validation challenges including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. Thus the challenge is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This dissertation shows that high-level synthesis also has the power to address validation and reliability challenges through three automated solutions targeting three key stages in the hardware design and use cycle: pre-silicon debugging, post-silicon validation, and post-deployment error detection. Our solution for rapid pre-silicon debugging of accelerator designs is hybrid tracing: comparing a datapath-level trace of hardware execution with a reference software implementation at a fine temporal and spatial granularity to detect logic bugs. An integrated backtrace process delivers source-code meaning to the hardware designer, pinpointing the location of bug activation and providing a strong hint for potential bug fixes. Experimental results show that we are able to detect and aid in localization of logic bugs from both C/C++ specifications as well as the high-level synthesis engine itself. A variation of this solution tailored for rapid post-silicon validation of accelerator designs is hybrid hashing: inserting signature generation logic in a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using hybrid hashing, we demonstrate an improvement in error detection latency (time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. Hybrid hashing also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. Hybrid hashing incurs less than 10% area overhead for the accelerator it validates with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by hybrid hashing. Finally, our solution for post-deployment error detection is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique with 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA emulated netlist-level error injection. We observe coverages of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors with a 25.7% area cost and negligible performance impact. Leveraging a mean error detection latency of 12.75 cycles (4150× faster than end result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175× increase in reliability against soft errors. While the area cost of our modulo shadow datapaths is much better than traditional modular redundancy approaches, we want to maximize the applicability of our approach. To this end, we take a dive into gate-level architectural design for modulo arithmetic functional units. We introduce new low-cost gate-level architectures for all four key functional units in a shadow datapath: (1) a modulo reduction algorithm that generates architectures consisting entirely of full-adder standard cells; (2) minimum-area modulo adder and subtractor architectures; (3) an array-based modulo multiplier design; and (4) a modulo equality comparator that handles the residue encoding produced by the above. We compare our new functional units to the previous state-of-the-art approach, observing a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer; that our reducer costs, which tend to dominate shadow datapath costs, do not increase with larger modulo bases; and that for modulo-15 and above, all of our modulo functional units have better area and delay then their previous counterparts. We also demonstrate the practicality of our approach by designing a custom shadow datapath for error detection of a multiply accumulate functional unit, which has an area overhead of only 12% for a 32-bit main datapath and 2-bit modulo-3 shadow datapath. Taking our reliability solution further, we look at the bigger picture of modulo shadow datapaths combined with other solutions at different abstraction layers, looking to answer the following question: Given all of the existing reliability improvement techniques for application-specific hardware accelerators, what techniques or combinations of techniques are the most cost-effective? To answer this question, we consider a soft error fault model and empirically evaluate cross-layer combinations of ABFT, EDDI, and modulo shadow datapaths in the context of high-level synthesis; parity in logic synthesis; and flip-flop hardening techniques at the physical design level. We measure the reliability benefit and area, energy, and performance cost of each technique individually and for interesting technique combinations through FPGA emulated fault-injection and physical place-and-route. Our results show that a combination of parity and flip-flop hardening is the most cost-effective in general with an average 1.3% area cost and 5.7% energy cost for a 50× improvement in reliability. The addition of modulo-3 shadow datapaths to this combination provides some additional benefit for some applications, even without considering its combinational logic, stuck-at fault, and timing error protection benefits. We also observe new efficiency challenges for ABFT and EDDI when used for hardware accelerators

    Advances in Logic Locking: Past, Present, and Prospects

    Get PDF
    Logic locking is a design concealment mechanism for protecting the IPs integrated into modern System-on-Chip (SoC) architectures from a wide range of hardware security threats at the IC manufacturing supply chain. Logic locking primarily helps the designer to protect the IPs against reverse engineering, IP piracy, overproduction, and unauthorized activation. For more than a decade, the research studies that carried out on this paradigm has been immense, in which the applicability, feasibility, and efficacy of the logic locking have been investigated, including metrics to assess the efficacy, impact of locking in different levels of abstraction, threat model definition, resiliency against physical attacks, tampering, and the application of machine learning. However, the security and strength of existing logic locking techniques have been constantly questioned by sophisticated logical and physical attacks that evolve in sophistication at the same rate as logic locking countermeasure approaches. By providing a comprehensive definition regarding the metrics, assumptions, and principles of logic locking, in this survey paper, we categorize the existing defenses and attacks to capture the most benefit from the logic locking techniques for IP protection, and illuminating the need for and giving direction to future research studies in this topic. This survey paper serves as a guide to quickly navigate and identify the state-of-the-art that should be considered and investigated for further studies on logic locking techniques, helping IP vendors, SoC designers, and researchers to be informed of the principles, fundamentals, and properties of logic locking

    Transient error mitigation by means of approximate logic circuits

    Get PDF
    Mención Internacional en el título de doctorThe technological advances in the manufacturing of electronic circuits have allowed to greatly improve their performance, but they have also increased the sensitivity of electronic devices to radiation-induced errors. Among them, the most common effects are the SEEs, i.e., electrical perturbations provoked by the strike of high-energy particles, which may modify the internal state of a memory element (SEU) or generate erroneous transient pulses (SET), among other effects. These events pose a threat for the reliability of electronic circuits, and therefore fault-tolerance techniques must be applied to deal with them. The most common fault-tolerance techniques are based in full replication (DWC or TMR). These techniques are able to cover a wide range of failure mechanisms present in electronic circuits. However, they suffer from high overheads in terms of area and power consumption. For this reason, lighter alternatives are often sought at the expense of slightly reducing reliability for the least critical circuit sections. In this context a new paradigm of electronic design is emerging, known as approximate computing, which is based on improving the circuit performance in change of slight modifications of the intended functionality. This is an interesting approach for the design of lightweight fault-tolerant solutions, which has not been yet studied in depth. The main goal of this thesis consists in developing new lightweight fault-tolerant techniques with partial replication, by means of approximate logic circuits. These circuits can be designed with great flexibility. This way, the level of protection as well as the overheads can be adjusted at will depending on the necessities of each application. However, finding optimal approximate circuits for a given application is still a challenge. In this thesis a method for approximate circuit generation is proposed, denoted as fault approximation, which consists in assigning constant logic values to specific circuit lines. On the other hand, several criteria are developed to generate the most suitable approximate circuits for each application, by using this fault approximation mechanism. These criteria are based on the idea of approximating the least testable sections of circuits, which allows reducing overheads while minimising the loss of reliability. Therefore, in this thesis the selection of approximations is linked to testability measures. The first criterion for fault selection developed in this thesis uses static testability measures. The approximations are generated from the results of a fault simulation of the target circuit, and from a user-specified testability threshold. The amount of approximated faults depends on the chosen threshold, which allows to generate approximate circuits with different performances. Although this approach was initially intended for combinational circuits, an extension to sequential circuits has been performed as well, by considering the flip-flops as both inputs and outputs of the combinational part of the circuit. The experimental results show that this technique achieves a wide scalability, and an acceptable trade-off between reliability versus overheads. In addition, its computational complexity is very low. However, the selection criterion based in static testability measures has some drawbacks. Adjusting the performance of the generated approximate circuits by means of the approximation threshold is not intuitive, and the static testability measures do not take into account the changes as long as faults are approximated. Therefore, an alternative criterion is proposed, which is based on dynamic testability measures. With this criterion, the testability of each fault is computed by means of an implication-based probability analysis. The probabilities are updated with each new approximated fault, in such a way that on each iteration the most beneficial approximation is chosen, that is, the fault with the lowest probability. In addition, the computed probabilities allow to estimate the level of protection against faults that the generated approximate circuits provide. Therefore, it is possible to generate circuits which stick to a target error rate. By modifying this target, circuits with different performances can be obtained. The experimental results show that this new approach is able to stick to the target error rate with reasonably good precision. In addition, the approximate circuits generated with this technique show better performance than with the approach based in static testability measures. In addition, the fault implications have been reused too in order to implement a new type of logic transformation, which consists in substituting functionally similar nodes. Once the fault selection criteria have been developed, they are applied to different scenarios. First, an extension of the proposed techniques to FPGAs is performed, taking into account the particularities of this kind of circuits. This approach has been validated by means of radiation experiments, which show that a partial replication with approximate circuits can be even more robust than a full replication approach, because a smaller area reduces the probability of SEE occurrence. Besides, the proposed techniques have been applied to a real application circuit as well, in particular to the microprocessor ARM Cortex M0. A set of software benchmarks is used to generate the required testability measures. Finally, a comparative study of the proposed approaches with approximate circuit generation by means of evolutive techniques have been performed. These approaches make use of a high computational capacity to generate multiple circuits by trial-and-error, thus reducing the possibility of falling into local minima. The experimental results demonstrate that the circuits generated with evolutive approaches are slightly better in performance than the circuits generated with the techniques here proposed, although with a much higher computational effort. In summary, several original fault mitigation techniques with approximate logic circuits are proposed. These approaches are demonstrated in various scenarios, showing that the scalability and adaptability to the requirements of each application are their main virtuesLos avances tecnológicos en la fabricación de circuitos electrónicos han permitido mejorar en gran medida sus prestaciones, pero también han incrementado la sensibilidad de los mismos a los errores provocados por la radiación. Entre ellos, los más comunes son los SEEs, perturbaciones eléctricas causadas por el impacto de partículas de alta energía, que entre otros efectos pueden modificar el estado de los elementos de memoria (SEU) o generar pulsos transitorios de valor erróneo (SET). Estos eventos suponen un riesgo para la fiabilidad de los circuitos electrónicos, por lo que deben ser tratados mediante técnicas de tolerancia a fallos. Las técnicas de tolerancia a fallos más comunes se basan en la replicación completa del circuito (DWC o TMR). Estas técnicas son capaces de cubrir una amplia variedad de modos de fallo presentes en los circuitos electrónicos. Sin embargo, presentan un elevado sobrecoste en área y consumo. Por ello, a menudo se buscan alternativas más ligeras, aunque no tan efectivas, basadas en una replicación parcial. En este contexto surge una nueva filosofía de diseño electrónico, conocida como computación aproximada, basada en mejorar las prestaciones de un diseño a cambio de ligeras modificaciones de la funcionalidad prevista. Es un enfoque atractivo y poco explorado para el diseño de soluciones ligeras de tolerancia a fallos. El objetivo de esta tesis consiste en desarrollar nuevas técnicas ligeras de tolerancia a fallos por replicación parcial, mediante el uso de circuitos lógicos aproximados. Estos circuitos se pueden diseñar con una gran flexibilidad. De este forma, tanto el nivel de protección como el sobrecoste se pueden regular libremente en función de los requisitos de cada aplicación. Sin embargo, encontrar los circuitos aproximados óptimos para cada aplicación es actualmente un reto. En la presente tesis se propone un método para generar circuitos aproximados, denominado aproximación de fallos, consistente en asignar constantes lógicas a ciertas líneas del circuito. Por otro lado, se desarrollan varios criterios de selección para, mediante este mecanismo, generar los circuitos aproximados más adecuados para cada aplicación. Estos criterios se basan en la idea de aproximar las secciones menos testables del circuito, lo que permite reducir los sobrecostes minimizando la perdida de fiabilidad. Por tanto, en esta tesis la selección de aproximaciones se realiza a partir de medidas de testabilidad. El primer criterio de selección de fallos desarrollado en la presente tesis hace uso de medidas de testabilidad estáticas. Las aproximaciones se generan a partir de los resultados de una simulación de fallos del circuito objetivo, y de un umbral de testabilidad especificado por el usuario. La cantidad de fallos aproximados depende del umbral escogido, lo que permite generar circuitos aproximados con diferentes prestaciones. Aunque inicialmente este método ha sido concebido para circuitos combinacionales, también se ha realizado una extensión a circuitos secuenciales, considerando los biestables como entradas y salidas de la parte combinacional del circuito. Los resultados experimentales demuestran que esta técnica consigue una buena escalabilidad, y unas prestaciones de coste frente a fiabilidad aceptables. Además, tiene un coste computacional muy bajo. Sin embargo, el criterio de selección basado en medidas estáticas presenta algunos inconvenientes. No resulta intuitivo ajustar las prestaciones de los circuitos aproximados a partir de un umbral de testabilidad, y las medidas estáticas no tienen en cuenta los cambios producidos a medida que se van aproximando fallos. Por ello, se propone un criterio alternativo de selección de fallos, basado en medidas de testabilidad dinámicas. Con este criterio, la testabilidad de cada fallo se calcula mediante un análisis de probabilidades basado en implicaciones. Las probabilidades se actualizan con cada nuevo fallo aproximado, de forma que en cada iteración se elige la aproximación más favorable, es decir, el fallo con menor probabilidad. Además, las probabilidades calculadas permiten estimar la protección frente a fallos que ofrecen los circuitos aproximados generados, por lo que es posible generar circuitos que se ajusten a una tasa de fallos objetivo. Modificando esta tasa se obtienen circuitos aproximados con diferentes prestaciones. Los resultados experimentales muestran que este método es capaz de ajustarse razonablemente bien a la tasa de fallos objetivo. Además, los circuitos generados con esta técnica muestran mejores prestaciones que con el método basado en medidas estáticas. También se han aprovechado las implicaciones de fallos para implementar un nuevo tipo de transformación lógica, consistente en sustituir nodos funcionalmente similares. Una vez desarrollados los criterios de selección de fallos, se aplican a distintos campos. En primer lugar, se hace una extensión de las técnicas propuestas para FPGAs, teniendo en cuenta las particularidades de este tipo de circuitos. Esta técnica se ha validado mediante experimentos de radiación, los cuales demuestran que una replicación parcial con circuitos aproximados puede ser incluso más robusta que una replicación completa, ya que un área más pequeña reduce la probabilidad de SEEs. Por otro lado, también se han aplicado las técnicas propuestas en esta tesis a un circuito de aplicación real, el microprocesador ARM Cortex M0, utilizando un conjunto de benchmarks software para generar las medidas de testabilidad necesarias. Por ´último, se realiza un estudio comparativo de las técnicas desarrolladas con la generación de circuitos aproximados mediante técnicas evolutivas. Estas técnicas hacen uso de una gran capacidad de cálculo para generar múltiples circuitos mediante ensayo y error, reduciendo la posibilidad de caer en algún mínimo local. Los resultados confirman que, en efecto, los circuitos generados mediante técnicas evolutivas son ligeramente mejores en prestaciones que con las técnicas aquí propuestas, pero con un coste computacional mucho mayor. En definitiva, se proponen varias técnicas originales de mitigación de fallos mediante circuitos aproximados. Se demuestra que estas técnicas tienen diversas aplicaciones, haciendo de la flexibilidad y adaptabilidad a los requisitos de cada aplicación sus principales virtudes.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: Raoul Velazco.- Secretario: Almudena Lindoso Muñoz.- Vocal: Jaume Segura Fuste

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    Get PDF
    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing
    corecore