117 research outputs found

    Proxy Circuits for Fault-Tolerant Primitive Interfacing in Reconfigurable Devices Targeting Extreme Environments

    Get PDF
    Continuous interface access to device-level primitives in reconfigurable devices in extreme environments is key to reliable operation. However, it is possible for a primitive's interface controller, which is static to be rendered non-operational by a permanent damage in the controller's circuitry. In order to mitigate this, this paper proposes the use of relocatable proxy circuits to provide remote interfacing capability to primitives from anywhere on a reconfigurable device. A demonstration with device register read controller shows that an improvement in fault-tolerance can be achieved

    A Functionality-Based Runtime Relocation System for Circuits on Heterogeneous FPGAs

    Get PDF

    Module Relocation in Heterogeneous Reconfigurable Systems-on-Chip using the Xilinx Isolation Design Flow

    Get PDF
    International audienceHeterogeneous Reconfigurable Systems-on-Chip (HRSoC) contain as their name suggests, heterogeneous processing elements in a single chip. Namely, several processors, hardware accelerators as well as communication networks between all these components. In order to leverage the programming complexity of this kind of platform, applications are described with software threads, running on processors, and hardware threads, running on FPGA partitions. Combining techniques such as dynamic and partial reconfiguration and partial readback with the knowledge of the bitstream structure offer the ability to target several partitions using a unique configuration file. Such a feature permits to save critical memory resources. In this article, we propose to tackle the issue of designing fully independent partitions, and especially to avoid the routing conflicts which can occur when using the standard Xilinx FPGA design flow. To achieve the relocation process successfully, we propose a new design flow dedicated to the module relocation, using the standard tools and based on the Isolation Design Flow (IDF), a special flow provided by Xilinx for secure FPGA applications

    Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing

    Get PDF
    The sheer hardware-based computational performance and programming flexibility offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs) make them attractive for computing in applications that require high performance, availability, reliability, real-time processing, and high efficiency. Fueled by fabrication process scaling, modern reconfigurable devices come with ever greater quantities of on-chip resources, allowing a more complex variety of applications to be developed. Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now embrace reconfigurable computing devices likes FPGAs to meet their critical computing needs. In addition, the capability to autonomously reprogramme these devices in the field is being exploited for reliability in application domains like aerospace, defence, military, and nuclear power stations. In such applications, real-time computing is important and is often a necessity for reliability. As such, applications and algorithms resident on these devices must be implemented with sufficient considerations for real-time processing and reliability. Often, to manage a reconfigurable hardware device as a computing platform for a multiplicity of homogenous and heterogeneous tasks, reconfigurable operating systems (ROSes) have been proposed to give a software look to hardware-based computation. The key requirements of a ROS include partitioning, task scheduling and allocation, task configuration or loading, and inter-task communication and synchronization. Existing ROSes have met these requirements to varied extents. However, they are limited in reliability, especially regarding the flexibility of placing the hardware circuits of tasks on device’s chip area, the problem arising more from the partitioning approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip inter-communication among tasks, hampering the flexibility of runtime task relocation for reliability. This thesis proposes the enabling frameworks for reliable, available, real-time, efficient, secure, and high-performance reconfigurable computing by providing techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware. This work provides task configuration infrastructures for reliable reconfigurable computing. Key features, especially reliability-enabling functionalities, which have been given little or no attention in state-of-the-art are implemented. These features include internal register read and write for device diagnosis; configuration operation abort mechanism, and tightly integrated selective-area scanning, which aims to optimize access to the device’s reconfiguration port for both task loading and error mitigation. In addition, this thesis proposes a novel reliability-aware inter-task communication framework that exploits the availability of dedicated clocking infrastructures in a typical FPGA to provide inter-task communication and synchronization. The clock buffers and networks of an FPGA use dedicated routing resources, which are distinct from the general routing resources. As such, deploying these dedicated resources for communication sidesteps the restriction of static routes and allows a better relocation of circuits for reliability purposes. For evaluation, a case study that uses a NASA/JPL spectrometer data processing application is employed to demonstrate the improved reliability brought about by the implemented configuration controller and the reliability-aware dynamic communication infrastructure. It is observed that up to 74% time saving can be achieved for selective-area error mitigation when compared to state-of-the-art vendor implementations. Moreover, an improvement in overall system reliability is observed when the proposed dynamic communication scheme is deployed in the data processing application. Finally, one area of reconfigurable computing that has received insufficient attention is security. Meanwhile, considering the nature of applications which now turn to reconfigurable computing for accelerating compute-intensive processes, a high premium is now placed on security, not only of the device but also of the applications, from loading to runtime execution. To address security concerns, a novel secure and efficient task configuration technique for task relocation is also investigated, providing configuration time savings of up to 32% or 83%, depending on the device; and resource usage savings in excess of 90% compared to state-of-the-art

    Efficient runtime placement management for high performance and reliability in COTS FPGAs

    Get PDF
    Designing high-performance, fault-tolerant multisensory electronic systems for hostile environments such as nuclear plants and outer space within the constraints of cost, power and flexibility is challenging. Issues such as ionizing radiation, extreme temperature and ageing can lead to faults in the electronics of these systems. In addition, the remote nature of these environments demands a level of flexibility and autonomy in their operations. The standard practice of using specially hardened electronic devices for such systems is not only very expensive but also has limited flexibility. This thesis proposes novel techniques that promote the use of Commercial Off-The- Shelf (COTS) reconfigurable devices to meet the challenges of high-performance systems for hostile environments. Reconfigurable hardware such as Field Programmable Gate Arrays (FPGA) have a unique combination of flexibility and high performance. The flexibility offered through features such as dynamic partial reconfiguration (DPR) can be harnessed not only to achieve cost-effective designs as a smaller area can be used to execute multiple tasks, but also to improve the reliability of a system as a circuit on one portion of the device can be physically relocated to another portion in the case of fault occurrence. However, to harness these potentials for high performance and reliability in a cost-effective manner, novel runtime management tools are required. Most runtime support tools for reconfigurable devices are based on ideal models which do not adequately consider the limitations of realistic FPGAs, in particular modern FPGAs which are increasingly heterogeneous. Specifically, these tools lack efficient mechanisms for ensuring a high utilization of FPGA resources, including the FPGA area and the configuration port and clocking resources, in a reliable manner. To ensure high utilization of reconfigurable device area, placement management is a key aspect of these tools. This thesis presents novel techniques for the management of hardware task placement on COTS reconfigurable devices for high performance and reliability. To this end, it addresses design-time issues that affect efficient hardware task placement, with a focus on reliability. It also presents techniques to maximize the utilization of the FPGA area in runtime, including techniques to minimize fragmentation. Fragmentation leads to the creation of unusable areas due to dynamic placement of tasks and the heterogeneity of the resources on the chip. Moreover, this thesis also presents an efficient task reuse mechanism to improve the availability of the internal configuration infrastructure of the FPGA for critical responsibilities like error mitigation. The task reuse scheme, unlike previous approaches, also improves the utilization of the chip area by offering defragmentation. Task relocation, which involves changing the physical location of circuits is a technique for error mitigation and high performance. Hence, this thesis also provides a functionality-based relocation mechanism for improving the number of locations to which tasks can be relocated on heterogeneous FPGAs. As tasks are relocated, clock networks need to be routed to them. As such, a reliability-aware technique of clock network routing to tasks after placement is also proposed. Finally, this thesis offers a prototype implementation and characterization of a placement management system (PMS) which is an integration of the aforementioned techniques. The performance of most of the proposed techniques are tested using data processing tasks of a NASA JPL spectrometer application. The results show that the proposed techniques have potentials to improve the reliability and performance of applications in hostile environment compared to state-of-the-art techniques. The task optimization technique presented leads to better capacity to circumvent permanent faults on COTS FPGAs compared to state-of-the-art approaches (48.6% more errors were circumvented for the JPL spectrometer application). The proposed task reuse scheme leads to approximately 29% saving in the amount of configuration time. This frees up the internal configuration interface for more error mitigation operations. In addition, the proposed PMS has a worst-case latency of less than 50% of that of state-of- the-art runtime placement systems, while maintaining the same level of placement quality and resource overhead

    Using Relocatable Bitstreams for Fault Tolerance

    Get PDF
    This research develops a method for relocating reconfigurable modules on the Virtex-II (Pro) family of Field Programmable Gate Arrays (FPGAs). A bitstream translation program is developed which correctly changes the location of a partial bitstream that implements a module on the FPGA. To take advantage of relocatable modules, three fault-tolerance circuit designs are developed and tested. This circuit can operate through a fault by efficiently removing the faulty module and replacing it with a relocated module without faults. The FPGA can recover from faults at a known location, without the need for external intervention using an embedded fault recovery system. The recovery system uses an internal PowerPC to relocate the modules and reprogram the FPGA. Due to the limited architecture of the target FPGA and Xilinx tool errors, an FPGA with automatic fault recovery could not be demonstrated. However, the various components needed to do this type of recovery have been implemented and demonstrated individually

    Fault-tolerant fpga for mission-critical applications.

    Get PDF
    One of the devices that play a great role in electronic circuits design, specifically safety-critical design applications, is Field programmable Gate Arrays (FPGAs). This is because of its high performance, re-configurability and low development cost. FPGAs are used in many applications such as data processing, networks, automotive, space and industrial applications. Negative impacts on the reliability of such applications result from moving to smaller feature sizes in the latest FPGA architectures. This increases the need for fault-tolerant techniques to improve reliability and extend system lifetime of FPGA-based applications. In this thesis, two fault-tolerant techniques for FPGA-based applications are proposed with a built-in fault detection region. A low cost fault detection scheme is proposed for detecting faults using the fault detection region used in both schemes. The fault detection scheme primarily detects open faults in the programmable interconnect resources in the FPGAs. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can be detected. For fault recovery, each scheme has its own fault recovery approach. The first approach uses a spare module and a 2-to-1 multiplexer to recover from any fault detected. On the other hand, the second approach recovers from any fault detected using the property of Partial Reconfiguration (PR) in the FPGAs. It relies on identifying a Partially Reconfigurable block (P_b) in the FPGA that is used in the recovery process after the first faulty module is identified in the system. This technique uses only one location to recover from faults in any of the FPGA’s modules and the FPGA interconnects. Simulation results show that both techniques can detect and recover from open faults. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can also be detected. Finally, both techniques require low area overhead

    New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs

    Full text link
    Tesis por compendio[EN] Relevance of electronics towards safety of common devices has only been growing, as an ever growing stake of the functionality is assigned to them. But of course, this comes along the constant need for higher performances to fulfill such functionality requirements, while keeping power and budget low. In this scenario, industry is struggling to provide a technology which meets all the performance, power and price specifications, at the cost of an increased vulnerability to several types of known faults or the appearance of new ones. To provide a solution for the new and growing faults in the systems, designers have been using traditional techniques from safety-critical applications, which offer in general suboptimal results. In fact, modern embedded architectures offer the possibility of optimizing the dependability properties by enabling the interaction of hardware, firmware and software levels in the process. However, that point is not yet successfully achieved. Advances in every level towards that direction are much needed if flexible, robust, resilient and cost effective fault tolerance is desired. The work presented here focuses on the hardware level, with the background consideration of a potential integration into a holistic approach. The efforts in this thesis have focused several issues: (i) to introduce additional fault models as required for adequate representativity of physical effects blooming in modern manufacturing technologies, (ii) to provide tools and methods to efficiently inject both the proposed models and classical ones, (iii) to analyze the optimum method for assessing the robustness of the systems by using extensive fault injection and later correlation with higher level layers in an effort to cut development time and cost, (iv) to provide new detection methodologies to cope with challenges modeled by proposed fault models, (v) to propose mitigation strategies focused towards tackling such new threat scenarios and (vi) to devise an automated methodology for the deployment of many fault tolerance mechanisms in a systematic robust way. The outcomes of the thesis constitute a suite of tools and methods to help the designer of critical systems in his task to develop robust, validated, and on-time designs tailored to his application.[ES] La relevancia que la electrónica adquiere en la seguridad de los productos ha crecido inexorablemente, puesto que cada vez ésta copa una mayor influencia en la funcionalidad de los mismos. Pero, por supuesto, este hecho viene acompañado de una necesidad constante de mayores prestaciones para cumplir con los requerimientos funcionales, al tiempo que se mantienen los costes y el consumo en unos niveles reducidos. En este escenario, la industria está realizando esfuerzos para proveer una tecnología que cumpla con todas las especificaciones de potencia, consumo y precio, a costa de un incremento en la vulnerabilidad a múltiples tipos de fallos conocidos o la introducción de nuevos. Para ofrecer una solución a los fallos nuevos y crecientes en los sistemas, los diseñadores han recurrido a técnicas tradicionalmente asociadas a sistemas críticos para la seguridad, que ofrecen en general resultados sub-óptimos. De hecho, las arquitecturas empotradas modernas ofrecen la posibilidad de optimizar las propiedades de confiabilidad al habilitar la interacción de los niveles de hardware, firmware y software en el proceso. No obstante, ese punto no está resulto todavía. Se necesitan avances en todos los niveles en la mencionada dirección para poder alcanzar los objetivos de una tolerancia a fallos flexible, robusta, resiliente y a bajo coste. El trabajo presentado aquí se centra en el nivel de hardware, con la consideración de fondo de una potencial integración en una estrategia holística. Los esfuerzos de esta tesis se han centrado en los siguientes aspectos: (i) la introducción de modelos de fallo adicionales requeridos para la representación adecuada de efectos físicos surgentes en las tecnologías de manufactura actuales, (ii) la provisión de herramientas y métodos para la inyección eficiente de los modelos propuestos y de los clásicos, (iii) el análisis del método óptimo para estudiar la robustez de sistemas mediante el uso de inyección de fallos extensiva, y la posterior correlación con capas de más alto nivel en un esfuerzo por recortar el tiempo y coste de desarrollo, (iv) la provisión de nuevos métodos de detección para cubrir los retos planteados por los modelos de fallo propuestos, (v) la propuesta de estrategias de mitigación enfocadas hacia el tratamiento de dichos escenarios de amenaza y (vi) la introducción de una metodología automatizada de despliegue de diversos mecanismos de tolerancia a fallos de forma robusta y sistemática. Los resultados de la presente tesis constituyen un conjunto de herramientas y métodos para ayudar al diseñador de sistemas críticos en su tarea de desarrollo de diseños robustos, validados y en tiempo adaptados a su aplicación.[CA] La rellevància que l'electrònica adquireix en la seguretat dels productes ha crescut inexorablement, puix cada volta més aquesta abasta una major influència en la funcionalitat dels mateixos. Però, per descomptat, aquest fet ve acompanyat d'un constant necessitat de majors prestacions per acomplir els requeriments funcionals, mentre es mantenen els costos i consums en uns nivells reduïts. Donat aquest escenari, la indústria està fent esforços per proveir una tecnologia que complisca amb totes les especificacions de potència, consum i preu, tot a costa d'un increment en la vulnerabilitat a diversos tipus de fallades conegudes, i a la introducció de nous tipus. Per oferir una solució a les noves i creixents fallades als sistemes, els dissenyadors han recorregut a tècniques tradicionalment associades a sistemes crítics per a la seguretat, que en general oferixen resultats sub-òptims. De fet, les arquitectures empotrades modernes oferixen la possibilitat d'optimitzar les propietats de confiabilitat en habilitar la interacció dels nivells de hardware, firmware i software en el procés. Tot i això eixe punt no està resolt encara. Es necessiten avanços a tots els nivells en l'esmentada direcció per poder assolir els objectius d'una tolerància a fallades flexible, robusta, resilient i a baix cost. El treball ací presentat se centra en el nivell de hardware, amb la consideració de fons d'una potencial integració en una estratègia holística. Els esforços d'esta tesi s'han centrat en els següents aspectes: (i) la introducció de models de fallada addicionals requerits per a la representació adequada d'efectes físics que apareixen en les tecnologies de fabricació actuals, (ii) la provisió de ferramentes i mètodes per a la injecció eficient del models proposats i dels clàssics, (iii) l'anàlisi del mètode òptim per estudiar la robustesa de sistemes mitjançant l'ús d'injecció de fallades extensiva, i la posterior correlació amb capes de més alt nivell en un esforç per retallar el temps i cost de desenvolupament, (iv) la provisió de nous mètodes de detecció per cobrir els reptes plantejats pels models de fallades proposats, (v) la proposta d'estratègies de mitigació enfocades cap al tractament dels esmentats escenaris d'amenaça i (vi) la introducció d'una metodologia automatitzada de desplegament de diversos mecanismes de tolerància a fallades de forma robusta i sistemàtica. Els resultats de la present tesi constitueixen un conjunt de ferramentes i mètodes per ajudar el dissenyador de sistemes crítics en la seua tasca de desenvolupament de dissenys robustos, validats i a temps adaptats a la seua aplicació.Espinosa García, J. (2016). New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73146TESISCompendi

    Towards the development of a reliable reconfigurable real-time operating system on FPGAs

    Get PDF
    In the last two decades, Field Programmable Gate Arrays (FPGAs) have been rapidly developed from simple “glue-logic” to a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only the high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also better programming flexibility, in comparison to Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time, without interrupting the rest of the chip’s operation. As a result, hardware resources can be more efficiently exploited since the chip resources can be reused by swapping in or out hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance against transient errors and permanent damage, such as Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons above, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very hot topic. However, since hardware integration is increasing at an exponential rate, and applications are becoming more complex with the growth of user demands, highlevel application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users can obtain little advantage from DPR without the support of system-level middleware. To bridge the gap between the high-level application and the low-level hardware implementation, this thesis presents the important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the user exploitation of DPR from the application level, by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks, which can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and EVC), which schedule tasks in real-time and circumvent emerging faults while maintaining more compact empty areas. 3) Design and implementation of a faulttolerant microprocessor by harnessing the existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives. 4) A novel symmetric multiprocessing (SMP)-based architectures that supports shared memory programing interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest Neighbour classifier, which is a non-parametric classification algorithm widely used in various fields of data mining; and b) pairwise sequence alignment, namely the Smith Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed in respect of user requirements and hardware availability. Benefiting from this, not only the hardware resources can be more efficiently used, but also the system performance can be significantly increased. Results show that the scheduling and allocating efficiencies have been improved up to 2x, and the overall system performance is further improved by ~2.5x. Future work includes the development of Network on Chip (NoC), which is expected to further increase the communication throughput; as well as the standardization and automation of our system design, which will be carried out in line with the enablement of other high-level synthesis tools, to allow application developers to benefit from the system in a more efficient manner
    corecore