
    Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic

    For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter the yield loss, increased delay, and increased energy costs that arise from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing, to practical and palatable levels, the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice, this may more than double the power-limited computational capabilities of dies fabricated in 22 nm technologies. Contributions of this work:
    • Choose-Your-Own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
    • An implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
    • A detailed performance characterization of CYA:
      – Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
      – Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimization
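    The load-time selection step of this kind of scheme is simple in spirit: load each pre-computed alternative configuration, briefly test it on this specific die, and keep the best-scoring one. The Python sketch below illustrates that pattern only; the hooks `load_bitstream`, `passes_functional_test`, and `measure_critical_delay` are hypothetical placeholders for platform-specific operations, not the paper's actual API.

```python
# Hedged sketch of CYA-style load-time selection among pre-computed
# alternative configurations. Each alternative is programmed onto the
# FPGA, tested for correct function (defect avoidance), and timed
# (variation mitigation); the best working one is kept.

def select_configuration(alternatives, load_bitstream,
                         passes_functional_test, measure_critical_delay):
    """Return the best working alternative for this particular chip."""
    best, best_delay = None, float("inf")
    for cfg in alternatives:
        load_bitstream(cfg)                 # program the FPGA
        if not passes_functional_test():    # skip paths broken by defects
            continue
        delay = measure_critical_delay()    # prefer fast paths on this die
        if delay < best_delay:
            best, best_delay = cfg, delay
    if best is None:
        raise RuntimeError("no pre-computed alternative works on this die")
    load_bitstream(best)                    # leave the winner loaded
    return best, best_delay
```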

    A PUF based Lightweight Hardware Security Architecture for IoT

    With an increasing number of hand-held electronics, gadgets, and other smart devices, data is present on a large number of platforms, raising the risk of security, privacy, and safety breaches more than ever before. Because of the extremely lightweight nature of these devices, commonly referred to as the IoT or 'Internet of Things', providing any kind of security is prohibitive due to the high overhead associated with traditional, mathematically robust cryptographic techniques. Therefore, researchers have searched for alternative, intuitive solutions for such devices. Hardware security, unlike traditional cryptography, can provide unique device-specific security solutions with little overhead and can address vulnerabilities in the hardware itself, and is therefore attractive in this domain. As Moore's law is almost at its end, researchers are exploring different emerging devices, which present opportunities to build better application-specific devices, along with their own challenges, compared to CMOS technology. In this work, we propose emerging nanotechnology-based hardware security as a security solution for the resource-constrained IoT domain. Specifically, we build two hardware security primitives, a physical unclonable function (PUF) and a true random number generator (TRNG), and use these components as part of a security protocol that is also proposed in this work. Both the PUF and the TRNG are built from metal-oxide memristors, an emerging nanoscale device, and are generally lightweight compared to their CMOS counterparts in terms of area, power, and delay. Design challenges associated with these hardware security primitives and with memristive devices are addressed. Finally, a complete security protocol is proposed in which all of these pieces come together to provide practical, robust, and device-specific security for resource-limited IoT systems
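    A common way such primitives combine into a protocol is challenge-response authentication: a server stores challenge-response pairs (CRPs) enrolled from the device's PUF, and the TRNG supplies fresh nonces. The sketch below shows that generic pattern only, not this work's specific protocol; `puf_response` and `trng_bytes` are hypothetical hooks standing in for the memristive PUF and TRNG hardware.

```python
# Hedged sketch of generic PUF-based challenge-response authentication.
# The server holds CRPs enrolled at manufacturing time; the device
# recomputes responses from its PUF at authentication time.
import hmac, hashlib, secrets

def enroll(puf_response, n_pairs=16):
    """Server-side enrollment: record (challenge, response) pairs."""
    return {c: puf_response(c)
            for c in (secrets.token_bytes(8) for _ in range(n_pairs))}

def authenticate(crp_db, puf_response, trng_bytes):
    """One authentication round; each CRP is used only once."""
    challenge, expected = crp_db.popitem()   # server picks an unused CRP
    nonce = trng_bytes(8)                    # freshness from the device TRNG
    # The device proves knowledge of the PUF response without sending it raw.
    tag = hmac.new(puf_response(challenge), nonce, hashlib.sha256).digest()
    ref = hmac.new(expected, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, ref)
```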

    Design Disjunction for Resilient Reconfigurable Hardware

    Contemporary reconfigurable hardware devices have the capability to achieve the high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal-oxide semiconductor (CMOS) technology, new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and its ability to realize self-organization features is expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in the density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine selection of the redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted.
    First, a graph- and set-theoretic approach named Hypergraph-Cover Diversity (HCD) is introduced as a preemptive design technique to shift the dominant costs of resiliency to design time. In particular, union-free hypergraphs are exploited to partition the pool of reconfigurable resources into highly separable subsets, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed fault-tolerance method to achieve a 37.5% area saving and up to a 66% reduction in power consumption compared to the frequently used TMR scheme while providing superior fault tolerance.
    Second, Design Disjunction, based on non-adaptive group testing, is developed to realize a low-overhead fault-tolerant system capable of self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource-grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of trade-offs. Disjunct designs are created using the mosaic convergence algorithm, developed such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with an average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical-path delay impact of only 1.49% and an area cost roughly comparable to conventional 2-MR approaches.
    Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for applications with a moderately high production capacity
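    Non-adaptive group testing, which underlies the Design Disjunction method, works as follows: with a d-disjunct test matrix, any pattern of up to d faulty resources can be isolated from the outcomes of a fixed set of group tests. The sketch below shows the standard textbook decoder for this setting; the matrix and outcomes are illustrative, not the paper's mosaic convergence construction.

```python
# Hedged sketch of non-adaptive group testing decoding. Rows of the test
# matrix say which resources each test exercises; a test fails iff it
# includes at least one faulty resource. With a d-disjunct matrix, every
# non-faulty resource appears in some passing test, so the survivors of
# the elimination below are exactly the faulty resources (up to d faults).

def decode(test_matrix, outcomes):
    """test_matrix[t][r] == 1 iff test t covers resource r.
    outcomes[t] is True iff test t failed. Returns the suspected faulty set."""
    suspects = set(range(len(test_matrix[0])))
    for row, failed in zip(test_matrix, outcomes):
        if not failed:  # a passing test clears everything it covers
            suspects -= {r for r, bit in enumerate(row) if bit}
    return suspects

# Tiny 1-disjunct example: 4 tests over 4 resources, resource 2 faulty.
M = [[1, 1, 1, 0],
     [1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
faulty = {2}
outcomes = [any(bit and r in faulty for r, bit in enumerate(row)) for row in M]
assert decode(M, outcomes) == {2}
```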

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports; 7551)

    ReCoSoC is intended to be an annual meeting for exposing and discussing gathered expertise and state-of-the-art research on SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account emerging techniques and architectures that explore the synergy between flexible on-chip communication and system reconfigurability

    The Customizable Virtual FPGA: Generation, System Integration and Configuration of Application-Specific Heterogeneous FPGA Architectures

    Over the past three decades, the development of field-programmable gate arrays (FPGAs) has been strongly influenced by Moore's law, process technology (scaling), and commercial markets. On the one hand, state-of-the-art FPGAs are moving closer to general-purpose computing; on the other hand, as FPGAs have replaced more and more traditional domains of application-specific integrated circuits (ASICs), efficiency expectations are rising. With the end of Dennard scaling, efficiency gains can no longer rely on technology scaling alone. These facets, and the trends towards reconfigurable system-on-chips (SoCs) and new low-power applications such as cyber-physical systems and the Internet of Things, demand better adaptation of the target FPGAs. Besides the trends towards mainstream use of FPGAs in everyday products and services, the recent moves to deploy FPGAs in data centers and cloud services in particular make it necessary to guarantee immediate portability of applications across current and future FPGA devices. In this context, hardware virtualization can be a seamless means of platform independence and portability. Admittedly, the goals of customization and virtualization actually conflict, since customization is intended to increase efficiency, whereas virtualization adds area overhead. However, virtualization not only benefits from customization but also adds flexibility, since the architecture can be changed at any time. This peculiarity can be exploited for adaptive systems. Both the customization and the virtualization of FPGA architectures have barely been addressed in industry so far. Despite some existing academic work, these techniques can still be considered largely unexplored and are emerging research areas.
    The main goal of this work is the generation of FPGA architectures tailored for efficient adaptation to the application. In contrast to the usual approach with commercial FPGAs, where the FPGA architecture is taken as given and the application is mapped onto the available resources, this work follows a new paradigm in which the application or application class is fixed and the target architecture is tailored for efficient adaptation to it. This results in customized application-specific FPGAs. The three pillars of this work are the aspects of virtualization, customization, and the framework. The central element is a largely parameterizable virtual FPGA architecture called the V-FPGA; as its primary target, it can be mapped onto any commercial FPGA, while applications run on the virtual layer. This provides portability and migration even at the bitstream level, since the specification of the virtual layer persists while the physical platform can be exchanged. Furthermore, this technique is used to enable dynamic and partial reconfiguration on platforms that do not support it natively. Besides virtualization, the V-FPGA architecture can also be integrated as an embedded FPGA into an ASIC, offering efficient yet flexible system-on-chip solutions. Therefore, target-technology mapping methods for both virtualization and physical realization are addressed, and an example of a physical realization in a 45 nm standard-cell approach is presented.
    The highly flexible V-FPGA architecture can be customized with more than 20 parameters, including LUT size, clustering, 3D stacking, routing structure, and much more. The influence of the parameters on the area and performance of the architecture is examined, and an extensive analysis of over 1400 benchmark runs shows a high parameter sensitivity, with deviations of up to ±95.9% in area and ±78.1% in performance, which demonstrates the high importance of customization for efficiency. To adapt the parameters systematically to the needs of the application, a parametric design-space exploration method based on suitable area and timing models is proposed.
    One challenge of customized architectures is the design effort and the need for customized tools. Therefore, this work includes a framework for architecture generation, design-space exploration, application mapping, and evaluation. Above all, the V-FPGA is designed in fully synthesizable, generic Very High Speed Integrated Circuit Hardware Description Language (VHDL) code that is very flexible and eliminates the need for external code generators. System designers can benefit from various kinds of generic SoC architecture templates to reduce development time. All the design steps necessary for developing applications and mapping them onto the V-FPGA are supported by a design-automation tool flow that exploits a collection of existing commercial and academic tools, adapted through suitable models and complemented by a new tool called the V-FPGA-Explorer. This new tool not only acts as the back-end tool for application mapping onto the V-FPGA but is also a graphical configuration and layout editor, a bitstream generator, an architecture-file generator for the place & route tools, a script generator, and a testbench generator. A special feature is the support of just-in-time compilation with fast algorithms for in-system application mapping. The work concludes with several use cases from the fields of industrial process automation, medical imaging, adaptive systems, and teaching in which the V-FPGA has been employed
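    The parametric design-space exploration described above can be pictured as a search over parameter combinations scored by area and timing models, keeping the Pareto-optimal points. The Python sketch below shows that generic pattern under simple invented cost models; the parameter names and model functions are illustrative assumptions, not the V-FPGA framework's actual interface.

```python
# Hedged sketch of parametric design-space exploration for a virtual
# FPGA: enumerate parameter combinations, score each with area/timing
# models, and keep the Pareto frontier. Parameters and models are
# invented placeholders for illustration only.
from itertools import product

def area_model(lut_size, cluster_size, channel_width):
    return (2 ** lut_size) * cluster_size + 4 * channel_width

def delay_model(lut_size, cluster_size, channel_width):
    return lut_size + cluster_size / 2 + 50 / channel_width

def explore(lut_sizes, cluster_sizes, channel_widths):
    """Return the Pareto frontier over (area, delay)."""
    points = [(area_model(*p), delay_model(*p), p)
              for p in product(lut_sizes, cluster_sizes, channel_widths)]
    return sorted(c for c in points
                  if not any(o[0] <= c[0] and o[1] <= c[1] and o[:2] != c[:2]
                             for o in points))

for area, delay, params in explore([3, 4, 5, 6], [4, 8, 10], [20, 40, 80]):
    print(f"area={area:5.0f} delay={delay:5.2f} (LUT, cluster, channels)={params}")
```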

    Integrated Circuit Wear-out Prediction and Recycling Detection using Radio-Frequency Distinct Native Attribute Features

    Radio Frequency Distinct Native Attribute (RF-DNA) fingerprinting has shown promise for detecting differences between integrated circuits (ICs) using features extracted from a device's unintentional radio emissions (URE). This ability of RF-DNA relies upon the process variation imparted to a semiconductor device during manufacturing. However, internal components in modern ICs electronically age and wear out over their operational lifetime. RF-DNA techniques are adopted from prior work and applied to MSP430 URE to address the following research goals: 1) Does device wear-out impact RF-DNA device discriminability? 2) Can device age be continuously estimated by monitoring changes in RF-DNA features? 3) Can device age state (e.g., new vs. used) be reliably estimated? Conclusions include: 1) device wear-out does impact RF-DNA, with up to a 16% change in discriminability over the range of accelerated ages considered, 2) continuous (hour-by-hour) age estimation was most challenging and generally not supported, and 3) binary new vs. used age estimation was successful, with 78.7% to 99.9% average discriminability for all device-age combinations considered
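    RF-DNA fingerprints are commonly built by dividing a captured emission into subregions and computing simple statistics (variance, skewness, kurtosis) of the signal responses in each subregion. The sketch below shows that generic construction for an amplitude trace; the region count and statistic set follow the common RF-DNA recipe in the literature, not necessarily the exact configuration used in this work.

```python
# Hedged sketch of a generic RF-DNA-style fingerprint: split an emission
# trace into subregions and compute variance, skewness, and kurtosis of
# each (plus the whole trace). RF-DNA work typically does this for
# amplitude, phase, and frequency responses; amplitude only, for brevity.
import numpy as np
from scipy.stats import skew, kurtosis

def rf_dna_fingerprint(amplitude_trace, n_regions=20):
    """Return a 3 * (n_regions + 1) feature vector."""
    trace = np.asarray(amplitude_trace, dtype=float)
    regions = np.array_split(trace, n_regions)
    regions.append(trace)                      # whole-trace statistics too
    feats = []
    for r in regions:
        feats.extend((np.var(r), skew(r), kurtosis(r)))
    return np.array(feats)

rng = np.random.default_rng(0)
print(rf_dna_fingerprint(rng.normal(size=4000)).shape)  # (63,)
```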

    A Sustainable Autonomic Architecture for Organically Reconfigurable Computing Systems

    A Sustainable Autonomic Architecture for Organically Reconfigurable Computing Systems based on SRAM field-programmable gate arrays (FPGAs) is proposed, modeled analytically, simulated, prototyped, and measured. Low-level organic elements are analyzed and designed to achieve novel self-monitoring, self-diagnosis, and self-repair organic properties. A prototype of a 2-D spatial-gradient Sobel video edge-detection organic system use case, developed on a XC4VSX35 Xilinx Virtex-4 Video Starter Kit, is presented. Experimental results demonstrate the applicability of the proposed architecture and provide the infrastructure to quantify the performance and overcome fault-handling limitations. Dynamic online autonomous functionality restoration after a malfunction, or after a functionality shift due to changing requirements, is achieved at a fine granularity by exploiting dynamic Partial Reconfiguration (PR) techniques.
    A Genetic Algorithm (GA)-based hardware/software platform for intrinsic evolvable hardware is designed and evaluated for digital circuit repair using a variety of well-accepted benchmarks. Dynamic bitstream compilation for enhanced mutation and crossover operators is achieved by directly manipulating the bitstream using a layered toolset. Experimental results on the edge-detector organic system prototype have shown complete organic online refurbishment after a hard fault. In contrast to previous toolsets requiring many milliseconds or seconds, an average of 0.47 microseconds is required to perform a genetic mutation, 4.2 microseconds to perform single-point conventional crossover, 3.1 microseconds to perform Partial Match Crossover (PMX) as well as Order Crossover (OX), 2.8 microseconds to perform Cycle Crossover (CX), and 1.1 milliseconds for one input-pattern intrinsic evaluation. These represent a performance advantage of three orders of magnitude over the JBITS software framework and more than seven orders of magnitude over the Xilinx design flow.
    A Combinatorial Group Testing (CGT) technique was combined with the conventional GA, in what is called the CGT-pruned GA, to reduce repair time and increase system availability. Results have shown up to a 37.6% convergence advantage using the pruned technique.
    Lastly, a quantitative stochastic sustainability model for repairable systems is formulated to evaluate the sustainability of FPGA-based repairable systems. This model computes, at design time, the resources required for refurbishment to meet mission availability and lifetime requirements in a given fault-susceptible mission. By applying this model to MCNC benchmark circuits and the Sobel edge detector in a realistic space-mission use case on a Xilinx Virtex-4 FPGA, we demonstrate a comprehensive model encompassing the inter-relationships between system sustainability and fault rates, utilized and redundant hardware resources, repair-policy parameters, and decaying repairability
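    The GA-based refurbishment idea is to treat candidate configurations as a population, score each by running test patterns on the partially reconfigured fabric, and evolve mutated and crossed-over bitstreams until functionality is restored. A generic sketch follows; the `evaluate` hook and the bit-level operators stand in for the layered bitstream-manipulation toolset described above, and are not that toolset's API.

```python
# Hedged sketch of GA-driven circuit repair: evolve candidate bitstreams
# until one passes all test patterns. evaluate() abstracts loading the
# candidate via partial reconfiguration and scoring its outputs against
# expected responses; here it is a caller-supplied placeholder.
import random

def mutate(bits, rate):
    return [b ^ (random.random() < rate) for b in bits]

def crossover(a, b):                              # single-point conventional crossover
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def repair(seed_bits, evaluate, pop_size=32, mutation_rate=0.01, max_gens=500):
    """Return a bit list scoring 1.0 under evaluate(), or None."""
    pop = [mutate(seed_bits, 0.05) for _ in range(pop_size)]
    for _ in range(max_gens):
        scored = sorted(pop, key=evaluate, reverse=True)
        if evaluate(scored[0]) == 1.0:
            return scored[0]                      # fully refurbished
        elite = scored[: pop_size // 4]           # keep the best quarter
        pop = elite + [mutate(crossover(*random.sample(elite, 2)), mutation_rate)
                       for _ in range(pop_size - len(elite))]
    return None
```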

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques, emerging particularly within the last five years, for enhancing and optimizing reliability in embedded systems. It introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book deals with reliability challenges across different levels, starting from the physical level all the way up to the system level (cross-layer approaches). The book aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, soft errors, etc. It:
    • Provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems;
    • Describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers;
    • Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex future many-core systems

    Improving the Scalability of High Performance Computer Systems

    Improving the performance of future computing systems will depend on the ability to increase the scalability of current technology. New paths need to be explored, as operating principles that have applied up to now are becoming irrelevant for upcoming computer architectures. It appears that scaling the number of cores, processors, and nodes within a system represents the only feasible alternative for achieving Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems.
    The Tightly Coupled Cluster technique significantly improves inter-node communication within compute clusters. By improving latency by an order of magnitude over existing solutions, the cost of communication is considerably reduced. This makes it possible to exploit fine-grained parallelism within applications, thereby extending scalability considerably. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and rendering protocol conversions unnecessary. The technique is implemented entirely in firmware and kernel-layer software on off-the-shelf AMD processors. We present a proof-of-concept implementation and real-world benchmarks to demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes.
    The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid prototyping methodology is proposed that accelerates design and implementation substantially. Due to its flexibility and modularity, it covers a large application space, ranging from systems-on-chip to high-performance many-core processors. The Network-on-Chip compiler generates complex networks in the form of synthesizable register-transfer-level code from an abstract design description. Our engine supports different target technologies, including field-programmable gate arrays and application-specific integrated circuits. The framework makes it possible to build large designs while minimizing development and verification efforts. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by the introduction of a protocol-agnostic architecture. We provide a thorough evaluation of the design that shows excellent results regarding performance and scalability.
    The third part of the dissertation addresses the processor-memory interface within computer architectures. The increasing compute power of many-core processors leads to an equally growing demand for more memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue, we propose a memory extension technique that attaches large amounts of DRAM to the processor via a low-pin-count interface using high-speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency, so applications can utilize the memory extension with regular shared-memory programming techniques. By supporting daisy-chained memory extension devices and by introducing an asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor-memory interface. The design has been implemented in a field-programmable gate array based prototype. Driver software and firmware modifications have been developed to bring up the prototype in a Linux-based system. We show microbenchmarks that prove the feasibility of our design
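    Protocol-agnostic NoC generators typically keep the routing function pluggable; the classic default for 2D meshes is dimension-ordered (XY) routing, which is deadlock-free because packets fully resolve the X dimension before turning into Y. A minimal sketch of such a routing function follows; it illustrates the general technique, not this framework's actual router implementation.

```python
# Hedged sketch of dimension-ordered (XY) routing on a 2D-mesh NoC:
# route fully along X first, then along Y. The restricted set of turns
# makes the algorithm deadlock-free on a mesh. Generic illustration, not
# the dissertation's router.

def xy_route(src, dst):
    """Yield (output_port, next_router) hops from src to dst."""
    x, y = src
    dx, dy = dst
    while x != dx:                      # resolve the X dimension first
        step = 1 if dx > x else -1
        x += step
        yield ("EAST" if step > 0 else "WEST", (x, y))
    while y != dy:                      # then the Y dimension
        step = 1 if dy > y else -1
        y += step
        yield ("NORTH" if step > 0 else "SOUTH", (x, y))

# Example: path from router (0, 0) to router (2, 1) in a mesh.
for port, hop in xy_route((0, 0), (2, 1)):
    print(port, hop)   # EAST (1, 0), EAST (2, 0), NORTH (2, 1)
```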