
    Low-cost error detection through high-level synthesis

    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling and complexity has resulted in a variety of reliability and validation challenges including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. Thus the challenge is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This thesis shows that high-level synthesis also has the power to address validation and reliability challenges through two solutions. One solution for circuit reliability is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique with 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA emulated netlist-level error injection. We observe coverages of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors with a 25.7% area cost and negligible performance impact. 
Leveraging a mean error detection latency of 12.75 cycles (4150x faster than end result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175x increase in reliability against soft errors. Another solution for rapid post-silicon validation of accelerator designs is Hybrid Quick Error Detection (H-QED): inserting signature generation logic in a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using H-QED, we demonstrate an improvement in error detection latency (time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. H-QED also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. H-QED incurs less than 10% area overhead for the accelerator it validates with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by H-QED
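
    The modulo-3 shadow idea in the abstract above can be illustrated with a minimal software sketch. It is a toy model, not the thesis's hardware implementation: the function names (`mac`, `shadow_mac`, `check`, `flip_bit`) and the operand values are assumptions made for illustration, and the sketch does not reproduce the reported coverage or area figures. It relies only on the fact that addition and multiplication commute with reduction modulo 3, and that a single bit flip changes a value by a power of two, which is never a multiple of 3.

```python
def mac(a, b, c):
    """Main datapath: multiply-accumulate on full-width integers."""
    return a * b + c


def shadow_mac(a, b, c):
    """Shadow datapath: the same operation carried out on residues modulo 3."""
    return ((a % 3) * (b % 3) + (c % 3)) % 3


def check(result, a, b, c):
    """Return True when the main result agrees with the shadow residue."""
    return result % 3 == shadow_mac(a, b, c)


def flip_bit(value, bit):
    """Model a single-bit soft error in the main result (hypothetical fault model)."""
    return value ^ (1 << bit)


# Illustrative operands; any non-negative integers would do.
a, b, c = 1234, 5678, 91011
good = mac(a, b, c)

# A fault-free result always passes the check.
fault_free_ok = check(good, a, b, c)

# Every single-bit flip shifts the result by +/- 2**k, which is never 0 mod 3,
# so the residue check flags each of them.
detected = [not check(flip_bit(good, k), a, b, c) for k in range(32)]

# An error that is a multiple of 3 escapes, which is one reason real coverage
# is below 100%.
escapes = check(good + 3, a, b, c)
```

    In this toy model the check runs once on the final result; the abstract describes scheduling such checkpoints and sharing modulo-3 reducers to keep the area cost down.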

    Innovation in Energy Systems

    It has been a little over a century since the inception of interconnected networks and little has changed in the way that they are operated. Demand-supply balance methods, protection schemes, business models for electric power companies, and future development considerations have remained the same until very recently. Distributed generators, storage devices, and electric vehicles have become widespread and disrupted century-old bulk generation - bulk transmission operation. Distribution networks are no longer passive networks and now contribute to power generation. Old billing and energy trading schemes cannot accommodate this change and need revision. Furthermore, bidirectional power flow is an unprecedented phenomenon in distribution networks and traditional protection schemes require a thorough fix for proper operation. This book aims to cover new technologies, methods, and approaches developed to meet the needs of this changing field

    Synthesizing cognition in neuromorphic electronic systems

    The quest to implement intelligent processing in electronic neuromorphic systems lacks methods for achieving reliable behavioral dynamics on substrates of inherently imprecise and noisy neurons. Here we report a solution to this problem that involves first mapping an unreliable hardware layer of spiking silicon neurons into an abstract computational layer composed of generic reliable subnetworks of model neurons and then composing the target behavioral dynamics as a “soft state machine” running on these reliable subnets. In the first step, the neural networks of the abstract layer are realized on the hardware substrate by mapping the neuron circuit bias voltages to the model parameters. This mapping is obtained by an automatic method in which the electronic circuit biases are calibrated against the model parameters by a series of population activity measurements. The abstract computational layer is formed by configuring neural networks as generic soft winner-take-all subnetworks that provide reliable processing by virtue of their active gain, signal restoration, and multistability. The necessary states and transitions of the desired high-level behavior are then easily embedded in the computational layer by introducing only sparse connections between some neurons of the various subnets. We demonstrate this synthesis method for a neuromorphic sensory agent that performs real-time context-dependent classification of motion patterns observed by a silicon retina

    Robust and reliable hardware accelerator design through high-level synthesis

    System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling-driven complexity has resulted in a variety of reliability and validation challenges including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. Thus the challenge is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This dissertation shows that high-level synthesis also has the power to address validation and reliability challenges through three automated solutions targeting three key stages in the hardware design and use cycle: pre-silicon debugging, post-silicon validation, and post-deployment error detection. Our solution for rapid pre-silicon debugging of accelerator designs is hybrid tracing: comparing a datapath-level trace of hardware execution with a reference software implementation at a fine temporal and spatial granularity to detect logic bugs. An integrated backtrace process delivers source-code meaning to the hardware designer, pinpointing the location of bug activation and providing a strong hint for potential bug fixes. Experimental results show that we are able to detect and aid in localization of logic bugs from both C/C++ specifications as well as the high-level synthesis engine itself. 
A variation of this solution tailored for rapid post-silicon validation of accelerator designs is hybrid hashing: inserting signature generation logic in a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using hybrid hashing, we demonstrate an improvement in error detection latency (time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. Hybrid hashing also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. Hybrid hashing incurs less than 10% area overhead for the accelerator it validates with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by hybrid hashing. Finally, our solution for post-deployment error detection is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique with 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA emulated netlist-level error injection. We observe coverages of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors with a 25.7% area cost and negligible performance impact. 
Leveraging a mean error detection latency of 12.75 cycles (4150× faster than end result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175× increase in reliability against soft errors. While the area cost of our modulo shadow datapaths is much better than traditional modular redundancy approaches, we want to maximize the applicability of our approach. To this end, we take a dive into gate-level architectural design for modulo arithmetic functional units. We introduce new low-cost gate-level architectures for all four key functional units in a shadow datapath: (1) a modulo reduction algorithm that generates architectures consisting entirely of full-adder standard cells; (2) minimum-area modulo adder and subtractor architectures; (3) an array-based modulo multiplier design; and (4) a modulo equality comparator that handles the residue encoding produced by the above. We compare our new functional units to the previous state-of-the-art approach, observing a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer; that our reducer costs, which tend to dominate shadow datapath costs, do not increase with larger modulo bases; and that for modulo-15 and above, all of our modulo functional units have better area and delay than their previous counterparts. We also demonstrate the practicality of our approach by designing a custom shadow datapath for error detection of a multiply accumulate functional unit, which has an area overhead of only 12% for a 32-bit main datapath and 2-bit modulo-3 shadow datapath. 
Taking our reliability solution further, we look at the bigger picture of modulo shadow datapaths combined with other solutions at different abstraction layers, looking to answer the following question: Given all of the existing reliability improvement techniques for application-specific hardware accelerators, what techniques or combinations of techniques are the most cost-effective? To answer this question, we consider a soft error fault model and empirically evaluate cross-layer combinations of ABFT, EDDI, and modulo shadow datapaths in the context of high-level synthesis; parity in logic synthesis; and flip-flop hardening techniques at the physical design level. We measure the reliability benefit and area, energy, and performance cost of each technique individually and for interesting technique combinations through FPGA emulated fault-injection and physical place-and-route. Our results show that a combination of parity and flip-flop hardening is the most cost-effective in general with an average 1.3% area cost and 5.7% energy cost for a 50× improvement in reliability. The addition of modulo-3 shadow datapaths to this combination provides some additional benefit for some applications, even without considering its combinational logic, stuck-at fault, and timing error protection benefits. We also observe new efficiency challenges for ABFT and EDDI when used for hardware accelerators
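
    The hybrid hashing approach described in this abstract, which compares a compressed signature stream from the design against reference signatures from high-level simulation, can be sketched in software as follows. This is only an illustration under stated assumptions: the abstract does not say which compression function is used, so CRC-32 from Python's standard `zlib` module stands in for the signature logic, and the trace contents, the window size, and the helper names `signatures` and `first_mismatch` are hypothetical.

```python
import zlib


def signatures(trace, window):
    """Compress each window of 32-bit trace values into one CRC-32 signature."""
    sigs = []
    for start in range(0, len(trace), window):
        chunk = trace[start:start + window]
        data = b"".join(v.to_bytes(4, "little") for v in chunk)
        sigs.append(zlib.crc32(data))
    return sigs


def first_mismatch(reference, observed):
    """Return the index of the first differing signature, or None if all match."""
    for i, (r, o) in enumerate(zip(reference, observed)):
        if r != o:
            return i
    return None


# Reference trace, standing in for values produced by high-level simulation.
reference_trace = [(i * i + 1) & 0xFFFFFFFF for i in range(64)]

# "Hardware" trace with one corrupted internal value (hypothetical bug).
buggy_trace = list(reference_trace)
buggy_trace[37] ^= 0x4

ref_sigs = signatures(reference_trace, window=8)
obs_sigs = signatures(buggy_trace, window=8)

# The first mismatching signature localizes the bug to one window of the
# trace, long before it would have to show up in the final output.
window_index = first_mismatch(ref_sigs, obs_sigs)
```

    Smaller windows localize a bug more precisely at the cost of a longer signature stream; the abstract reports that the hardware signature logic costs less than 10% area for the accelerator it validates.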

    Real-time VLSI architecture for bio-medical monitoring

    This paper discusses the architecture and implementation of SSS2, a high-performance real-time signal processing system developed with a hybrid ESL/RTL methodology and targeted to biomedical image processing. Traditional methodologies, as well as new tools such as Cebatech's C2R untimed-C synthesizer, have been employed in the design of the system. The SSS2 platform specifies a parametric number of scalar processing elements, based on multiple 32-bit Sparc-compliant engines, augmented with LE2, an ESL-designed 2-way LIW/SIMD accelerator. LE2, which is designed purely in C, exposes a consistent interface to its SIMD datapath, which is derived directly from the C source of open-source image processing codes. It is synthesized to Verilog RTL with C2R. Behaviorally synthesized SIMD datapaths are then plugged into the exposed LE2 datapath interface. The LE2 memory interface can be either a cache-based configurable vector load/store unit or a multi-banked, multi-channel streaming local memory system. Results drawn from this work strongly suggest a shift towards a hybrid approach in designing multi-core systems for high-bandwidth streaming and for dealing with large-scale medical image transfers and non-linear bio-signal processing algorithms.

    Implantable Neural Probes for Brain-Machine Interfaces - Current Developments and Future Prospects

    A brain-machine interface (BMI) allows for direct communication between the brain and machines. Neural probes for recording neural signals are among the essential components of a BMI system. In this report, we review research regarding implantable neural probes and their applications to BMIs. We first discuss conventional neural probes such as the tetrode, Utah array, Michigan probe, and electrocorticography (ECoG), following which we cover advancements in next-generation neural probes. These next-generation probes offer improved electrical properties, mechanical durability, and biocompatibility, as well as a high degree of freedom in practical settings. Specifically, we focus on three key topics: (1) novel implantable neural probes that decrease the level of invasiveness without sacrificing performance, (2) multi-modal neural probes that measure both electrical and optical signals, and (3) neural probes developed using advanced materials. Because safety and precision are critical for practical applications of BMI systems, future studies should aim to enhance these properties when developing next-generation neural probes.

    Constraint-Aware, Scalable, and Efficient Algorithms for Multi-Chip Power Module Layout Optimization

    Moving towards an electrified world requires ultra high-density power converters. Electric vehicles, electrified aerospace, data centers, etc. are just a few fields among wide application areas of power electronic systems, where high-density power converters are essential. As a critical part of these power converters, power semiconductor modules and their layout optimization has been identified as a crucial step in achieving the maximum performance and density for wide bandgap technologies (i.e., GaN and SiC). New packaging technologies are also introduced to produce reliable and efficient multichip power module (MCPM) designs to push the current limits. The complexity of the emerging MCPM layouts is surpassing the capability of a manual, iterative design process to produce an optimum design with agile development requirements. An electronic design automation tool called PowerSynth has been introduced with ongoing research toward enhanced capabilities to speed up the optimized MCPM layout design process. This dissertation presents the PowerSynth progression timeline with the methodology updates and corresponding critical results compared to v1.1. The first released version (v1.1) of PowerSynth demonstrated the benefits of layout abstraction, and reduced-order modeling techniques to perform rapid optimization of the MCPM module compared to the traditional, manual, and iterative design approach. However, that version is limited by several key factors: layout representation technique, layout generation algorithms, iterative design-rule-checking (DRC), optimization algorithm candidates, etc. To address these limitations, and enhance PowerSynth’s capabilities, constraint-aware, scalable, and efficient algorithms have been developed and implemented. PowerSynth layout engine has evolved from v1.3 to v2.0 throughout the last five years to incorporate the algorithm updates and generate all 2D/2.5D/3D Manhattan layout solutions. 
These fundamental changes in the layout generation methodology have also called for updates in the performance modeling techniques and enabled exploring different optimization algorithms. The latest PowerSynth 2 architecture has been implemented to enable electro-thermo-mechanical and reliability optimization on 2D/2.5D/3D MCPM layouts, and set up a path toward cabinet-level optimization. PowerSynth v2.0 computer-aided design (CAD) flow has been hardware-validated through manufacturing and testing of an optimized novel 3D MCPM layout. The flow has shown significant speedup compared to the manual design flow with a comparable optimization result

    Evolution of optogenetic microdevices

    Implementation of optogenetic techniques is a recent addition to the neuroscientists' preclinical research arsenal, helping to expose the intricate connectivity of the brain and allowing for on-demand direct modulation of specific neural pathways. Developing an optogenetic system requires thorough investigation of the optogenetic technique and of previously fabricated devices, which this review accommodates. Many experiments utilize bench-top systems that are bulky, expensive, and necessitate tethering to the animal. However, these bench-top systems can make use of power-demanding technologies, such as concurrent electrical recording. Newer portable microdevices and implantable systems carried by freely moving animals are being fabricated that take advantage of wireless energy harvesting to power a system and allow for natural movements that are vital for behavioral testing and analysis. An investigation of the evolution of tethered, portable, and implantable optogenetic microdevices is presented, and an analysis of benefits and detriments of each system, including optical power output, device dimensions, electrode width, and weight is given. Opsins, light sources, and optical fiber coupling are also discussed to optimize device parameters and maximize efficiency from the light source to the fiber, respectively. These attributes are important considerations when designing and developing improved optogenetic microdevices.