4,839 research outputs found

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Get PDF
    Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact the NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) model of NN accelerators, in particular, fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, first, we characterize the vulnerability of various components of RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., data representation model and the parallelism degree of the underlying accelerator. Second, motivated by characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, by 47.3% better than state-of-the-art methods.Comment: 8 pages, 6 figure

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

    An AER handshake-less modular infrastructure PCB with x8 2.5Gbps LVDS serial links

    Get PDF
    Nowadays spike-based brain processing emulation is taking off. Several EU and others worldwide projects are demonstrating this, like SpiNNaker, BrainScaleS, FACETS, or NeuroGrid. The larger the brain process emulation on silicon is, the higher the communication performance of the hosting platforms has to be. Many times the bottleneck of these system implementations is not on the performance inside a chip or a board, but in the communication between boards. This paper describes a novel modular Address-Event-Representation (AER) FPGA-based (Spartan6) infrastructure PCB (the AER-Node board) with 2.5Gbps LVDS high speed serial links over SATA cables that offers a peak performance of 32-bit 62.5Meps (Mega events per second) on board-to-board communications. The board allows back compatibility with parallel AER devices supporting up to x2 28-bit parallel data with asynchronous handshake. These boards also allow modular expansion functionality through several daughter boards. The paper is focused on describing in detail the LVDS serial interface and presenting its performance.Ministerio de Ciencia e InnovaciĂłn TEC2009-10639-C04-02/01Ministerio de EconomĂ­a y Competitividad TEC2012-37868-C04-02/01Junta de AndalucĂ­a TIC-6091Ministerio de EconomĂ­a y Competitividad PRI-PIMCHI-2011-076

    Memory Reliability Enhancement against Multiple Cell Upsets Using Decimal Matrix Code for 32-Bit Data

    Get PDF
    An important issue in the reliability of memories exposed to radiation environment is transient multiple cells upsets (MCUs). To protect the memory data from radiations and transients many improved packaging techniques are available. But, a particular packaging provides protection from a limited variation of radiations. Today the devices are exposed to a very wide range of environment radiations due to increasing applications in the field of wireless communication. So some additional data preservation techniques are always preferred for authenticating the data before it is processed. Some of these techniques use encoded data to be stored in memories. These techniques are error correction codes (ECCs). It is always preferred to implement an error correction code that requires a less number of redundant bits to be stored and a minimized delay overhead in data correction. This paper presents an FPGA based implementation of memory data error detection and correction code that involves simple decimal addition algorithm in the encoding of data that is to be stored in memory. The decoding of the data for error detection and correction is based on the Hamming Code. This technique involves divide-symbol concept to represent the linear data in groups to make symbolic code. The length of the symbol is inversely proportional to the delay overhead of the cod

    A Hybrid Fault-Tolerant LEON3 Soft Core Processor Implemented in Low-End SRAM FPGA

    Get PDF
    In this work we implemented a hybrid fault-tolerant LEON3 soft-core processor in a low-end FPGA (Artix-7) and evaluated its error detection capabilities through neutron irradiation and fault injection in an incremental manner. The error mitigation approach combines the use of SEC/DED codes for memories, a hardware monitor to detect control-flow errors, software-based techniques to detect data errors and configuration memory scrubbing with repair to avoid error accumulation. The proposed solution can significantly improve fault tolerance and can be fully embedded in a low-end FPGA, with reduced overhead and low intrusiveness

    Space division multiplexing chip-to-chip quantum key distribution

    Get PDF
    Quantum cryptography is set to become a key technology for future secure communications. However, to get maximum benefit in communication networks, transmission links will need to be shared among several quantum keys for several independent users. Such links will enable switching in quantum network nodes of the quantum keys to their respective destinations. In this paper we present an experimental demonstration of a photonic integrated silicon chip quantum key distribution protocols based on space division multiplexing (SDM), through multicore fiber technology. Parallel and independent quantum keys are obtained, which are useful in crypto-systems and future quantum network

    Equalization of Third-Order Intermodulation Products in Wideband Direct Conversion Receivers

    Get PDF
    This paper reports a SAW-less direct-conversion receiver which utilizes a mixed-signal feedforward path to regenerate and adaptively cancel IM3 products, thus accomplishing system-level linearization. The receiver system performance is dominated by a custom integrated RF front end implemented in 130-nm CMOS and achieves an uncorrected out-of-band IIP3 of -7.1 dBm under the worst-case UMTS FDD Region 1 blocking specifications. Under IM3 equalization, the receiver achieves an effective IIP3 of +5.3 dBm and meets the UMTS BER sensitivity requirement with 3.7 dB of margin

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Energy-efficient wireless communication

    Get PDF
    In this chapter we present an energy-efficient highly adaptive network interface architecture and a novel data link layer protocol for wireless networks that provides Quality of Service (QoS) support for diverse traffic types. Due to the dynamic nature of wireless networks, adaptations in bandwidth scheduling and error control are necessary to achieve energy efficiency and an acceptable quality of service. In our approach we apply adaptability through all layers of the protocol stack, and provide feedback to the applications. In this way the applications can adapt the data streams, and the network protocols can adapt the communication parameters
    • 

    corecore