4,839 research outputs found
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation
Machine Learning (ML) is making a strong resurgence in tune with the massive
generation of unstructured data which in turn requires massive computational
resources. Due to the inherently compute- and power-intensive structure of
Neural Networks (NNs), hardware accelerators emerge as a promising solution.
However, with technology node scaling below 10nm, hardware accelerators become
more susceptible to faults, which in turn can impact the NN accuracy. In this
paper, we study the resilience aspects of Register-Transfer Level (RTL) model
of NN accelerators, in particular, fault characterization and mitigation. By
following a High-Level Synthesis (HLS) approach, first, we characterize the
vulnerability of various components of RTL NN. We observed that the severity of
faults depends on both i) application-level specifications, i.e., NN data
(inputs, weights, or intermediate), NN layers, and NN activation functions, and
ii) architectural-level specifications, i.e., data representation model and the
parallelism degree of the underlying accelerator. Second, motivated by
characterization results, we present a low-overhead fault mitigation technique
that can efficiently correct bit flips, by 47.3% better than state-of-the-art
methods.Comment: 8 pages, 6 figure
Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems
The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame.
The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities.
The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities.
The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption
An AER handshake-less modular infrastructure PCB with x8 2.5Gbps LVDS serial links
Nowadays spike-based brain processing emulation is
taking off. Several EU and others worldwide projects are
demonstrating this, like SpiNNaker, BrainScaleS, FACETS, or
NeuroGrid. The larger the brain process emulation on silicon is,
the higher the communication performance of the hosting
platforms has to be. Many times the bottleneck of these system
implementations is not on the performance inside a chip or a
board, but in the communication between boards. This paper
describes a novel modular Address-Event-Representation (AER)
FPGA-based (Spartan6) infrastructure PCB (the AER-Node
board) with 2.5Gbps LVDS high speed serial links over SATA
cables that offers a peak performance of 32-bit 62.5Meps (Mega
events per second) on board-to-board communications. The
board allows back compatibility with parallel AER devices
supporting up to x2 28-bit parallel data with asynchronous
handshake. These boards also allow modular expansion
functionality through several daughter boards. The paper is
focused on describing in detail the LVDS serial interface and
presenting its performance.Ministerio de Ciencia e InnovaciĂłn TEC2009-10639-C04-02/01Ministerio de EconomĂa y Competitividad TEC2012-37868-C04-02/01Junta de AndalucĂa TIC-6091Ministerio de EconomĂa y Competitividad PRI-PIMCHI-2011-076
Memory Reliability Enhancement against Multiple Cell Upsets Using Decimal Matrix Code for 32-Bit Data
An important issue in the reliability of memories exposed to radiation environment is transient multiple cells upsets (MCUs). To protect the memory data from radiations and transients many improved packaging techniques are available. But, a particular packaging provides protection from a limited variation of radiations. Today the devices are exposed to a very wide range of environment radiations due to increasing applications in the field of wireless communication. So some additional data preservation techniques are always preferred for authenticating the data before it is processed. Some of these techniques use encoded data to be stored in memories. These techniques are error correction codes (ECCs). It is always preferred to implement an error correction code that requires a less number of redundant bits to be stored and a minimized delay overhead in data correction. This paper presents an FPGA based implementation of memory data error detection and correction code that involves simple decimal addition algorithm in the encoding of data that is to be stored in memory. The decoding of the data for error detection and correction is based on the Hamming Code. This technique involves divide-symbol concept to represent the linear data in groups to make symbolic code. The length of the symbol is inversely proportional to the delay overhead of the cod
A Hybrid Fault-Tolerant LEON3 Soft Core Processor Implemented in Low-End SRAM FPGA
In this work we implemented a hybrid fault-tolerant LEON3 soft-core processor in a low-end FPGA (Artix-7) and evaluated its error detection capabilities through neutron irradiation and fault injection in an incremental manner. The error mitigation approach combines the use of SEC/DED codes for memories, a hardware monitor to detect control-flow errors, software-based techniques to detect data errors and configuration memory scrubbing with repair to avoid error accumulation. The proposed solution can significantly improve fault tolerance and can be fully embedded in a low-end FPGA, with reduced overhead and low intrusiveness
Space division multiplexing chip-to-chip quantum key distribution
Quantum cryptography is set to become a key technology for future secure
communications. However, to get maximum benefit in communication networks,
transmission links will need to be shared among several quantum keys for
several independent users. Such links will enable switching in quantum network
nodes of the quantum keys to their respective destinations. In this paper we
present an experimental demonstration of a photonic integrated silicon chip
quantum key distribution protocols based on space division multiplexing (SDM),
through multicore fiber technology. Parallel and independent quantum keys are
obtained, which are useful in crypto-systems and future quantum network
Equalization of Third-Order Intermodulation Products in Wideband Direct Conversion Receivers
This paper reports a SAW-less direct-conversion receiver which utilizes a mixed-signal feedforward path to regenerate and adaptively cancel IM3 products, thus accomplishing system-level linearization. The receiver system performance is dominated by a custom integrated RF front end implemented in 130-nm CMOS and achieves an uncorrected out-of-band IIP3 of -7.1 dBm under the worst-case UMTS FDD Region 1 blocking specifications. Under IM3 equalization, the receiver achieves an effective IIP3 of +5.3 dBm and meets the UMTS BER sensitivity requirement with 3.7 dB of margin
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
Energy-efficient wireless communication
In this chapter we present an energy-efficient highly adaptive network interface architecture and a novel data link layer protocol for wireless networks that provides Quality of Service (QoS) support for diverse traffic types. Due to the dynamic nature of wireless networks, adaptations in bandwidth scheduling and error control are necessary to achieve energy efficiency and an acceptable quality of service. In our approach we apply adaptability through all layers of the protocol stack, and provide feedback to the applications. In this way the applications can adapt the data streams, and the network protocols can adapt the communication parameters
- âŠ