Search CORE

4,839 research outputs found

On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

Author: Cristal Adrian
Salami Behzad
Unsal Osman
Publication venue
Publication date: 14/06/2018
Field of study

Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact the NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) model of NN accelerators, in particular, fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, first, we characterize the vulnerability of various components of RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., data representation model and the parallelism degree of the underlying accelerator. Second, motivated by characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, by 47.3% better than state-of-the-art methods.Comment: 8 pages, 6 figure

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

Author: Ngo Hau Trung
Publication venue: ODU Digital Commons
Publication date: 01/01/2007
Field of study

The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

Old Dominion University

An AER handshake-less modular infrastructure PCB with x8 2.5Gbps LVDS serial links

Author: Iakymchuk T.
Jiménez Fernández Ángel Francisco
Jiménez Moreno Gabriel
Linares Barranco Alejandro
Linares Barranco Bernabé
Rosado A.
Serrano Gotarredona María Teresa
Publication venue: IEEE Computer Society
Publication date: 01/01/2014
Field of study

Nowadays spike-based brain processing emulation is taking off. Several EU and others worldwide projects are demonstrating this, like SpiNNaker, BrainScaleS, FACETS, or NeuroGrid. The larger the brain process emulation on silicon is, the higher the communication performance of the hosting platforms has to be. Many times the bottleneck of these system implementations is not on the performance inside a chip or a board, but in the communication between boards. This paper describes a novel modular Address-Event-Representation (AER) FPGA-based (Spartan6) infrastructure PCB (the AER-Node board) with 2.5Gbps LVDS high speed serial links over SATA cables that offers a peak performance of 32-bit 62.5Meps (Mega events per second) on board-to-board communications. The board allows back compatibility with parallel AER devices supporting up to x2 28-bit parallel data with asynchronous handshake. These boards also allow modular expansion functionality through several daughter boards. The paper is focused on describing in detail the LVDS serial interface and presenting its performance.Ministerio de Ciencia e Innovación TEC2009-10639-C04-02/01Ministerio de Economía y Competitividad TEC2012-37868-C04-02/01Junta de Andalucía TIC-6091Ministerio de Economía y Competitividad PRI-PIMCHI-2011-076

idUS. Depósito de Investigación Universidad de Sevilla

Memory Reliability Enhancement against Multiple Cell Upsets Using Decimal Matrix Code for 32-Bit Data

Author: Satyabala, Assistant Prof. Akhilesh Jain
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/04/2016
Field of study

An important issue in the reliability of memories exposed to radiation environment is transient multiple cells upsets (MCUs). To protect the memory data from radiations and transients many improved packaging techniques are available. But, a particular packaging provides protection from a limited variation of radiations. Today the devices are exposed to a very wide range of environment radiations due to increasing applications in the field of wireless communication. So some additional data preservation techniques are always preferred for authenticating the data before it is processed. Some of these techniques use encoded data to be stored in memories. These techniques are error correction codes (ECCs). It is always preferred to implement an error correction code that requires a less number of redundant bits to be stored and a minimized delay overhead in data correction. This paper presents an FPGA based implementation of memory data error detection and correction code that involves simple decimal addition algorithm in the encoding of data that is to be stored in memory. The decoding of the data for error detection and correction is based on the Hamming Code. This technique involves divide-symbol concept to represent the linear data in groups to make symbolic code. The length of the symbol is inversely proportional to the delay overhead of the cod

International Journal on Recent and Innovation Trends in Computing and Communication

A Hybrid Fault-Tolerant LEON3 Soft Core Processor Implemented in Low-End SRAM FPGA

Author: Entrena Arrontes Luis Alfonso
García Valderas Mario
Lindoso Muñoz Almudena
Parra Avellaneda Luis Isaías
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

In this work we implemented a hybrid fault-tolerant LEON3 soft-core processor in a low-end FPGA (Artix-7) and evaluated its error detection capabilities through neutron irradiation and fault injection in an incremental manner. The error mitigation approach combines the use of SEC/DED codes for memories, a hardware monitor to detect control-flow errors, software-based techniques to detect data errors and configuration memory scrubbing with repair to avoid error accumulation. The proposed solution can significantly improve fault tolerance and can be fully embedded in a low-end FPGA, with reduced overhead and low intrusiveness

Universidad Carlos III de Madrid e-Archivo

Space division multiplexing chip-to-chip quantum key distribution

Author: Bacco Davide
Dalgaard Kjeld
Ding Yunhong
Oxenløwe Leif Katsuo
Rottwit Karsten
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Quantum cryptography is set to become a key technology for future secure communications. However, to get maximum benefit in communication networks, transmission links will need to be shared among several quantum keys for several independent users. Such links will enable switching in quantum network nodes of the quantum keys to their respective destinations. In this paper we present an experimental demonstration of a photonic integrated silicon chip quantum key distribution protocols based on space division multiplexing (SDM), through multicore fiber technology. Parallel and independent quantum keys are obtained, which are useful in crypto-systems and future quantum network

arXiv.org e-Print Archive

Directory of Open Access Journals

Online Research Database In Technology

Equalization of Third-Order Intermodulation Products in Wideband Direct Conversion Receivers

Author: Hajimiri Ali
Keehr Edward A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2008
Field of study

This paper reports a SAW-less direct-conversion receiver which utilizes a mixed-signal feedforward path to regenerate and adaptively cancel IM3 products, thus accomplishing system-level linearization. The receiver system performance is dominated by a custom integrated RF front end implemented in 130-nm CMOS and achieves an uncorrected out-of-band IIP3 of -7.1 dBm under the worst-case UMTS FDD Region 1 blocking specifications. Under IM3 equalization, the receiver achieves an effective IIP3 of +5.3 dBm and meets the UMTS BER sensitivity requirement with 3.7 dB of margin

Caltech Authors

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

Southampton (e-Prints Soton)

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research

Energy-efficient wireless communication

Author: Havinga Paul Johannes Mattheus
Publication venue: University of Twente
Publication date: 01/01/2000
Field of study

In this chapter we present an energy-efficient highly adaptive network interface architecture and a novel data link layer protocol for wireless networks that provides Quality of Service (QoS) support for diverse traffic types. Due to the dynamic nature of wireless networks, adaptations in bandwidth scheduling and error control are necessary to achieve energy efficiency and an acceptable quality of service. In our approach we apply adaptability through all layers of the protocol stack, and provide feedback to the applications. In this way the applications can adapt the data streams, and the network protocols can adapt the communication parameters

University of Twente Research Information