3,314 research outputs found

    Cross-Layer Automated Hardware Design for Accuracy-Configurable Approximate Computing

    Get PDF
    Approximate Computing trades off computation accuracy against performance or energy efficiency. It is a design paradigm that arose in the last decade as an answer to diminishing returns from Dennard\u27s scaling and a shift in the prominent workloads. A range of modern workloads, categorized mainly as recognition, mining, and synthesis, features an inherent tolerance to approximations. Their characteristics, such as redundancies in their input data and robust-to-noise algorithms, allow them to produce outputs of acceptable quality, despite an approximation in some of their computations. Approximate Computing leverages the application tolerance by relaxing the exactness in computation towards primary design goals of increasing performance or improving energy efficiency. Existing techniques span across the abstraction layers of computer systems where cross-layer techniques are shown to offer a larger design space and yield higher savings. Currently, the majority of the existing work aims at meeting a single accuracy. The extent of approximation tolerance, however, significantly varies with a change in input characteristics and applications. In this dissertation, methods and implementations are presented for cross-layer and automated design of accuracy-configurable Approximate Computing to maximally exploit the performance and energy benefits. In particular, this dissertation addresses the following challenges and introduces novel contributions: A main Approximate Computing category in hardware is to scale either voltage or frequency beyond the safe limits for power or performance benefits, respectively. The rationale is that timing errors would be gradual and for an initial range tolerable. This scaling enables a fine-grain accuracy-configurability by varying the timing error occurrence. However, conventional synthesis tools aim at meeting a single delay for all paths within the circuit. Subsequently, with voltage or frequency scaling, either all paths succeed, or a large number of paths fail simultaneously, with a steep increase in error rate and magnitude. This dissertation presents an automated method for minimizing path delays by individually constraining the primary outputs of combinational circuits. As a result, it reduces the number of failing paths and makes the timing errors significantly more gradual, and also rarer and smaller on average. Additionally, it reveals that delays can be significantly reduced towards the least significant bit (LSB) and allows operating at a higher frequency when small operands are computed. Precision scaling, i.e., reducing the representation of data and its accuracy is widely used in multiple abstraction layers in Approximate Computing. Reducing data precision also reduces the transistor toggles, and therefore the dynamic power consumption. Application and architecture level precision scaling results in using only LSBs of the circuit. Arithmetic circuits often have less complexity and logic depth in LSBs compared to most significant bits (MSB). To take advantage of this circuit property, a delay-altering synthesis methodology is proposed. The method finds energy-optimal delay values under configurable precision usage and assigns them to primary outputs used for different precisions. Thereby, it enables dynamic frequency-precision scalable circuits for energy efficiency. Within the hardware architecture, it is possible to instantiate multiple units with the same functionality with different fixed approximation levels, where each block benefits from having fewer transistors and also synthesis relaxations. These blocks can be selected dynamically and thus allow to configure the accuracy during runtime. Instantiating such approximate blocks can be a lower dynamic power but higher area and leakage cost alternative to the current state-of-the-art gating mechanisms which switch off a group of paths in the circuit to reduce the toggling activity. Jointly, instantiating multiple blocks and gating mechanisms produce a large design space of accuracy-configurable hardware, where energy-optimal solutions require a cross-layer search in architecture and circuit levels. To that end, an approximate hardware synthesis methodology is proposed with joint optimizations in architecture and circuit for dynamic accuracy scaling, and thereby it enables energy vs. area trade-offs

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults

    Digital implementation of the cellular sensor-computers

    Get PDF
    Two different kinds of cellular sensor-processor architectures are used nowadays in various applications. The first is the traditional sensor-processor architecture, where the sensor and the processor arrays are mapped into each other. The second is the foveal architecture, in which a small active fovea is navigating in a large sensor array. This second architecture is introduced and compared here. Both of these architectures can be implemented with analog and digital processor arrays. The efficiency of the different implementation types, depending on the used CMOS technology, is analyzed. It turned out, that the finer the technology is, the better to use digital implementation rather than analog

    Event-based neuromorphic stereo vision

    Full text link

    Canadian Hydrogen Intensity Mapping Experiment (CHIME) Pathfinder

    Full text link
    A pathfinder version of CHIME (the Canadian Hydrogen Intensity Mapping Experiment) is currently being commissioned at the Dominion Radio Astrophysical Observatory (DRAO) in Penticton, BC. The instrument is a hybrid cylindrical interferometer designed to measure the large scale neutral hydrogen power spectrum across the redshift range 0.8 to 2.5. The power spectrum will be used to measure the baryon acoustic oscillation (BAO) scale across this poorly probed redshift range where dark energy becomes a significant contributor to the evolution of the Universe. The instrument revives the cylinder design in radio astronomy with a wide field survey as a primary goal. Modern low-noise amplifiers and digital processing remove the necessity for the analog beamforming that characterized previous designs. The Pathfinder consists of two cylinders 37\,m long by 20\,m wide oriented north-south for a total collecting area of 1,500 square meters. The cylinders are stationary with no moving parts, and form a transit instrument with an instantaneous field of view of ∌\sim100\,degrees by 1-2\,degrees. Each CHIME Pathfinder cylinder has a feedline with 64 dual polarization feeds placed every ∌\sim30\,cm which Nyquist sample the north-south sky over much of the frequency band. The signals from each dual-polarization feed are independently amplified, filtered to 400-800\,MHz, and directly sampled at 800\,MSps using 8 bits. The correlator is an FX design, where the Fourier transform channelization is performed in FPGAs, which are interfaced to a set of GPUs that compute the correlation matrix. The CHIME Pathfinder is a 1/10th scale prototype version of CHIME and is designed to detect the BAO feature and constrain the distance-redshift relation.Comment: 20 pages, 12 figures. submitted to Proc. SPIE, Astronomical Telescopes + Instrumentation (2014

    Null Convention Logic applications of asynchronous design in nanotechnology and cryptographic security

    Get PDF
    This dissertation presents two Null Convention Logic (NCL) applications of asynchronous logic circuit design in nanotechnology and cryptographic security. The first application is the Asynchronous Nanowire Reconfigurable Crossbar Architecture (ANRCA); the second one is an asynchronous S-Box design for cryptographic system against Side-Channel Attacks (SCA). The following are the contributions of the first application: 1) Proposed a diode- and resistor-based ANRCA (DR-ANRCA). Three configurable logic block (CLB) structures were designed to efficiently reconfigure a given DR-PGMB as one of the 27 arbitrary NCL threshold gates. A hierarchical architecture was also proposed to implement the higher level logic that requires a large number of DR-PGMBs, such as multiple-bit NCL registers. 2) Proposed a memristor look-up-table based ANRCA (MLUT-ANRCA). An equivalent circuit simulation model has been presented in VHDL and simulated in Quartus II. Meanwhile, the comparison between these two ANRCAs have been analyzed numerically. 3) Presented the defect-tolerance and repair strategies for both DR-ANRCA and MLUT-ANRCA. The following are the contributions of the second application: 1) Designed an NCL based S-Box for Advanced Encryption Standard (AES). Functional verification has been done using Modelsim and Field-Programmable Gate Array (FPGA). 2) Implemented two different power analysis attacks on both NCL S-Box and conventional synchronous S-Box. 3) Developed a novel approach based on stochastic logics to enhance the resistance against DPA and CPA attacks. The functionality of the proposed design has been verified using an 8-bit AES S-box design. The effects of decision weight, bitstream length, and input repetition times on error rates have been also studied. Experimental results shows that the proposed approach enhances the resistance to against the CPA attack by successfully protecting the hidden key --Abstract, page iii

    The S2 VLBI Correlator: A Correlator for Space VLBI and Geodetic Signal Processing

    Get PDF
    We describe the design of a correlator system for ground and space-based VLBI. The correlator contains unique signal processing functions: flexible LO frequency switching for bandwidth synthesis; 1 ms dump intervals, multi-rate digital signal-processing techniques to allow correlation of signals at different sample rates; and a digital filter for very high resolution cross-power spectra. It also includes autocorrelation, tone extraction, pulsar gating, signal-statistics accumulation.Comment: 44 pages, 13 figure
    • 

    corecore