2,390 research outputs found

    Number Systems for Deep Neural Network Architectures: A Survey

    Full text link
    Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlightedComment: 28 page

    Integrated 2-D Optical Flow Sensor

    Get PDF
    I present a new focal-plane analog VLSI sensor that estimates optical flow in two visual dimensions. The chip significantly improves previous approaches both with respect to the applied model of optical flow estimation as well as the actual hardware implementation. Its distributed computational architecture consists of an array of locally connected motion units that collectively solve for the unique optimal optical flow estimate. The novel gradient-based motion model assumes visual motion to be translational, smooth and biased. The model guarantees that the estimation problem is computationally well-posed regardless of the visual input. Model parameters can be globally adjusted, leading to a rich output behavior. Varying the smoothness strength, for example, can provide a continuous spectrum of motion estimates, ranging from normal to global optical flow. Unlike approaches that rely on the explicit matching of brightness edges in space or time, the applied gradient-based model assures spatiotemporal continuity on visual information. The non-linear coupling of the individual motion units improves the resulting optical flow estimate because it reduces spatial smoothing across large velocity differences. Extended measurements of a 30x30 array prototype sensor under real-world conditions demonstrate the validity of the model and the robustness and functionality of the implementation

    Design of Novel Algorithm and Architecture for Gaussian Based Color Image Enhancement System for Real Time Applications

    Full text link
    This paper presents the development of a new algorithm for Gaussian based color image enhancement system. The algorithm has been designed into architecture suitable for FPGA/ASIC implementation. The color image enhancement is achieved by first convolving an original image with a Gaussian kernel since Gaussian distribution is a point spread function which smoothen the image. Further, logarithm-domain processing and gain/offset corrections are employed in order to enhance and translate pixels into the display range of 0 to 255. The proposed algorithm not only provides better dynamic range compression and color rendition effect but also achieves color constancy in an image. The design exploits high degrees of pipelining and parallel processing to achieve real time performance. The design has been realized by RTL compliant Verilog coding and fits into a single FPGA with a gate count utilization of 321,804. The proposed method is implemented using Xilinx Virtex-II Pro XC2VP40-7FF1148 FPGA device and is capable of processing high resolution color motion pictures of sizes of up to 1600x1200 pixels at the real time video rate of 116 frames per second. This shows that the proposed design would work for not only still images but also for high resolution video sequences.Comment: 15 pages, 15 figure

    Design and Implementation of FPGA-based Hardware Accelerator for Bayesian Confidence Propagation Neural Network

    Get PDF
    The Bayesian confidence propagation neural network (BCPNN) has been widely used for neural computation and machine learning domains. However, the current implementations of BCPNN are not computationally efficient enough, especially in the update of synaptic state variables. This thesis proposes a hardware accelerator for the training and inference process of BCPNN. In the hardware design, several techniques are employed, including a hybrid update mechanism, customized LUT-based design for exponential operations, and optimized design that maximizes parallelism. The proposed hardware accelerator is implemented on an FPGA device. The results show that the computing speed of the accelerator can improve the CPU counterpart by two orders of magnitude. In addition, the computational modules of the accelerator can be reused to reduce hardware overheads while achieving comparable computing performance. The accelerator's potential to facilitate the efficient implementation for large-scale BCPNN neural networks opens up the possibility to realize higher-level cognitive phenomena, such as associative memory and working memory
    • …
    corecore