7 research outputs found

    FPGA-based real-time lane detection for advanced driver assistance systems

    No full text

    Approximate Radix-4 Booth Multiplication Circuit

    No full text
    Many modern applications, such as object recognition using deep neural networks, require extremely large numbers of multiplications, but can sacrifice accuracy in order to achieve lower power usage and faster operation. This paper proposes a new approximate multiplier design based on radix-4 Booth encoding. The key novel aspect of the proposed design is that approximate circuits are designed to create intermediate terms, which are then used as the common inputs to almost all of the logic within one entire row of a partial product array, resulting in a multi-level logic circuit implementation with extremely low delay and power usage characteristics. The proposed 8-bit (16-bit) design improves the power delay product by 17.1% to 30.3% (88.9% to 96.4%) over the previous best designs. By using accurate, approximated, and truncated regions, a wide range of approximate multiplier designs with different error characteristics are possible. Using normalized mean error distance and relative error distance metrics, simulations using synthesized circuits are used to show that the proposed designs have significantly improved power/accuracy tradeoffs over the previous best designs.
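As background for the abstract above, the following is a minimal sketch of exact radix-4 Booth recoding in software. It is not the approximate circuit proposed in the paper: it only illustrates how a signed multiplier is recoded into digits from {-2, -1, 0, 1, 2}, each of which selects one row of the partial product array. The function names and the default 8-bit width are assumptions made for illustration.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2} and is derived from the overlapping
    bit triplet (y[2i+1], y[2i], y[2i-1]), with y[-1] taken as 0.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    padded = pattern << 1            # append the implicit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        triplet = (padded >> i) & 0b111
        b_hi = (triplet >> 2) & 1
        b_mid = (triplet >> 1) & 1
        b_lo = triplet & 1
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


def booth_multiply(x: int, y: int, bits: int = 8) -> int:
    """Exact product formed by summing one partial product per Booth digit."""
    digits = booth_radix4_digits(y, bits)
    # Digit i has weight 4**i, so its partial product is shifted by 2*i bits.
    return sum((d * x) << (2 * i) for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(13, 8))
    print(booth_multiply(-7, 13))
```

An n-bit multiplier yields n/2 partial product rows under this recoding, which is why designs of this kind focus on the logic within each row.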

    Hierarchical Approximate Memory for Deep Neural Network Applications

    No full text
    Power consumed by a computer memory system can be significantly reduced if a certain level of error is permitted in the data stored in memory. Such an approximate memory approach is viable for applications developed using deep neural networks (DNNs) because such applications are typically error-resilient. In this paper, the use of hierarchical approximate memory for DNNs is studied and modeled. Although previous research has focused on approximate memory for specific memory technologies, this work proposes to consider approximate memory for the entire memory hierarchy of a computer system by considering the error budget for a given target application. This paper proposes a system model based on the error budget (the amount by which the memory error rate can be permitted to rise) for a target application and the power usage characteristics of the constituent memory technologies of a memory hierarchy. Using DNN case studies involving SRAM, DRAM, and NAND, this paper shows that the overall memory power consumption can be reduced by up to 43.38% by using the proposed model to optimally divide up the available error budget.

    High-Throughput and Low-Latency Digital Baseband Architecture for Energy-Efficient Wireless VR Systems

    No full text
    This paper presents a novel baseband architecture that supports high-speed wireless VR solutions using 60 GHz RF circuits. Based on experimental observations from our previous 60 GHz transceiver circuits, an efficient baseband architecture is proposed to enhance the quality of transmission. To achieve zero-latency transmission, we define a (106,920, 95,040) interleaved-BCH error-correction code (ECC), which removes the iterative processing steps of the previous LDPC ECC standardized for near-field wireless communication. By introducing block-level interleaving, the proposed baseband processing scatters burst errors across the small-sized component codes and recovers up to 1080 consecutive bit errors in a data frame of 106,920 bits. To support the high-speed wireless VR system, we also design a massively parallel BCH encoder and decoder, which are tightly connected to the block-level interleaver and de-interleaver. Including the high-speed analog interfaces for the external devices, the proposed baseband architecture is designed in 65 nm CMOS and supports a data rate of up to 12.8 Gbps. Experimental results show that the proposed wireless VR solution can transfer up to 4K high-resolution video streams without time-consuming compression and decompression, achieving a transfer latency of 1 ms.
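The burst-scattering idea in the abstract above can be illustrated with a generic symbol-level block interleaver. This is a textbook sketch, not the block-level interleaver of the paper: the depth of 4 codewords, the codeword length of 6, and all function names are hypothetical values chosen to keep the example small. It shows that a burst of B consecutive channel errors lands on each of D interleaved codewords at most ceil(B / D) times.

```python
def interleave(symbols: list[int], depth: int) -> list[int]:
    """Write `depth` codewords row-wise, then read them out column-wise."""
    if len(symbols) % depth:
        raise ValueError("length must be a multiple of depth")
    n = len(symbols) // depth  # length of each component codeword
    return [symbols[r * n + c] for c in range(n) for r in range(depth)]


def deinterleave(symbols: list[int], depth: int) -> list[int]:
    """Inverse of interleave: restore the row-wise codeword order."""
    if len(symbols) % depth:
        raise ValueError("length must be a multiple of depth")
    n = len(symbols) // depth
    return [symbols[c * depth + r] for r in range(depth) for c in range(n)]


def errors_per_codeword(burst_start: int, burst_len: int, depth: int) -> list[int]:
    """Count how many positions of a channel burst fall into each codeword.

    In the interleaved stream, position p belongs to codeword p % depth.
    """
    counts = [0] * depth
    for pos in range(burst_start, burst_start + burst_len):
        counts[pos % depth] += 1
    return counts


if __name__ == "__main__":
    depth, n = 4, 6  # hypothetical sizes, not taken from the paper
    frame = list(range(depth * n))
    assert deinterleave(interleave(frame, depth), depth) == frame
    print(errors_per_codeword(burst_start=5, burst_len=8, depth=depth))
```

Under these assumptions, each component code only needs to correct a small share of the burst, which is the property that lets short component codes handle long consecutive error runs.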

    An energy-optimized (37840, 34320) symmetric BC-BCH decoder for healthy mobile storages

    No full text

    Energy-Efficient Symmetric BC-BCH Decoder Architecture for Mobile Storages

    No full text
    Recently, symmetric block-wise concatenated-BCH (SBC-BCH) codes have been proposed as strong error-correcting codes (ECCs) based on hard-decision channel outputs, which makes them especially suited for storage devices using NAND flash memories. Targeting energy-efficient NAND flash memory applications, this paper presents an energy-optimized decoder architecture that includes an iterative decoder for an SBC-BCH code as the main decoder and a low-complexity auxiliary decoder for a block-wise single parity-check (BSPC) code. The auxiliary decoder is activated opportunistically to break the dominant error bound associated with the SBC-BCH code, which allows the uncorrectable bit-error rate (UBER) to be lowered to 10^-15 in an energy-efficient way. This work presents several design-level optimizations for further enhancing the energy efficiency of the iterative SBC-BCH decoder. More precisely, a new initialization scheme is proposed to ensure an energy-efficient seamless decoding scenario. Syndrome tracking is applied to eliminate the previous syndrome calculation, and a reordered Chien search further enhances the energy efficiency as well as the decoding throughput. Targeting a 0.9-rate 4KB SBC-BCH code for commercialized storage devices using NAND flash memories, a prototype decoder consisting of both the iterative main decoder and the auxiliary decoder is designed in a 65-nm CMOS process. By applying the proposed optimizations, the prototype decoder achieves an energy efficiency of 3.43 pJ/b while providing a decoding throughput of 13.2 Gb/s, which is superior to the previous state-of-the-art decoders for mobile storage.