738 research outputs found

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    Full text link
    The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.Comment: Under Review at ACM Computing Survey

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

    Energy autonomous systems : future trends in devices, technology, and systems

    Get PDF
    The rapid evolution of electronic devices since the beginning of the nanoelectronics era has brought about exceptional computational power in an ever shrinking system footprint. This has enabled among others the wealth of nomadic battery powered wireless systems (smart phones, mp3 players, GPS, …) that society currently enjoys. Emerging integration technologies enabling even smaller volumes and the associated increased functional density may bring about a new revolution in systems targeting wearable healthcare, wellness, lifestyle and industrial monitoring applications

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

    Timing-Error Tolerance Techniques for Low-Power DSP: Filters and Transforms

    Get PDF
    Low-power Digital Signal Processing (DSP) circuits are critical to commercial System-on-Chip design for battery powered devices. Dynamic Voltage Scaling (DVS) of digital circuits can reclaim worst-case supply voltage margins for delay variation, reducing power consumption. However, removing static margins without compromising robustness is tremendously challenging, especially in an era of escalating reliability concerns due to continued process scaling. The Razor DVS scheme addresses these concerns, by ensuring robustness using explicit timing-error detection and correction circuits. Nonetheless, the design of low-complexity and low-power error correction is often challenging. In this thesis, the Razor framework is applied to fixed-precision DSP filters and transforms. The inherent error tolerance of many DSP algorithms is exploited to achieve very low-overhead error correction. Novel error correction schemes for DSP datapaths are proposed, with very low-overhead circuit realisations. Two new approximate error correction approaches are proposed. The first is based on an adapted sum-of-products form that prevents errors in intermediate results reaching the output, while the second approach forces errors to occur only in less significant bits of each result by shaping the critical path distribution. A third approach is described that achieves exact error correction using time borrowing techniques on critical paths. Unlike previously published approaches, all three proposed are suitable for high clock frequency implementations, as demonstrated with fully placed and routed FIR, FFT and DCT implementations in 90nm and 32nm CMOS. Design issues and theoretical modelling are presented for each approach, along with SPICE simulation results demonstrating power savings of 21 – 29%. Finally, the design of a baseband transmitter in 32nm CMOS for the Spectrally Efficient FDM (SEFDM) system is presented. SEFDM systems offer bandwidth savings compared to Orthogonal FDM (OFDM), at the cost of increased complexity and power consumption, which is quantified with the first VLSI architecture

    Embracing Low-Power Systems with Improvement in Security and Energy-Efficiency

    Get PDF
    As the economies around the world are aligning more towards usage of computing systems, the global energy demand for computing is increasing rapidly. Additionally, the boom in AI based applications and services has already invited the pervasion of specialized computing hardware architectures for AI (accelerators). A big chunk of research in the industry and academia is being focused on providing energy efficiency to all kinds of power hungry computing architectures. This dissertation adds to these efforts. Aggressive voltage underscaling of chips is one the effective low power paradigms of providing energy efficiency. This dissertation identifies and deals with the reliability and performance problems associated with this paradigm and innovates novel energy efficient approaches. Specifically, the properties of a low power security primitive have been improved and, higher performance has been unlocked in an AI accelerator (Google TPU) in an aggressively voltage underscaled environment. And, novel power saving opportunities have been unlocked by characterizing the usage pattern of a baseline TPU with rigorous mathematical analysis

    Variation-aware high-level DSP circuit design optimisation framework for FPGAs

    Get PDF
    The constant technology shrinking and the increasing demand for systems that operate under different power profiles with the maximum performance, have motivated the work in this thesis. Modern design tools that target FPGA devices take a conservative approach in the estimation of the maximum performance that can be achieved by a design when it is placed on a device, accounting for any variability in the fabrication process of the device. The work presented here takes a new view on the performance improvement of DSP designs by pushing them into the error-prone regime, as defined by the synthesis tools, and by investigating methodologies that reduce the impact of timing errors at the output of the system. In this work two novel error reduction techniques are proposed to address this problem. One is based on reduced-precision redundancy and the other on an error optimisation framework that uses information from a prior characterisation of the device. The first one is a generic architecture that is appended to existing arithmetic operators. The second defines the high-level parameters of the algorithm without using extra resources. Both of these methods allow to achieve graceful degradation whilst variation increases. A comparison of the new methods is laid against the existing methodologies, and conclusions drawn on the tradeoffs between their cost, in terms of resources and errors, and their benefits in terms of throughput. In some cases it is possible to double the performance of the design while still producing valid results.Open Acces
    corecore