
    Propagation of Delay in Probabilistic CMOS Systems

    Future low-voltage, noise-dominated designs render the behavior of CMOS probabilistic. This is acceptable as long as an application's intrinsic error resilience allows a quantified inaccuracy in the results in exchange for lower energy consumption, as in audio/video processing and sky-image formation in radio astronomy. The resulting trade-off between energy consumption (E) and probability of correctness (p) gives inexact computing an opportunity to attain higher energy efficiency. Over the last decade, efforts have been made to model probabilistic CMOS (PCMOS) with respect to noise variance and to establish its feasibility for error-resilient applications, but these efforts have focused on the nominal voltage range. At the same time, operating in the near-threshold voltage (NTV) range is a promising energy-efficient design technique that runs the hardware at a relatively slower pace while preserving the deterministic property of the computation. We propose to take advantage of the energy efficiency of NTV operation while keeping the speed constant and sacrificing p to the extent allowed by the application's resilience. To this end, we investigated the impact of NTV operation on PCMOS, where more energy can be saved at the cost of less accurate results. Our Cadence simulations of an inverter and a 4-bit ripple-carry adder expose the shortcomings of current analytical models for the probability of correctness at NTV and lower supply voltages. We further investigated delay propagation in a digital system composed of probabilistic building blocks, which shows that timing delay affects the more significant computational bits more than the less significant ones and hence contributes considerably to the total error.
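
    For intuition about the E-p trade-off described above, the sketch below evaluates a first-order model commonly used in PCMOS analyses: a gate switches in the presence of additive Gaussian noise, the output counts as erroneous when the noise pushes it past the Vdd/2 logic threshold, and the dynamic energy of a transition is taken as 0.5*C*Vdd^2. The noise RMS, the 1 fF load, and the threshold choice are illustrative assumptions, not parameters from the paper.

```python
# A minimal sketch of the energy-vs-probability-of-correctness trade-off, assuming
# additive Gaussian output noise (RMS sigma_n), a logic threshold at Vdd/2, and
# dynamic switching energy 0.5*C*Vdd^2. All values below are illustrative only.
import math

def p_correct(vdd, sigma_n):
    """Probability that one switching event stays on the correct side of Vdd/2."""
    # P(error) = Q(Vdd / (2*sigma_n)) = 0.5 * erfc(Vdd / (2*sqrt(2)*sigma_n))
    return 1.0 - 0.5 * math.erfc(vdd / (2.0 * math.sqrt(2.0) * sigma_n))

def switching_energy(vdd, c_load=1e-15):
    """Dynamic energy of one transition for an assumed 1 fF load capacitance."""
    return 0.5 * c_load * vdd ** 2

if __name__ == "__main__":
    sigma_n = 0.08  # assumed noise RMS in volts
    for vdd in (1.1, 0.9, 0.7, 0.5, 0.4):  # nominal supply down to near-threshold
        print(f"Vdd={vdd:.1f} V  p={p_correct(vdd, sigma_n):.6f}  "
              f"E={switching_energy(vdd) * 1e15:.3f} fJ")
```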

    Energy-Efficient Approximate Least Squares Accelerator: A Case Study of Radio Astronomy Calibration Processing

    Approximate computing allows the introduction of inaccuracy into a computation in exchange for cost savings such as energy consumption, chip area, and latency. Targeting energy efficiency, approximate designs for multipliers, adders, and multiply-accumulate (MAC) units have been investigated extensively over the past decade. Accelerator designs for larger architectures, however, have received far less attention. The Least Squares (LS) algorithm is widely used in digital signal processing applications, e.g., image reconstruction. This work proposes a novel LS accelerator design based on a heterogeneous architecture, where the heterogeneity is introduced by combining accurate and approximate processing cores. We consider a case study of radio astronomy calibration processing that employs a complex-input iterative LS algorithm. Our methodology exploits the intrinsic error resilience of this algorithm: initial iterations are processed on approximate modules, while later iterations run on accurate modules. Our energy-quality experiments show up to 24% energy savings compared with an accurate (optimized) counterpart for biased designs, and up to 29% energy savings when unbiasing is introduced. The proposed LS accelerator does not increase the number of iterations and provides sufficient precision to converge to an acceptable solution.
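
    As a software-level sketch of the iteration split described above (not the hardware accelerator itself), the snippet below runs a plain gradient-descent least-squares solver in which the early iterations use a reduced-precision (float16) matrix-vector product as a stand-in for an approximate processing core, while the later iterations use full precision. The problem size, iteration counts, and the float16 stand-in are assumptions made for illustration.

```python
# A minimal sketch of accurate/approximate heterogeneous iterations for least squares.
# float16 arithmetic stands in for an approximate core; sizes and iteration counts
# are illustrative assumptions, not the configuration used in the paper.
import numpy as np

def approx_matvec(a, x):
    """Stand-in for an approximate core: compute A @ x in float16."""
    return (a.astype(np.float16) @ x.astype(np.float16)).astype(np.float64)

def iterative_ls(a, b, n_approx=10, n_accurate=40):
    """Gradient descent on 0.5*||Ax - b||^2: early iterations approximate, later accurate."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2   # step 1/L with L = ||A||_2**2 keeps the descent stable
    x = np.zeros(a.shape[1])
    for it in range(n_approx + n_accurate):
        residual = (approx_matvec(a, x) if it < n_approx else a @ x) - b
        x = x - step * (a.T @ residual)      # accurate gradient update
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((64, 8))
    x_true = rng.standard_normal(8)
    b = a @ x_true + 0.01 * rng.standard_normal(64)
    x_est = iterative_ls(a, b)
    print("relative error:", np.linalg.norm(x_est - x_true) / np.linalg.norm(x_true))
```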

    Exploiting error resilience for hardware efficiency: targeting iterative and accumulation based algorithms

    Computing devices are constantly challenged by resource-hungry applications such as scientific computing. These applications demand high hardware efficiency and thus pose the challenge of reducing the energy/power consumption, latency, and chip area required to process a given task. Increasing hardware efficiency is therefore one of the major goals when innovating computing devices. Improvements in process technology have played an important role in tackling such challenges by increasing the performance and transistor density of integrated circuits while keeping their power density constant. Over the last couple of decades, however, the efficiency gains from process technology improvements have been approaching the fundamental limits of computing. For instance, power density no longer scales as well as transistor density, which makes it increasingly difficult to control the power and thermal budget of integrated circuits. Given that many applications and algorithms are error-resilient, emerging paradigms like approximate computing come to the rescue by offering promising efficiency gains, especially in terms of power efficiency. An application or algorithm can be regarded as error-resilient, or error-tolerant, when it provides an outcome of the required accuracy while utilizing processing components that do not always compute accurately. An algorithm can tolerate errors for several reasons; for instance, it may have noisy or redundant inputs and/or a range of acceptable outcomes. Examples of such applications are machine learning, scientific computing, and search engines. Approximate computing techniques exploit the intrinsic error tolerance of such applications to optimize computing systems at the software, architecture, and circuit level and thereby achieve efficiency gains. However, state-of-the-art approximate computing methodologies do not sufficiently address accelerator designs for iterative and accumulation-based algorithms. Considering the wide range of such algorithms in digital signal processing, this thesis investigates approximation methodologies for achieving highly efficient accelerator architectures for iterative and accumulation-based algorithms. As a case study, we apply the proposed approximate computing methodologies to radio astronomy calibration processing, which results in a more effective quality-efficiency trade-off than state-of-the-art approximate computing methodologies.
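
    As a small numerical illustration of why accumulation-based kernels are often error-resilient (a sketch only, not the methodology developed in the thesis), the snippet below perturbs every product of a square-accumulate sum with either zero-mean or one-sided relative errors of the same magnitude; the zero-mean errors largely average out during accumulation, while the one-sided errors accumulate into a visible bias. The 1% error model and the uniform inputs are assumptions chosen purely for illustration.

```python
# A minimal sketch: zero-mean per-product errors average out in a long accumulation,
# while one-sided errors of the same magnitude do not. Error model and inputs assumed.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.uniform(0.0, 1.0, n)
exact = np.dot(x, x)                           # accurate square-accumulate

zero_mean = rng.uniform(-0.01, 0.01, n)        # zero-mean relative error per product
one_sided = rng.uniform(0.0, 0.02, n)          # one-sided error of equal magnitude

for name, eps in (("zero-mean errors", zero_mean), ("one-sided errors", one_sided)):
    approx = np.sum(x * x * (1.0 + eps))
    rel = abs(approx - exact) / exact
    print(f"{name:16s} relative error after accumulation = {rel:.2e}")
```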

    Go Green Radio Astronomy: Approximate Computing Perspective: Opportunities and Challenges: POSTER

    Modern radio telescopes require highly energy/power-efficient computing systems. The signal processing pipelines of such radio telescopes are dominated by accumulation-based iterative processes. Since the input signal received at a radio telescope can be regarded as Gaussian noise, employing approximate computing looks promising. We therefore present the opportunities and challenges offered by the approximate computing paradigm for achieving the required efficiency targets.

    MACISH: Designing Approximate MAC Accelerators with Internal-Self-Healing

    Approximate computing studies the quality-efficiency trade-off in order to attain the most efficient design (e.g., in area, latency, and power) for a given quality constraint, and vice versa. Recently, self-healing methodologies for approximate computing have emerged that show a more effective quality-efficiency trade-off than conventional error-restricted approximate computing methodologies. However, state-of-the-art self-healing methodologies are constrained to highly parallel implementations with similar modules (or parts of a datapath) in multiples of two and to square-accumulate functions, where mirror versions are paired to achieve error cancellation. In this article, we propose a novel methodology for Internal-Self-Healing (ISH) that exploits self-healing within a computing element internally, without requiring a paired parallel module. This extends the applicability to irregular/asymmetric datapaths, lifts the restriction to multiples of two for the modules in a given datapath, and goes beyond square functions. We employ our ISH methodology to design an approximate multiply-accumulate unit (xMAC), wherein the multiplier is regarded as an approximation stage and the accumulator as a healing stage. We propose to approximate a recursive multiplier in such a way that a near-to-zero average error is achieved for a given input distribution, so that the error cancels out in an accurate accumulation stage. To increase the efficacy of such a multiplier, we propose a novel 2 × 2 approximate multiplier design that alleviates the overflow problem within an n × n approximate recursive multiplier. The proposed ISH methodology shows a more effective quality-efficiency trade-off for an xMAC than conventional error-restricted methodologies, both for random inputs and for radio astronomy calibration processing (up to 55% better output quality for equivalent-efficiency designs).
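
    To convey the ISH principle in a few lines (a toy sketch assuming uniform 8-bit inputs, not the 2 × 2 or recursive multiplier designs proposed in the article), the snippet below compares a plainly truncated multiplier, whose error is always one-sided, against a variant whose error is re-centred near zero mean; the accurate accumulation stage of the MAC then cancels most of the re-centred error.

```python
# A toy illustration of internal self-healing in a MAC: the approximate multiplier's
# error is made near zero-mean (for assumed uniform 8-bit inputs) so that the accurate
# accumulator cancels it. This is NOT the multiplier design from the article.
import random

def truncated_mul(a, b):
    """Approximate 8x8 multiply: drop the low 8 bits of the product (one-sided error)."""
    return (a * b) & ~0xFF

def ish_style_mul(a, b):
    """Same truncation plus a constant close to the mean of the dropped low byte."""
    return truncated_mul(a, b) + 128

def mac(mul, pairs):
    """Accurate accumulation of (approximate) products: the healing stage."""
    return sum(mul(a, b) for a, b in pairs)

random.seed(1)
pairs = [(random.randrange(256), random.randrange(256)) for _ in range(100_000)]
exact = mac(lambda a, b: a * b, pairs)
for name, mul in (("truncated (one-sided)", truncated_mul),
                  ("ISH-style (near zero-mean)", ish_style_mul)):
    rel = abs(mac(mul, pairs) - exact) / exact
    print(f"{name:26s} relative MAC error = {rel:.2e}")
```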

    Squash: Approximate Square-Accumulate With Self-Healing

    Approximate computing strives to achieve the highest performance, area, and power efficiency for a given quality constraint, and vice versa. The conventional approximate design methodology restricts the introduction of errors to avoid a high loss in quality. However, this limits the computing efficiency and the number of Pareto-optimal design alternatives in the quality-efficiency trade-off. This paper presents a novel self-healing (SH) methodology for an approximate square-accumulate (SAC) architecture. SAC refers to a hardware architecture that computes the inner product of a vector with itself. SH exploits the algorithmic error resilience of the SAC structure to ensure an effective quality-efficiency trade-off, wherein the squarer is regarded as an approximation stage and the accumulator as a healing stage. We propose to deploy an approximate squarer mirror pair, such that the error introduced by one approximate squarer mirrors the error introduced by the other, i.e., the errors generated by the approximate squarers are approximately the additive inverse of each other. This helps the healing stage (the accumulator) automatically average out the error originating in the approximation stage and thereby minimize the quality loss. For random input vectors, SH demonstrates up to 25% better area efficiency and 18.6% better power efficiency, with better output quality than the conventional approximate computing methodology. As a case study, SH is applied to one of the computationally expensive components (SAC) of the radio astronomy calibration application, where it shows up to 46.7% better quality for the same computing efficiency as the conventional methodology.
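
    The mirror-pair idea can be made concrete with a toy pair of approximate squarers (purely illustrative stand-ins, not the squarer designs evaluated in the paper): one forces its input down to the nearest even value before squaring and therefore under-estimates, while its mirror forces the input up and over-estimates, so their errors approximately cancel inside the accurate accumulator.

```python
# A toy mirror pair for square-accumulate: sq_down under-estimates, sq_up over-estimates
# by a mirrored amount, so routing alternate elements through them lets the accurate
# accumulator average the error out. These squarers are illustrative assumptions.
import random

def sq_down(x):
    """Approximate squarer A: clear the LSB before squaring (error <= 0)."""
    return (x & ~1) ** 2

def sq_up(x):
    """Mirror squarer B: round odd inputs up to the next even value (error >= 0)."""
    return ((x + 1) & ~1) ** 2

def sac(xs, squarers):
    """Square-accumulate: element i goes through squarers[i % len(squarers)]."""
    return sum(squarers[i % len(squarers)](x) for i, x in enumerate(xs))

random.seed(7)
xs = [random.randrange(256) for _ in range(100_000)]
exact = sum(x * x for x in xs)
single = abs(sac(xs, [sq_down]) - exact) / exact          # one approximate squarer only
mirror = abs(sac(xs, [sq_down, sq_up]) - exact) / exact   # mirror pair (self-healing)
print(f"single approximate squarer : relative error = {single:.2e}")
print(f"mirror pair (self-healing) : relative error = {mirror:.2e}")
```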

    Exploiting Errors for Efficiency: A Survey from Circuits to Applications

    When a computational task tolerates a relaxation of its specification, or when an algorithm tolerates the effects of noise in its execution, hardware, system software, and programming language compilers or their runtime systems can trade deviations from correct behavior for lower resource usage. We present, for the first time, a synthesis of research results on computing systems that only make as many errors as their end-to-end applications can tolerate. The results span the disciplines of computer-aided design of circuits, digital system design, computer architecture, programming languages, operating systems, and information theory. Rather than over-provisioning the resources controlled by each of these layers of abstraction to avoid errors, it can be more efficient to exploit the masking of errors occurring at one layer and thereby prevent those errors from propagating to a higher layer. We demonstrate the potential benefits of end-to-end approaches using two illustrative examples. We introduce a formalization of terminology that allows us to present a coherent view across the techniques traditionally used by different research communities in their individual layers of focus. Using this formalization, we survey trade-offs for individual layers of computing systems at the circuit, architecture, operating system, and programming language levels, as well as fundamental information-theoretic limits to trade-offs between resource usage and correctness.