320 research outputs found

    Customizing Fixed-Point and Floating-Point Arithmetic - A Case Study in K-Means Clustering

    This paper presents a comparison between custom fixed-point (FxP) and floating-point (FlP) arithmetic, applied to the bidimensional K-means clustering algorithm. After a discussion of the K-means clustering algorithm and of the characteristics of both arithmetics, hardware implementations of FxP and FlP operators are compared in terms of area, delay, and energy for different bitwidths, using the ApxPerf2.0 framework. Finally, both are compared in the context of K-means clustering. The direct comparison shows a large difference between 8-to-16-bit FxP and FlP operators: FlP adders consume 5-12× more energy than FxP adders, and FlP multipliers 2-10× more. However, when applied to the K-means clustering algorithm, the gap between FxP and FlP tightens. Indeed, the accuracy improvements brought by FlP make the computation more accurate and reach an accuracy equivalent to FxP in fewer iterations of the algorithm, proportionally reducing the total energy spent. The 8-bit version of the algorithm becomes more profitable with FlP, which is 80% more accurate with only 1.6× more energy. The paper finally discusses the case for custom FlP in low-energy general-purpose computation, thanks to its ease of use and an energy overhead lower than might have been expected.
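
    A minimal sketch of the comparison's core idea, in Python rather than the paper's ApxPerf2.0 hardware flow: a K-means assignment step computed with an emulated 8-bit fixed-point format versus native floating point. The Q1.7 format and the data range are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def to_fixed(x, frac_bits=7, total_bits=8):
    """Quantize to a signed fixed-point grid (round-to-nearest, saturating)."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int64)

def assign_fxp(points, centroids):
    """K-means assignment step using integer-only (fixed-point) arithmetic."""
    p, c = to_fixed(points), to_fixed(centroids)
    d = ((p[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(1000, 2))   # bidimensional points in [-1, 1)
ctr = rng.uniform(-1, 1, size=(4, 2))      # 4 initial centroids
flp = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
print("FxP/FlP assignment mismatch:", (assign_fxp(pts, ctr) != flp).mean())
```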

    Playing with number representations and operator-level approximations

    Energy consumption is one of the major issues in computing today, shared by all domains of computer science from high-performance computing to embedded systems. The two main factors that influence energy consumption are execution time and data volume. In recent years, approximation has received renewed interest as a way to improve both speed and energy consumption in embedded systems. Many embedded applications do not require high precision or accuracy, and both software and hardware designers often seek a sweet spot in the trade-off between accuracy, speed, energy, and area cost, across layers ranging from the application and software levels down to the architecture and circuit levels. Various techniques for approximate computing (AC) augment the design space by providing another set of design knobs for the performance-accuracy trade-off. Stochastic computing (SC) is also seen as an alternative to conventional computing, since it requires less hardware and is more tolerant to soft errors, at the expense of higher latency. SC uses a probabilistic model of computation and requires little hardware to implement complex operations. This talk reviews the main techniques for operator-level approximation using various number representations, playing with data word-length and operator types, to show their benefits and drawbacks in terms of energy efficiency. We also introduce the basic concepts of stochastic computing, its advantages in terms of robustness to errors, and its limitations.
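
    As a hedged illustration of the stochastic-computing concept mentioned above (not code from the talk): in unipolar SC, a value in [0, 1] is encoded as the probability of a 1 in a random bitstream, so a single AND gate multiplies two values; stream length trades accuracy for latency.

```python
import numpy as np

def sc_encode(value, n_bits, rng):
    """Unipolar stochastic encoding: each bit is 1 with probability `value`."""
    return rng.random(n_bits) < value

def sc_multiply(a, b, n_bits=4096, seed=0):
    rng = np.random.default_rng(seed)
    sa = sc_encode(a, n_bits, rng)
    sb = sc_encode(b, n_bits, rng)      # independent stream
    return np.mean(sa & sb)             # AND gate: P(1) ~= a * b

print(sc_multiply(0.5, 0.8))  # close to 0.40; error shrinks as ~1/sqrt(n_bits)
```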

    Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems

    Analysis of fixed-point and floating-point arithmetic representations’ impact on synthesized area of a digital integrated circuit

    Abstract. This thesis compared fixed-point (FxP) and floating-point (FlP) representations, using the signal-to-quantization-noise ratio (SQNR) and synthesized area as the key comparison criteria. The good-enough SQNR was set to 40 dB, and the goal was to choose the smallest possible area that still provided sufficient dynamic range (DR) and fulfilled the SQNR requirement. Quantization models for both representations were implemented in Matlab. To examine the SQNR, an algorithm was chosen and the aforementioned quantization models were inserted into it. The chosen algorithm was a memory-based 64-point FFT implemented with radix-2 butterflies. The performance drop inside the algorithm caused by the quantization of each arithmetic representation was examined using the SQNR. To compute the error, a reference model was implemented using Matlab's FFT function. After the SQNR analysis, synthesis was run for the arithmetic operation models to obtain area and power estimates. From these results, conclusions were drawn about the impact of FxP and FlP on the area of the different FFT models, enabling a direct comparison between the two.
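
    A minimal sketch of the SQNR figure of merit used in the thesis, here in Python instead of Matlab and with quantization applied only to the FFT input (the thesis inserts quantization models inside a memory-based radix-2 FFT; that harness is omitted).

```python
import numpy as np

def sqnr_db(reference, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    noise = reference - quantized
    return 10 * np.log10(np.sum(np.abs(reference) ** 2) /
                         np.sum(np.abs(noise) ** 2))

def quantize(x, frac_bits):
    """Uniform fixed-point quantization with 2**-frac_bits resolution."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 64)                  # 64-point input block
ref = np.fft.fft(x)                         # double-precision reference model
out = np.fft.fft(quantize(x, frac_bits=7))  # 8-bit fixed-point input
print(f"SQNR = {sqnr_db(ref, out):.1f} dB") # compare against the 40 dB target
```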

    Accuracy-guaranteed bit-width optimization


    Exploring Hardware Fault Impacts on Different Real Number Representations of the Structural Resilience of TCUs in GPUs

    The most recent generations of graphics processing units (GPUs) boost the execution of the convolutional operations required by machine-learning applications by resorting to specialized, efficient in-chip accelerators (Tensor Core Units, or TCUs) that operate on matrix-multiplication tiles. Unfortunately, modern cutting-edge semiconductor technologies are increasingly prone to hardware defects, and the trend of heavily stressing TCUs during the execution of safety-critical and high-performance computing (HPC) applications increases the likelihood of TCUs producing different kinds of failures. In fact, the intrinsic resilience of arithmetic units to hardware faults plays a crucial role in safety-critical applications using GPUs (e.g., in automotive, space, and autonomous robotics). Recently, new arithmetic formats have been proposed, particularly formats suited to neural-network execution. However, a reliability characterization of TCUs supporting different arithmetic formats was still lacking. In this work, we quantitatively assessed the impact of hardware faults in TCU structures while employing two distinct formats (floating-point and posit) in two different configurations (16 and 32 bits) for representing real numbers. For the experimental evaluation, we resorted to an architectural description of a TCU core (PyOpenTCU) and performed 120 fault-simulation campaigns, injecting around 200,000 faults per campaign and requiring around 32 days of computation. Our results demonstrate that the posit format is less affected by faults than the floating-point one (by up to three orders of magnitude for 16 bits and up to twenty orders for 32 bits). We also identified the most sensitive fault locations (i.e., those that produce the largest errors), paving the way for smart hardening solutions.
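
    A minimal sketch of the kind of fault-injection experiment described above, under simplifying assumptions: PyOpenTCU and the posit format are not modeled here; numpy's IEEE float16 and a 4×4 matrix-multiply tile stand in for the TCU's floating-point path, with a single bit flip injected into one operand.

```python
import numpy as np

def flip_bit(x, bit):
    """Flip one bit of a float16 value via its 16-bit integer image."""
    img = np.array([x], dtype=np.float16).view(np.uint16)
    img ^= np.uint16(1 << bit)
    return img.view(np.float16)[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)   # matrix-multiply tile
B = rng.standard_normal((4, 4)).astype(np.float16)
golden = (A @ B).astype(np.float32)                  # fault-free reference
for bit in (0, 9, 14):   # mantissa LSB, mantissa MSB, exponent MSB
    Af = A.copy()
    Af[0, 0] = flip_bit(Af[0, 0], bit)
    err = np.max(np.abs((Af @ B).astype(np.float32) - golden))
    print(f"bit {bit:2d} flipped: max abs output error = {err:.4g}")
```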

    Approximate computing design exploration through data lifetime metrics

    When designing an approximate computing system, the selection of the resources to modify is key. The error introduced in the system must remain reasonable, but the size of the design-exploration space can make this selection extremely difficult. In this paper, we propose to exploit a new metric for it: data lifetime. The concept comes from the field of reliability, where it can guide selective hardening: the more often a resource handles "live" data, the more critical it becomes and the more important it is to protect it. In this paper, we propose to use this same metric in a new way: identify the least critical resources as approximation targets, in order to minimize the impact on the global system behavior and therefore decrease the cost of approximation while increasing the gains on other criteria.
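
    A hedged sketch of a data-lifetime metric in the spirit described above (the trace format and the liveness rule are illustrative assumptions, not the paper's definition): a register's data counts as live from a write until each subsequent read, and resources with a low live-cycle share become approximation candidates.

```python
def lifetime_fraction(trace, n_cycles):
    """trace: chronological (cycle, reg, 'R'|'W') events; returns live-cycle share."""
    last_write, live = {}, {}
    for cycle, reg, op in trace:
        if op == 'R' and reg in last_write:
            # The value written earlier was live during the interval up to this read.
            live[reg] = live.get(reg, 0) + cycle - last_write[reg]
            last_write[reg] = cycle        # keep counting from the read onward
        elif op == 'W':
            last_write[reg] = cycle
    return {reg: t / n_cycles for reg, t in live.items()}

trace = [(0, 'r1', 'W'), (5, 'r1', 'R'), (6, 'r2', 'W'),
         (7, 'r1', 'W'), (8, 'r1', 'R'), (20, 'r2', 'R')]
print(lifetime_fraction(trace, n_cycles=20))
# {'r1': 0.3, 'r2': 0.7} -> r1 is the less critical approximation target.
```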

    A transprecision floating-point cluster for efficient near-sensor data analytics

    Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core computing cluster that leverages the fine-grained tunable precision of transprecision computing to support near-sensor applications on a minimal power budget. Our design, based on the open-source RISC-V architecture, combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We perform an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, with the aim of identifying the most efficient configurations in terms of performance, energy efficiency, and area efficiency. We also provide full-fledged software-stack support, including a parallel runtime and a compilation toolchain, to enable the development of end-to-end applications. We perform an experimental assessment of our design on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post place-&-route analysis of the power consumption. Finally, a comparison with the state of the art shows that our solution outperforms its competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors.
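
    A minimal software analogue of the transprecision idea (numpy's float16/float32 stand in for the cluster's scalar and sub-word-vectorized FPUs, which this sketch does not model): run the same kernel at several floating-point widths and keep the cheapest one meeting an accuracy target.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(1024), rng.standard_normal(1024)
ref = float(np.dot(a, b))                  # float64 reference result

for dtype in (np.float16, np.float32):     # half precision, single precision
    approx = float(np.dot(a.astype(dtype), b.astype(dtype)))
    rel_err = abs(approx - ref) / abs(ref)
    print(f"{np.dtype(dtype).name}: relative error = {rel_err:.2e}")
# A transprecision runtime would select float16 whenever the error is
# tolerable, halving memory traffic and doubling sub-word vector throughput.
```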

    Design, Verification, Test and In-Field Implications of Approximate Computing Systems

    Today, the concept of approximation in computing is becoming more and more of a "hot topic" for investigating how computing systems can be made more energy-efficient, faster, and less complex. Intuitively, instead of performing exact computations and consequently requiring a large amount of resources, Approximate Computing selectively relaxes specifications, trading accuracy off for efficiency. While Approximate Computing holds great promise for systems' performance, energy efficiency, and complexity, it poses significant challenges regarding the design, verification, test, and in-field reliability of Approximate Computing systems. This tutorial paper covers these aspects, leveraging the authors' experience in the field to present state-of-the-art solutions to apply during the different development phases of an Approximate Computing system.