152 research outputs found
Privacy Leakages in Approximate Adders
Approximate computing has recently emerged as a promising method to meet the
low power requirements of digital designs. The erroneous outputs produced in
approximate computing can be partially a function of each chip's process
variation. We show that, in such schemes, the erroneous outputs produced on
each chip instance can reveal the identity of the chip that performed the
computation, possibly jeopardizing user privacy. In this work, we perform
simulation experiments on 32-bit Ripple Carry Adders, Carry Lookahead Adders,
and Han-Carlson Adders running at over-scaled operating points. Our results
show that identification is possible, we contrast the identifiability of each
type of adder, and we quantify how success of identification varies with the
extent of over-scaling and noise. Our results are the first to show that
approximate digital computations may compromise privacy. Designers of future
approximate computing systems should be aware of the possible privacy leakages
and decide whether mitigation is warranted in their application.Comment: 2017 IEEE International Symposium on Circuits and Systems (ISCAS
Demonstration of Inexact Computing Implemented in the JPEG Compression Algorithm using Probabilistic Boolean Logic applied to CMOS Components
Probabilistic computing offers potential improvements in energy, performance, and area compared with traditional digital design. This dissertation quantifies energy and energy-delay tradeoffs in digital adders, multipliers, and the JPEG image compression algorithm. This research shows that energy demand can be cut in half with noisesusceptible16-bit Kogge-Stone adders that deviate from the correct value by an average of 3 in 14 nanometer CMOS FinFET technology, while the energy-delay product (EDP) is reduced by 38 . This is achieved by reducing the power supply voltage which drives the noisy transistors. If a 19 average error is allowed, the adders are 13 times more energy-efficient and the EDP is reduced by 35 . This research demonstrates that 92 of the color space transform and discrete cosine transform circuits within the JPEG algorithm can be built from inexact components, and still produce readable images. Given the case in which each binary logic gate has a 1 error probability, the color space transformation has an average pixel error of 5.4 and a 55 energy reduction compared to the error-free circuit, and the discrete cosine transformation has a 55 energy reduction with an average pixel error of 20
Design of approximate overclocked datapath
Embedded applications can often demand stringent latency requirements. While high degrees of parallelism within custom FPGA-based accelerators may help to some extent, it may also be necessary to limit the precision used in the datapath to boost the operating frequency of the implementation. However, by reducing the precision, the engineer introduces quantisation error into the design.
In this thesis, we describe an alternative circuit design methodology when considering trade-offs between accuracy, performance and silicon area. We compare two different approaches that could trade accuracy for performance. One is the traditional approach where the precision used in the datapath is limited to meet a target latency. The other is a proposed new approach which simply allows the datapath to operate without timing closure. We demonstrate analytically and experimentally that for many applications it would be preferable to simply overclock the design and accept that timing violations may arise. Since the errors introduced by timing violations occur rarely, they will cause less noise than quantisation errors.
Furthermore, we show that conventional forms of computer arithmetic do not fail gracefully when pushed beyond the deterministic clocking region. In this thesis we take a fresh look at Online Arithmetic, originally proposed for digit serial operation, and synthesize unrolled digit parallel online arithmetic operators to allow for graceful degradation. We quantify the impact of timing violations on key arithmetic primitives, and show that substantial performance benefits can be obtained in comparison to binary arithmetic. Since timing errors are caused by long carry chains, these result in errors in least significant digits with online arithmetic, causing less impact than conventional implementations.Open Acces
Propagation of Delay in Probabilistic CMOS Systems
Future low voltage noise dominated designs render probabilistic behavior of CMOS. This is acceptable as far as applications’ intrinsic error resilience allows quantified inaccuracy in results to save energy consumption, such as in applications like audio/video processing and sky image formation in radio astronomy. This introduces the trade-off between energy consumption (E) and probability of correctness (p) that provides an opportunity for inexact computing to attain higher energy efficiency. Efforts have been made in the last decade to model probabilistic CMOS (PCMOS) keeping in view the noise variance and to establish its feasibility for error resilient applications focused on the nominal voltage range. However, exploiting the near threshold voltage (NTV) range is quite a promising energy efficient design technique that operates the hardware at relatively slower pace while retaining the deterministic property of computations. We propose to take the advantage of energy efficiency at NTV while retaining the speed as constant, sacrificing p to the extent allowed by applications resilience. In this regard, we investigated the impact of NTV operation on PCMOS where more energy can be saved with less accurate results. Our simulation results of an inverter and a 4-bit ripple carry adder in Cadence showed the shortcomings of current analytical models for probability of correctness at NTV and lower voltage supplies. We further investigated the impact of delay propagation in a digital system composed of probabilistic building blocks, which provides a clear insight of timing delay affecting the higher significant computational bits more than its lower significant counterparts and hence contributing considerably to the total error
Recommended from our members
Design of Hardware with Quantifiable Security against Reverse Engineering
Semiconductors are a 412 billion dollar industry and integrated circuits take on important roles in human life, from everyday use in smart-devices to critical applications like healthcare and aviation. Saving today\u27s hardware systems from attackers can be a huge concern considering the budget spent on designing these chips and the sensitive information they may contain. In particular, after fabrication, the chip can be subject to a malicious reverse engineer that tries to invasively figure out the function of the chip or other sensitive data. Subsequent to an attack, a system can be subject to cloning, counterfeiting, or IP theft. This dissertation addresses some issues concerning the security of hardware systems in such scenarios.
First, the issue of privacy risks from approximate computing is investigated in Chapter 2. Simulation experiments show that the erroneous outputs produced on each chip instance can reveal the identity of the chip that performed the computation, which jeopardizes user privacy.
The next two chapters deal with camouflaging, which is a technique to prevent reverse engineering from extracting circuit information from the layout. Chapter 3 provides a design automation method to protect camouflaged circuits against an adversary with prior knowledge about the circuit\u27s viable functions. Chapter 4 provides a method to reverse engineer camouflaged circuits. The proposed reverse engineering formulation uses Boolean Satisfiability (SAT) solving in a way that incorporates laser fault injection and laser voltage probing capabilities to figure out the function of an aggressively camouflaged circuit with unknown gate functions and connections.
Chapter 5 addresses the challenge of secure key storage in hardware by proposing a new key storage method that applies threshold-defined behavior of memory cells to store secret information in a way that achieves a high degree of protection against invasive reverse engineering. This approach requires foundry support to encode the secrets as threshold voltage offsets in transistors. In Chapter 6, a secret key storage approach is introduced that does not rely on a trusted foundry. This approach only relies on the foundry to fabricate the hardware infrastructure for key generation but not to encode the secret key. The key is programmed by the IP integrator or the user after fabrication via directed accelerated aging of transistors. Additionally, this chapter presents the design of a working hardware prototype on PCB that demonstrates this scheme.
Finally, chapter 7 concludes the dissertation and summarizes possible future research
Design and Applications of Approximate Circuits by Gate-Level Pruning
Energy-efficiency is a critical concern for many systems, ranging from Internet of things objects and mobile devices to high-performance computers. Moreover, after 40 years of prosperity, Moore's law is starting to show its economic and technical limits. Noticing that many circuits are over-engineered and that many applications are error-resilient or require less precision than offered by the existing hardware, approximate computing has emerged as a potential solution to pursue improvements of digital circuits. In this regard, a technique to systematically tradeoff accuracy in exchange for area, power, and delay savings in digital circuits is proposed: gate-level pruning (GLP). A CAD tool is build and integrated into a standard digital flow to offer a wide range of cost-accuracy tradeoffs for any conventional design. The methodology is first demonstrated on adders, achieving up to 78% energy-delay-area reduction for 10% mean relative error. It is then detailed how this methodology can be applied on a more complex system composed of a multitude of arithmetic blocks and memory: the discrete cosine transform (DCT), which is a key building block for image and video processing applications. Even though arithmetic circuits represent less than 4% of the entire DCT area, it is shown that the GLP technique can lead to 21% energy-delay-area savings over the entire system for a reasonable image quality loss of 24 dB. This significant saving is achieved thanks to the pruned arithmetic circuits, which sets some nodes at constant values, enabling the synthesis tool to further simplify the circuit and memory
A Genetic-algorithm-based Approach to the Design of DCT Hardware Accelerators
As modern applications demand an unprecedented level of computational resources, traditional computing system design paradigms are no longer adequate to guarantee significant performance enhancement at an affordable cost. Approximate Computing (AxC) has been introduced as a potential candidate to achieve better computational performances by relaxing non-critical functional system specifications. In this article, we propose a systematic and high-abstraction-level approach allowing the automatic generation of near Pareto-optimal approximate configurations for a Discrete Cosine Transform (DCT) hardware accelerator. We obtain the approximate variants by using approximate operations, having configurable approximation degree, rather than full-precise ones. We use a genetic searching algorithm to find the appropriate tuning of the approximation degree, leading to optimal tradeoffs between accuracy and gains. Finally, to evaluate the actual HW gains, we synthesize non-dominated approximate DCT variants for two different target technologies, namely, Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). Experimental results show that the proposed approach allows performing a meaningful exploration of the design space to find the best tradeoffs in a reasonable time. Indeed, compared to the state-of-the-art work on approximate DCT, the proposed approach allows an 18% average energy improvement while providing at the same time image quality improvement
Recommended from our members
Modeling and synthesis of approximate digital circuits
textEnergy minimization has become an ever more important concern in the design of very large scale integrated circuits (VLSI). In recent years, approximate computing, which is based on the idea of trading off computational accuracy for improved energy efficiency, has attracted significant attention. Applications that are both compute-intensive and error-tolerant are most suitable to adopt approximation strategies. This includes digital signal processing, data mining, machine learning or search algorithms. Such approximations can be achieved at several design levels, ranging from software, algorithm and architecture, down to logic or transistor levels. This dissertation investigates two research threads for the derivation of approximate digital circuits at the logic level: 1) modeling and synthesis of fundamental arithmetic building blocks; 2) automated techniques for synthesizing arbitrary approximate logic circuits under general error specifications. The first thread investigates elementary arithmetic blocks, such as adders and multipliers, which are at the core of all data processing and often consume most of the energy in a circuit. An optimal strategy is developed to reduce energy consumption in timing-starved adders under voltage over-scaling. This allows a formal demonstration that, under quadratic error measures prevalent in signal processing applications, an adder design strategy that separates the most significant bits (MSBs) from the least significant bits (LSBs) is optimal. An optimal conditional bounding (CB) logic is further proposed for the LSBs, which selectively compensates for the occurrence of errors in the MSB part. There is a rich design space of optimal adders defined by different CB solutions. The other thread considers the problem of approximate logic synthesis (ALS) in two-level form. ALS is concerned with formally synthesizing a minimum-cost approximate Boolean function, whose behavior deviates from a specified exact Boolean function in a well-constrained manner. It is established that the ALS problem un-constrained by the frequency of errors is isomorphic to a Boolean relation (BR) minimization problem, and hence can be efficiently solved by existing BR minimizers. An efficient heuristic is further developed which iteratively refines the magnitude-constrained solution to arrive at a two-level representation also satisfying error frequency constraints. To extend the two-level solution into an approach for multi-level approximate logic synthesis (MALS), Boolean network simplifications allowed by external don't cares (EXDCs) are used. The key contribution is in finding non-trivial EXDCs that can maximally approach the external BR and, when applied to the Boolean network, solve the MALS problem constrained by magnitude only. The algorithm then ensures compliance to error frequency constraints by recovering the correct outputs on the sought number of error-producing inputs while aiming to minimize the network cost increase. Experiments have demonstrated the effectiveness of the proposed techniques in deriving approximate circuits. The approximate adders can save up to 60% energy compared to exact adders for a reasonable accuracy. When used in larger systems implementing image-processing algorithms, energy savings of 40% are possible. The logic synthesis approaches generally can produce approximate Boolean functions or networks with complexity reductions ranging from 30% to 50% under small error constraints.Electrical and Computer Engineerin
- …