Search CORE

526 research outputs found

APPROXIMATE COMPUTING BASED PROCESSING OF MEA SIGNALS ON FPGA

Author: Hassan Mohammad Emad
Publication venue: Scholarworks@UAEU
Publication date: 01/04/2023
Field of study

The Microelectrode Array (MEA) is a collection of parallel electrodes that may measure the extracellular potential of nearby neurons. It is a crucial tool in neuroscience for researching the structure, operation, and behavior of neural networks. Using sophisticated signal processing techniques and architectural templates, the task of processing and evaluating the data streams obtained from MEAs is a computationally demanding one that needs time and parallel processing.This thesis proposes enhancing the capability of MEA signal processing systems by using approximate computing-based algorithms. These algorithms can be implemented in systems that process parallel MEA channels using the Field Programmable Gate Arrays (FPGAs). In order to develop approximate signal processing algorithms, three different types of approximate adders are investigated in various configurations. The objective is to maximize performance improvements in terms of area, power consumption, and latency associated with real-time processing while accepting lower output accuracy within certain bounds. On FPGAs, the methods are utilized to construct approximate processing systems, which are then contrasted with the precise system. Real biological signals are used to evaluate both precise and approximative systems, and the findings reveal notable improvements, especially in terms of speed and area. Processing speed enhancements reach up to 37.6%, and area enhancements reach 14.3% in some approximate system modes without sacrificing accuracy. Additional cases demonstrate how accuracy, area, and processing speed may be traded off. Using approximate computing algorithms allows for the design of real-time MEA processing systems with higher speeds and more parallel channels. The application of approximate computing algorithms to process biological signals on FPGAs in this thesis is a novel idea that has not been explored before

United Arab Emirates University: Scholarworks@UAEU / جامعة الامارات

Practical Techniques for Improving Performance and Evaluating Security on Circuit Designs

Author: Xu Wenbin
Publication venue
Publication date: 20/11/2019
Field of study

As the modern semiconductor technology approaches to nanometer era, integrated circuits (ICs) are facing more and more challenges in meeting performance demand and security. With the expansion of markets in mobile and consumer electronics, the increasing demands require much faster delivery of reliable and secure IC products. In order to improve the performance and evaluate the security of emerging circuits, we present three practical techniques on approximate computing, split manufacturing and analog layout automation. Approximate computing is a promising approach for low-power IC design. Although a few accuracy-configurable adder (ACA) designs have been developed in the past, these designs tend to incur large area overheads as they rely on either redundant computing or complicated carry prediction. We investigate a simple ACA design that contains no redundancy or error detection/correction circuitry and uses very simple carry prediction. The simulation results show that our design dominates the latest previous work on accuracy-delay-power tradeoff while using 39% less area. One variant of this design provides finer-grained and larger tunability than that of the previous works. Moreover, we propose a delay-adaptive self-configuration technique to further improve the accuracy-delay-power tradeoff. Split manufacturing prevents attacks from an untrusted foundry. The untrusted foundry has front-end-of-line (FEOL) layout and the original circuit netlist and attempts to identify critical components on the layout for Trojan insertion. Although defense methods for this scenario have been developed, the corresponding attack technique is not well explored. Hence, the defense methods are mostly evaluated with the k-security metric without actual attacks. We develop a new attack technique based on structural pattern matching. Experimental comparison with existing attack shows that the new attack technique achieves about the same success rate with much faster speed for cases without the k-security defense, and has a much better success rate at the same runtime for cases with the k-security defense. The results offer an alternative and practical interpretation for k-security in split manufacturing. Analog layout automation is still far behind its digital counterpart. We develop the layout automation framework for analog/mixed-signal ICs. A hierarchical layout synthesis flow which works in bottom-up manner is presented. To ensure the qualified layouts for better circuit performance, we use the constraint-driven placement and routing methodology which employs the expert knowledge via design constraints. The constraint-driven placement uses simulated annealing process to find the optimal solution. The packing represented by sequence pairs and constraint graphs can simultaneously handle different kinds of placement constraints. The constraint-driven routing consists of two stages, integer linear programming (ILP) based global routing and sequential detailed routing. The experiment results demonstrate that our flow can handle complicated hierarchical designs with multiple design constraints. Furthermore, the placement performance can be further improved by using mixed-size block placement which works on large blocks in priority

A Study on Efficient Designs of Approximate Arithmetic Circuits

Author: Venkatachalam Suganthi
Publication venue: 'University of Saskatchewan Library'
Publication date: 03/01/2019
Field of study

Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor

p

is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 × 10-2 and Q-NMED of 0.449 × 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

eCommons@USASK

University of Saskatchewan Research Archive

An Energy-Efficient Generic Accuracy Configurable Multiplier Based on Block-Level Voltage Overscaling

Author: Akbari Omid
Bahoo Ali Akbar
Shafique Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/07/2023
Field of study

Voltage Overscaling (VOS) is one of the well-known techniques to increase the energy efficiency of arithmetic units. Also, it can provide significant lifetime improvements, while still meeting the accuracy requirements of inherently error-resilient applications. This paper proposes a generic accuracy-configurable multiplier that employs the VOS at a coarse-grained level (block-level) to reduce the control logic required for applying VOS and its associated overheads, thus enabling a high degree of trade-off between energy consumption and output quality. The proposed configurable Block-Level VOS-based (BL-VOS) multiplier relies on employing VOS in a multiplier composed of smaller blocks, where applying VOS in different blocks results in structures with various output accuracy levels. To evaluate the proposed concept, we implement 8-bit and 16-bit BL-VOS multipliers with various blocks width in a 15-nm FinFET technology. The results show that the proposed multiplier achieves up to 15% lower energy consumption and up to 21% higher output accuracy compared to the state-of-the-art VOS-based multipliers. Also, the effects of Process Variation (PV) and Bias Temperature Instability (BTI) induced delay on the proposed multiplier are investigated. Finally, the effectiveness of the proposed multiplier is studied for two different image processing applications, in terms of quality and energy efficiency.Comment: This paper has been published in IEEE Transactions on Emerging Topics in Computin

arXiv.org e-Print Archive

Investigation of reconfigurable-accuracy approximate adder designs for image processing applications

Author: Al-Ma'aitah Khaled Suleiman
Publication venue: Newcastle University
Publication date: 01/01/2019
Field of study

Ph. D. Thesis.In the last decades, integrated circuits with CMOS technology show progressive scaling challenges of both increased power density and power dissipation. Meanwhile, high-performance requirements of current and future application operations show rapid demands of computing resources like power. This design conflict has pushed much effort to search for high performance and energy efficient design approach, such as approximate computing. Approximate computing exploits the error resilience of compute- intensive applications such as image processing applications to implement approximation design techniques with different levels of abstractions and scalability. The basic principle is to relax the strict accuracy requirements in favour of a lower design complexity, thereby achieving more computational performance (i.e., speed) and energy saving. The adder arithmetic unit is considered one of the essential computational blocks in most of the applications. As such, much effort has explored new designs of an efficient approximate adder design. This thesis presents an investigation into design enhancement, novel approximate adder designs and implementation approaches. The first approach introduces a modification to the error detection technique of a popular configurable-accuracy approximate adder design. The proposed lightweight error detection technique reduces the required gates of the error detection circuit, thus, mitigating the design area overhead. Furthermore, at the error correction process of the adder, we have proposed an extensive error detection while activating more than one correction stage concurrently. As a result, this ensures achieving an optimum accuracy of outputs for the worst case of quality requirements. In general, approximate (speculative) adder designs use the seg- mentation technique to divide the adder into multiple short length sub-adders which operate in parallel. Hence, this would limit the long chains of carry propagation and result in a better performance operations. However, the use of overlapped parts of sub-adders regarding a better carry speculation and then more accuracy be- comes a significant challenge of a large design area overhead. The second approach continues mitigating this challenge by present- ing a novel and simpler adder dividing technique to a number of sub-adders. The new method uses what is known as the carry-kill signal for both limiting the carry propagation and applying adder segmentation. Further, between every two adjacent sub-adders, one AND gate and one XOR gate are used for carry speculation and error (i.e., carry propagation) detection respectively. Thus, a significant reduction of the design overhead has been achieved, yet, with acceptable levels of output results accuracy. In the third final approach, simple logic OR gates are used to build the approximate adder while compensating the conventional full adders operation. The resulted approximate adder design presents very low complex- ity, high speed, and low power consumption. Furthermore, instead of augmenting error recovery circuit, short bit-length exact adders are used as correction stages to control the general level of output quality (i.e., without error detection overhead). At the final correc- tion stage, the proposed design would operate the same as an exact adder. To validate the efficiency of these approaches, a number of adders with different bit-widths are designed and synthesized showing considerable reductions in the critical delay, silicon area and more savings in energy consumption, compared to other existing ap- proaches. In addition to acceptable levels or output errors, which are extensively analysed for each proposed design. In this study, the proposed configurable adder designs exhibit energy/quality trade-offs at a different number of correction stages. These trade-offs can be effectively exploited to implement adders in applications, where energy can be gracefully minimised within the envelope of quality requirements. As such, designs implemen- tation in an image processing application known as Gaussian blur filter was introduced, demonstrating the loss in the image quality at each error correction stage. The output images showed promis- ing results to use the proposed designs for more energy-efficient applications, where output quality requirements can be relaxed.Mutah Universit

Newcastle University eTheses

Efficient machine learning: models and accelerations

Author: Li Zhe
Publication venue: SURFACE at Syracuse University
Publication date: 21/12/2018
Field of study

One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers such as deep neural networks, and at least millions to hundreds of millions of parameters (i.e., weights) for the entire model. The larger-scale model tend to enable the extraction of more complex high-level features, and therefore, lead to a significant improvement of the overall accuracy. On the other side, the layered deep structure and large model sizes also demand to increase computational capability and memory requirements. In order to achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interests. The first trend is the acceleration while the second is the model compression. The underlying goal of these two trends is the high quality of the models to provides accurate predictions. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems. To explore in these two domains, this thesis first presents the cogent confabulation network for sentence completion problem. We use Chinese language as a case study to describe our exploration of the cogent confabulation based text recognition models. The exploration and optimization of the cogent confabulation based models have been conducted through various comparisons. The optimized network offered a better accuracy performance for the sentence completion. To accelerate the sentence completion problem in a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduce runtime, improve the recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintain a balanced progress in status update among all neurons. A lexicon scheduling algorithm is presented to further improve the model performance. As deep neural networks have been proven effective to solve many real-life applications, and they are deployed on low-power devices, we then investigated the acceleration for the neural network inference using a hardware-friendly computing paradigm, stochastic computing. It is an approximate computing paradigm which requires small hardware footprint and achieves high energy efficiency. Applying this stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. The synthesis results show that the proposed design achieves the remarkable low hardware cost and power/energy consumption. Modern neural networks usually imply a huge amount of parameters which cannot be fit into embedded devices. Compression of the deep learning models together with acceleration attracts our attention. We introduce the structured matrices based neural network to address this problem. Circulant matrix is one of the structured matrices, where a matrix can be represented using a single vector, so that the matrix is compressed. We further investigate a more flexible structure based on circulant matrix, called block-circulant matrix. It partitions a matrix into several smaller blocks and makes each submatrix is circulant. The compression ratio is controllable. With the help of Fourier Transform based equivalent computation, the inference of the deep neural network can be accelerated energy efficiently on the FPGAs. We also offer the optimization for the training algorithm for block circulant matrices based neural networks to obtain a high accuracy after compression

Syracuse University Research Facility and Collaborative Environment

Practical Techniques for Improving Performance and Evaluating Security on Circuit Designs

Author: Xu Wenbin
Publication venue
Publication date: 20/11/2019
Field of study

Texas A&M Repository

Performance Enhancement of Power System Operation and Planning through Advanced Advisory Mechanisms

Author
Publication venue
Publication date: 01/01/2017
Field of study

abstract: This research develops decision support mechanisms for power system operation and planning practices. Contemporary industry practices rely on deterministic approaches to approximate system conditions and handle growing uncertainties from renewable resources. The primary purpose of this research is to identify soft spots of the contemporary industry practices and propose innovative algorithms, methodologies, and tools to improve economics and reliability in power systems. First, this dissertation focuses on transmission thermal constraint relaxation practices. Most system operators employ constraint relaxation practices, which allow certain constraints to be relaxed for penalty prices, in their market models. A proper selection of penalty prices is imperative due to the influence that penalty prices have on generation scheduling and market settlements. However, penalty prices are primarily decided today based on stakeholder negotiations or system operator’s judgments. There is little to no methodology or engineered approach around the determination of these penalty prices. This work proposes new methods that determine the penalty prices for thermal constraint relaxations based on the impact overloading can have on the residual life of the line. This study evaluates the effectiveness of the proposed methods in the short-term operational planning and long-term transmission expansion planning studies. The second part of this dissertation investigates an advanced methodology to handle uncertainties associated with high penetration of renewable resources, which poses new challenges to power system reliability and calls attention to include stochastic modeling within resource scheduling applications. However, the inclusion of stochastic modeling within mathematical programs has been a challenge due to computational complexities. Moreover, market design issues due to the stochastic market environment make it more challenging. Given the importance of reliable and affordable electric power, such a challenge to advance existing deterministic resource scheduling applications is critical. This ongoing and joint research attempts to overcome these hurdles by developing a stochastic look-ahead commitment tool, which is a stand-alone advisory tool. This dissertation contributes to the derivation of a mathematical formulation for the extensive form two-stage stochastic programming model, the utilization of Progressive Hedging decomposition algorithm, and the initial implementation of the Progressive Hedging subproblem along with various heuristic strategies to enhance the computational performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

Recommended from our members

Continuous-Time and Companding Digital Signal Processors Using Adaptivity and Asynchronous Techniques

Author: Vezyrtzis Christos
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013
Field of study

The fully synchronous approach has been the norm for digital signal processors (DSPs) for many decades. Due to its simplicity, the classical DSP structure has been used in many applications. However, due to its rigid discrete-time operation, a classical DSP has limited efficiency or inadequate resolution for some emerging applications, such as processing of multimedia and biological signals. This thesis proposes fundamentally new approaches to designing DSPs, which are different from the classical scheme. The defining characteristic of all new DSPs examined in this thesis is the notion of "adaptivity" or "adaptability." Adaptive DSPs dynamically change their behavior to adjust to some property of their input stream, for example the rate of change of the input. This thesis presents both enhancements to existing adaptive DSPs, as well as new adaptive DSPs. The main class of DSPs that are examined throughout the thesis are continuous-time (CT) DSPs. CT DSPs are clock-less and event-driven; they naturally adapt their activity and power consumption to the rate of their inputs. The absence of a clock also provides a complete avoidance of aliasing in the frequency domain, hence improved signal fidelity. The core of this thesis deals with the complete and systematic design of a truly general-purpose CT DSP. A scalable design methodology for CT DSPs is presented. This leads to the main contribution of this thesis, namely a new CT DSP chip. This chip is the first general-purpose CT DSP chip, able to process many different classes of CT and synchronous signals. The chip has the property of handling various types of signals, i.e. various different digital modulations, both synchronous and asynchronous, without requiring any reconfiguration; such property is presented for the first time CT DSPs and is impossible for classical DSPs. As opposed to previous CT DSPs, which were limited to using only one type of digital format, and whose design was hard to scale for different bandwidths and bit-widths, this chip has a formal, robust and scalable design, due to the systematic usage of asynchronous design techniques. The second contribution of this thesis is a complete methodology to design adaptive delay lines. In particular, it is shown how to make the granularity, i.e. the number of stages, adaptive in a real-time delay line. Adaptive granularity brings about a significant improvement in the line's power consumption, up to 70% as reported by simulations on two design examples. This enhancement can have a direct large power impact on any CT DSP, since a delay line consumes the majority of a CT DSP's power. The robust methodology presented in this thesis allows safe dynamic reconfiguration of the line's granularity, on-the-fly and according to the input traffic. As a final contribution, the thesis also examines two additional DSPs: one operating the CT domain and one using the companding technique. The former operates only on level-crossing samples; the proposed methodology shows a potential for high-quality outputs by using a complex interpolation function. Finally, a companding DSP is presented for MPEG audio. Companding DSPs adapt their dynamic range to the amplitude of their input; the resulting can offer high-quality outputs even for small inputs. By applying companding to MPEG DSPs, it is shown how the DSP distortion can be made almost inaudible, without requiring complex arithmetic hardware

Columbia University Academic Commons