
    Low power two-channel PR QMF bank using CSD coefficients and FPGA implementation

    The finite impulse response (FIR) filter is a fundamental component in digital signal processing, and two-channel perfect reconstruction (PR) QMF banks are widely used in applications such as image coding, speech processing and communications. A practical lattice realization of a two-channel QMF bank is developed in this thesis to deal with the wide dynamic range of intermediate results in the lattice structure. To achieve low complexity and low power consumption, the canonical signed digit (CSD) number system is used to represent the lattice coefficients in the FPGA implementation; using CSD in the lattice structure leads to more efficient hardware. Fixed-point simulations were carried out in Matlab to determine suitable fixed-point word lengths for the different signals. FPGA implementation results show that a perfect reconstruction signal is obtained with the proposed method, and that the power consumption with CSD-coded lattice coefficients is lower than with two's complement coefficients. A low complexity, low power two-channel PR QMF bank using CSD coefficients was thereby realized.
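    As a sketch of the CSD recoding that underlies this approach, the Python below (our own illustrative model, not the thesis implementation) converts an integer coefficient into canonical signed digit form, in which every multiplication by that coefficient reduces to a short chain of shifts, additions and subtractions.

```python
def to_csd(n: int) -> list[int]:
    """Convert an integer to canonical signed digit (CSD) form.

    Returns digits from {-1, 0, +1}, least significant first,
    with no two adjacent non-zero digits (the non-adjacent form).
    """
    digits = []
    while n != 0:
        if n % 2:                # odd: emit +1 or -1 so the next n is even
            d = 2 - (n % 4)      # n = 1 (mod 4) -> +1, n = 3 (mod 4) -> -1
        else:
            d = 0
        digits.append(d)
        n = (n - d) // 2
    return digits

# 93 = 0b1011101 has five non-zero bits in plain binary, but only four
# non-zero CSD digits: 93 = 128 - 32 - 4 + 1.
assert sum(d << i for i, d in enumerate(to_csd(93))) == 93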

    LOW POWER MULTIPLIER USING ALGORITHMIC NOISE TOLERANT ARCHITECTURE

    A multiplier is one of the key hardware blocks in most digital signal processing (DSP) systems. Typical DSP applications in which a multiplier plays an important role include digital filtering, digital communications and spectral analysis (Ayman A. et al., 2001). Many current DSP applications target portable, battery-operated systems, so power dissipation becomes one of the primary design constraints. Since multipliers are rather complex circuits and must typically operate at a high system clock rate, reducing the delay of a multiplier is an essential part of meeting the overall design targets. In this project a multiplier block is designed using the algorithmic noise tolerant (ANT) architecture with a Wallace multiplier: a reliable low power fixed-width multiplier in which reduced precision replica redundancy (RPR) provides error resilience and the main block is built from a Wallace multiplier. The new architecture achieves high accuracy, low power consumption and area efficiency compared with previous multiplier circuits.
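    The ANT correction step can be illustrated with a small behavioral model. The Python sketch below is our own illustration under assumed operand widths, not the project's RTL: the main Wallace multiplier may produce soft errors under voltage overscaling, while a reduced-precision replica is always correct but coarse; the replica's output is used whenever the two disagree by more than the replica's worst-case truncation error.

```python
def rpr_product(a: int, b: int, width: int, keep: int) -> int:
    """Reduced-precision replica: multiply only the top `keep` bits of each
    `width`-bit unsigned operand, then shift the result back into place."""
    drop = width - keep
    return ((a >> drop) * (b >> drop)) << (2 * drop)

def ant_multiply(main_result: int, a: int, b: int,
                 width: int = 16, keep: int = 8) -> int:
    """Algorithmic noise tolerance: accept the (possibly erroneous) main
    multiplier output if it stays within the replica's truncation bound,
    otherwise fall back to the replica's estimate."""
    replica = rpr_product(a, b, width, keep)
    drop = width - keep
    # Upper bound on the replica's truncation error for unsigned operands.
    bound = ((1 << drop) - 1) * ((1 << width) - 1) * 2
    return main_result if abs(main_result - replica) <= bound else replica

# Example: a timing error flips a high-order bit of the main product.
a, b = 40000, 51234
faulty = (a * b) ^ (1 << 30)           # hypothetical soft error
print(ant_multiply(faulty, a, b))      # falls back to the coarse replica
```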

    IMPLEMENTATION OF HIGH-SPEED MULTIPLIER FILTERS USING A MODIFIED NON RECURSIVE COMMON DADDA MULTIPLIER

    A multiplier is one of the key hardware blocks in most digital signal processing (DSP) systems. Typical DSP applications in which a multiplier plays an important role include digital filtering, digital communications and spectral analysis (Ayman A. et al., 2001). Many current DSP applications target portable, battery-operated systems, so power dissipation becomes one of the primary design constraints. Since multipliers are rather complex circuits and must typically operate at a high system clock rate, reducing the delay of a multiplier is an essential part of meeting the overall design targets. In this project two multipliers are designed, an array multiplier and a modified Dadda multiplier, each combined with a truncated multiplier. The comparison is carried out with the Xilinx ISE 12.3i EDA tool by developing the register transfer level (RTL) description in Verilog HDL.
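    As an illustration of the truncation idea mentioned above, the following Python behavioral model (our own sketch, not the project's Verilog) forms only the partial products that land in the upper output columns of a fixed-width multiplier and compares the result with the exact product.

```python
def truncated_multiply(a: int, b: int, width: int = 8) -> int:
    """Fixed-width truncated multiplication of two `width`-bit unsigned
    operands: keep only the partial-product bits whose column index is at
    least `width`, i.e. those that feed the upper half of the product."""
    acc = 0
    for i in range(width):          # bit i of a
        for j in range(width):      # bit j of b
            if i + j >= width and (a >> i) & 1 and (b >> j) & 1:
                acc += 1 << (i + j)
    return acc

a, b = 0xB7, 0x5C
exact = a * b
approx = truncated_multiply(a, b)
print(f"exact={exact}, truncated={approx}, error={exact - approx}")
# The dropped columns remove roughly half of the partial-product hardware
# at the cost of a bounded truncation error in the low-order bits.
```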

    Energy-efficient Hardware Accelerator Design for Convolutional Neural Network

    Department of Electrical Engineering
    A convolutional neural network (CNN) is a class of deep neural network that shows superior performance in handling images. Because of this performance, CNNs are widely used to classify objects or detect the position of an object in an image. A CNN can be implemented on either edge devices or cloud servers. Since cloud servers have high computational capability, a CNN on the cloud can perform a large number of tasks at once with high throughput. However, CNN on the cloud incurs a long round-trip time: to run inference on an image, data from a sensor must be uploaded to the cloud server, and the processed information from the CNN must then be transferred back to the user. If an application requires a rapid response, this long round-trip time is a critical issue, and the problem becomes more serious when a high-resolution input image is required. An edge device, on the other hand, has very short latency even though it has limited computing resources, and since it does not transmit images over the network, its performance is not affected by network bandwidth. Because of these features, the cloud is efficient for CNN computing in most cases, but the edge device is preferred in some applications; for example, CNN algorithms for autonomous cars require rapid responses, and edge devices are also suitable for CNN applications involving privacy or security. However, because the edge device has a limited energy budget, the energy efficiency of the CNN accelerator is a very important issue. An embedded CNN accelerator consists of off-chip memory, a host CPU and a hardware accelerator; the hardware accelerator comprises a main controller, a global buffer and arrays of processing elements (PEs), together with separate compression and activation modules.
    In this dissertation, we propose energy-efficient designs in three different parts. First, we propose a time-multiplexed PE to increase the energy efficiency of the multipliers. Exploiting the fact that feature maps are dominated by small values, defined as non-outliers, we increase the energy efficiency of computing non-outliers; to further improve the energy efficiency of the PE, approximate computing is introduced, together with a method to optimize the trade-off between accuracy and energy. Second, we investigate an energy-efficient accuracy recovery circuit. For CNN implementations on the edge, the CNN loops are usually tiled, and tiling can degrade accuracy; we analyze the accuracy reduction due to tiling and recover accuracy by extending the partial sums with very small energy overhead. Third, we reduce the energy consumed by DRAM accesses. A CNN requires massive data transfers between on-chip and off-chip memory, and the energy spent on these transfers accounts for a large portion of the total energy consumption; we propose a spatial-correlation-aware compression algorithm to reduce the transmission of feature maps. At each of these three levels, this dissertation proposes novel optimizations and design flows that increase the energy efficiency of a CNN accelerator on the edge.
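    The loop tiling targeted by the second contribution can be sketched as follows; this Python model is purely our illustration with made-up tile sizes, not the dissertation's hardware. It accumulates partial sums tile by tile over the input channels, which is exactly the point at which reduced-precision partial sums would lose accuracy.

```python
import numpy as np

def tiled_conv2d(x, w, tile_c=16):
    """Direct 2-D convolution (valid padding, stride 1) with the input-channel
    loop tiled: partial sums are accumulated per tile of `tile_c` channels,
    mimicking how an edge accelerator with a small global buffer iterates."""
    C, H, W = x.shape                 # input channels, height, width
    K, _, R, S = w.shape              # output channels, _, kernel height/width
    out = np.zeros((K, H - R + 1, W - S + 1), dtype=np.float32)
    for c0 in range(0, C, tile_c):    # one tile of input channels at a time
        xt, wt = x[c0:c0 + tile_c], w[:, c0:c0 + tile_c]
        for k in range(K):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    # partial sum for this tile, accumulated into `out`
                    out[k, i, j] += np.sum(xt[:, i:i + R, j:j + S] * wt[k])
    return out

x = np.random.rand(32, 8, 8).astype(np.float32)
w = np.random.rand(4, 32, 3, 3).astype(np.float32)
ref = tiled_conv2d(x, w, tile_c=32)          # single tile == untiled result
assert np.allclose(tiled_conv2d(x, w, tile_c=16), ref, atol=1e-4)
```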

    Linear-Phase FIR Digital Filter Design with Reduced Hardware Complexity using Discrete Differential Evolution

    Optimal design of fixed-coefficient, finite word length, linear-phase FIR digital filters for custom ICs has been the focus of research in the past decade. With ever increasing demands for high throughput and low power circuits, the need to design filters with reduced hardware complexity has become more crucial. Multiplierless filters provide substantial savings in hardware by using a shift-add network to generate the filter coefficients. In this thesis, the multiplierless filter design problem is modeled as a combinatorial optimization problem and solved using a discrete Differential Evolution algorithm. The Differential Evolution algorithm's population representation is adapted to the finite word length filter design problem and the mutation operator is redefined for discrete-valued parameters. Experiments show that the method can design filters of up to 300 taps with reduced hardware and shorter design times.
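    A minimal sketch of the discrete Differential Evolution step described above is given below in Python. It is our own illustration under assumed details (integer-quantized half-coefficients of a symmetric filter, a simple passband/stopband frequency-response cost, and a standard DE/rand/1 mutation rounded back onto the integer grid), not the thesis algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(h_int, wordlength=10, pass_edge=0.4, stop_edge=0.6, n_fft=512):
    """Frequency-response error of a symmetric (linear-phase) FIR filter whose
    half-coefficients are the integers h_int scaled by 2**-(wordlength-1):
    deviation from unity gain in the passband plus stopband energy."""
    h = np.concatenate([h_int, h_int[::-1]]) / 2.0 ** (wordlength - 1)
    H = np.abs(np.fft.rfft(h, n_fft))
    w = np.linspace(0, 1, len(H))
    return (np.sum((H[w <= pass_edge] - 1.0) ** 2)
            + np.sum(H[w >= stop_edge] ** 2))

def discrete_de(dim=16, pop=30, iters=200, F=0.7, CR=0.9, cmax=511):
    """Differential Evolution with mutants rounded to integer coefficients."""
    P = rng.integers(-cmax, cmax + 1, size=(pop, dim)).astype(float)
    fit = np.array([cost(x) for x in P])
    for _ in range(iters):
        for i in range(pop):
            a, b, c = P[rng.choice(pop, 3, replace=False)]
            mutant = np.rint(a + F * (b - c))            # discrete mutation
            mask = rng.random(dim) < CR                  # binomial crossover
            trial = np.clip(np.where(mask, mutant, P[i]), -cmax, cmax)
            f = cost(trial)
            if f < fit[i]:                               # greedy selection
                P[i], fit[i] = trial, f
    return P[np.argmin(fit)], fit.min()

best, best_cost = discrete_de()
print("best integer coefficients:", best.astype(int), "cost:", best_cost)
```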

    Digital Filters

    Advances in technology mean that a great number of system signals can now be measured easily and at low cost. The main problem is that usually only a fraction of each signal is useful for a given purpose, whether that is maintenance, DVD recorders, computers, electric/electronic circuits, econometrics or optimization. Digital filters are among the most versatile, practical and effective methods for extracting the required information from a signal. They can be dynamic, adjusting automatically or manually to external and internal conditions. Presented in this book are the most advanced digital filters, including different case studies and the most relevant literature.

    Design and Implementation of Complexity Reduced Digital Signal Processors for Low Power Biomedical Applications

    Wearable health monitoring systems can provide remote care with supervised, independent living, and are capable of signal sensing, acquisition, local processing and transmission. A generic biopotential signal processing platform (for signals such as the electrocardiogram (ECG) and electroencephalogram (EEG)) consists of four main functional components. The signals acquired by the electrodes are amplified and preconditioned by the (1) analog front-end (AFE) and then digitized by the (2) analog-to-digital converter (ADC) for further processing. The local digital signal processing is usually handled by a custom-designed (3) digital signal processor (DSP), which is responsible for any one or a combination of signal processing tasks such as noise detection, noise/artefact removal, feature extraction, classification and compression. The digitally processed data is then sent by the (4) transmitter, which is renowned as the most power-hungry block in the complete platform. All of these components must be designed and fitted into an integrated system with stringent area and power requirements, so the hardware complexity and power dissipation of each functional component are crucial aspects of designing and implementing a wearable monitoring platform. The work undertaken focuses on reducing the hardware complexity of a biosignal DSP and presents low hardware complexity solutions that can be employed in such wearable platforms.
    A typical state-of-the-art system uses Sigma Delta (Σ∆) ADCs incorporating a Σ∆ modulator and a decimation filter, where the state-of-the-art decimation filters employ high-order linear-phase finite impulse response (FIR) filters that increase the hardware complexity [1–5]. In this thesis, the novel use of minimum-phase infinite impulse response (IIR) decimators is proposed, which massively reduces the hardware complexity compared to conventional FIR decimators. The non-linear phase effects of these filters are also investigated, since phase non-linearity may distort the time-domain representation of the filtered signal, an undesirable effect for biopotential signals, especially when the fiducial characteristics carry diagnostic importance. For ECG monitoring systems the effect of the IIR filter phase non-linearity is shown to be minimal and does not affect the diagnostic accuracy of the signals.
    The work also proposes two methods for reducing the hardware complexity of a popular biosignal processing tool, the discrete wavelet transform (DWT). General-purpose multipliers are hardware- and power-hungry in terms of the number of addition operations, or of their underlying building blocks such as full and half adders. A higher number of adders leads to increased power consumption, which is directly proportional to the clock frequency, supply voltage, switching activity and the resources utilized; a typical field-programmable gate array's (FPGA) resources are look-up tables (LUTs), whereas a custom DSP uses gate-level cells from standard cell libraries to build adders [6]. The first proposed method replaces the hardware- and power-hungry general-purpose multipliers and the coefficient memories with reconfigurable multiplier blocks composed of simple shift-add networks and multiplexers. This method substantially reduces the resource utilization as well as the power consumption of the system. The second proposed method is the design and implementation of the DWT filter banks using IIR filters, which require fewer arithmetic operations than the state-of-the-art FIR wavelets. This reduces the hardware complexity of the DWT analysis filter bank and can be employed in applications where reconstruction is not required. However, the synthesis filter bank for the IIR wavelet transform has a higher computational complexity than conventional FIR wavelet synthesis filter banks, since re-indexing of the filtered data sequence is required, which can only be achieved with extra registers. This led to a novel design that replaces the complex IIR-based synthesis filter banks with FIR filters that approximate the associated IIR filters. Finally, a comparative study is presented in which the hybrid IIR/FIR and FIR/FIR wavelet filter banks are deployed in a typical noise reduction scenario using wavelet thresholding techniques. It is concluded that the proposed hybrid IIR/FIR wavelet filter banks provide better denoising performance, reduced computational complexity and lower power consumption than their IIR/IIR and FIR/FIR counterparts.
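    To make the FIR-versus-IIR complexity argument above concrete, the short Python/SciPy comparison below (our illustration with assumed filter specifications, not the thesis design) estimates the minimum Kaiser-window FIR length and the minimum elliptic IIR order that meet the same decimation-filter specification.

```python
from scipy import signal

# Assumed decimation-filter spec (frequencies normalized to Nyquist = 1):
wp, ws = 0.1, 0.15        # passband / stopband edges
gpass, gstop = 0.1, 60.0  # max passband ripple / min stopband attenuation (dB)

# Minimum-order elliptic IIR filter meeting the spec.
iir_order, _ = signal.ellipord(wp, ws, gpass, gstop)

# Kaiser-window FIR length for the same transition width and attenuation.
fir_taps, _ = signal.kaiserord(gstop, ws - wp)

print(f"elliptic IIR order: {iir_order}")   # a handful of biquad sections
print(f"Kaiser FIR length:  {fir_taps}")    # typically tens of times more taps
```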

    Cross-Layer Automated Hardware Design for Accuracy-Configurable Approximate Computing

    Approximate Computing trades computation accuracy against performance or energy efficiency. It is a design paradigm that arose in the last decade as an answer to diminishing returns from Dennard scaling and a shift in the prominent workloads. A range of modern workloads, categorized mainly as recognition, mining, and synthesis, features an inherent tolerance to approximations: characteristics such as redundancy in their input data and robust-to-noise algorithms allow them to produce outputs of acceptable quality despite approximations in some of their computations. Approximate Computing leverages this application tolerance by relaxing exactness in computation towards the primary design goals of increasing performance or improving energy efficiency. Existing techniques span the abstraction layers of computer systems, and cross-layer techniques have been shown to offer a larger design space and yield higher savings. Currently, the majority of existing work aims at meeting a single accuracy level; the extent of approximation tolerance, however, varies significantly with changes in input characteristics and applications. In this dissertation, methods and implementations are presented for the cross-layer, automated design of accuracy-configurable Approximate Computing that maximally exploits the performance and energy benefits. In particular, this dissertation addresses the following challenges and introduces novel contributions.
    A main Approximate Computing category in hardware is to scale either voltage or frequency beyond the safe limits, for power or performance benefits, respectively. The rationale is that timing errors appear gradually and are tolerable over an initial range, so this scaling enables fine-grained accuracy configurability by varying the rate of timing errors. However, conventional synthesis tools aim at meeting a single delay for all paths within the circuit; consequently, under voltage or frequency scaling either all paths succeed, or a large number of paths fail simultaneously with a steep increase in error rate and magnitude. This dissertation presents an automated method that minimizes path delays by individually constraining the primary outputs of combinational circuits. As a result, it reduces the number of failing paths and makes the timing errors significantly more gradual, as well as rarer and smaller on average. It also reveals that delays can be significantly reduced towards the least significant bit (LSB), which allows operation at a higher frequency when small operands are computed.
    Precision scaling, i.e., reducing the representation of data and its accuracy, is widely used across abstraction layers in Approximate Computing. Reducing data precision also reduces transistor toggling, and therefore dynamic power consumption. Application- and architecture-level precision scaling results in using only the LSBs of a circuit, and arithmetic circuits often have less complexity and logic depth in the LSBs than in the most significant bits (MSBs). To take advantage of this circuit property, a delay-altering synthesis methodology is proposed: it finds energy-optimal delay values under configurable precision usage and assigns them to the primary outputs used for the different precisions, thereby enabling dynamic frequency-precision scalable circuits for energy efficiency.
    Within the hardware architecture, it is possible to instantiate multiple units with the same functionality but different fixed approximation levels, where each block benefits from having fewer transistors and from synthesis relaxations. These blocks can be selected dynamically and thus allow the accuracy to be configured at runtime. Instantiating such approximate blocks can be a lower-dynamic-power but higher-area-and-leakage alternative to the current state-of-the-art gating mechanisms, which switch off a group of paths in the circuit to reduce toggling activity. Jointly, multiple instantiated blocks and gating mechanisms produce a large design space of accuracy-configurable hardware in which energy-optimal solutions require a cross-layer search at the architecture and circuit levels. To that end, an approximate hardware synthesis methodology is proposed with joint optimizations at the architecture and circuit levels for dynamic accuracy scaling, thereby enabling energy vs. area trade-offs.
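    The claim that per-output delay constraints make timing errors gradual can be illustrated with a toy behavioral model. The Python sketch below is purely our illustration under an assumed linear delay model for a ripple-carry adder, not the dissertation's synthesis flow: output bits whose worst-case path delay exceeds the clock period keep their value from the previous cycle, so shortening the clock corrupts MSBs first while the LSBs stay exact.

```python
def overscaled_add(a: int, b: int, prev_sum: int, clock_ns: float,
                   bits: int = 16, t_fa_ns: float = 0.5) -> int:
    """Behavioral model of a frequency-overscaled ripple-carry adder.

    Output bit i is assumed to need (i + 1) * t_fa_ns to settle; bits that
    do not settle within `clock_ns` retain their value from `prev_sum`.
    """
    exact = (a + b) & ((1 << bits) - 1)
    settled = min(bits, int(clock_ns // t_fa_ns))   # bits 0..settled-1 are valid
    mask = (1 << settled) - 1
    return (exact & mask) | (prev_sum & ~mask & ((1 << bits) - 1))

a, b, prev = 0x3A7C, 0x1234, 0x0000
for clk in (8.0, 4.0, 2.0):                         # shrinking clock period
    approx = overscaled_add(a, b, prev, clk)
    print(f"clock={clk:>4} ns  sum=0x{approx:04X}  "
          f"error={abs(approx - ((a + b) & 0xFFFF))}")
```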

    The automated compilation of comprehensive hardware design search spaces of algorithmic-based implementations for FPGA design exploration

    Over the past few years FPGA hardware has become a logical choice for implementing cutting-edge signal processing applications. While there have been advances in FPGA technology, the common process of creating specialized hardware implementations for them remains a manual one involving extensive design exploration. Design exploration requires a designer to look for designs that fit a set of performance characteristics, such as size, throughput, or power, depending on the application, and it can be the most time consuming step when creating FPGA hardware. This is a nontrivial task that requires extensive background knowledge of both FPGA hardware and the application being implemented. While advances have been made in automating the design process, there is still a gap between application writers and hardware engineers that can be filled.
    This thesis presents a novel approach for automating the generation of hardware design search spaces that contain a comprehensive set of ways to implement signal processing algorithms on FPGAs. To accomplish this, we generate a set of equivalent mathematical representations for an input equation via a novel declarative programming language that avoids a number of difficulties associated with the imperative languages used by previous approaches. We show that this equation space is bounded in terms of bracketing and ordering of mathematical operations, and that by changing the way an equation is written we can generate unique hardware instantiations (designs). The generated instantiations are mapped to heterogeneous computing architectures and written in a structural hardware description language style to ensure that the intended instantiation behaves as predicted in hardware.
    A software system was created based on this approach that generates an equation space for varying numbers of summed multiplications and converts each representation into a comprehensive hardware design search space that can be analyzed for performance characteristics such as size, throughput, latency, and power.
    Ph.D., Electrical Engineering -- Drexel University, 200
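    The equation-space idea of enumerating every bracketing of a summed-multiplication expression can be sketched as follows. This Python is our own illustration of the combinatorial bound (the number of binary bracketings grows as the Catalan numbers), not the thesis's declarative language or tool flow, and it enumerates bracketings only, not reorderings.

```python
from functools import lru_cache

def bracketings(terms):
    """Enumerate every way to parenthesize a sum of terms as a binary tree;
    each distinct tree corresponds to a distinct adder-tree instantiation."""
    if len(terms) == 1:
        return [terms[0]]
    results = []
    for split in range(1, len(terms)):
        for left in bracketings(terms[:split]):
            for right in bracketings(terms[split:]):
                results.append(f"({left} + {right})")
    return results

@lru_cache(maxsize=None)
def catalan(n: int) -> int:
    """Number of binary bracketings of n+1 terms (Catalan number C_n)."""
    return 1 if n == 0 else sum(catalan(i) * catalan(n - 1 - i) for i in range(n))

products = ["a0*x0", "a1*x1", "a2*x2", "a3*x3"]
trees = bracketings(products)
print(len(trees), "bracketings of a 4-term sum of products")   # 5 == C_3
assert len(trees) == catalan(len(products) - 1)
for t in trees:
    print(t)
```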