Hardware implementation of Daubechies wavelet transforms using folded AIQ mapping

Author: Islam Md Ashraful
Publication venue: 'University of Saskatchewan Library'

The Discrete Wavelet Transform (DWT) is a popular tool in image and video compression. Because of its multi-resolution representation capability, the DWT has been used effectively in applications such as transient signal analysis, computer vision, texture analysis, cell detection, and image compression. Daubechies wavelets are among the most popular transforms in the wavelet family. Daubechies filters provide excellent spatial and spectral locality, properties which make them useful in image compression. In this thesis, we present an efficient implementation of a shared hardware core to compute two 8-point Daubechies wavelet transforms. The architecture is based on a new two-level folded mapping technique, an improved version of Algebraic Integer Quantization (AIQ). The scheme is built on a factorization and decomposition of the transform coefficients that exploits the symmetrical and wrapping structure of the matrices. The proposed architecture is parallel, pipelined, and multiplexed. Compared to existing designs, the proposed scheme significantly reduces hardware cost, critical path delay, and power consumption while achieving a higher throughput rate. We also briefly present a new mapping scheme for error-free computation of the Daubechies-8 tap wavelet transform, the next transform after Daubechies-6 in the Daubechies wavelet series. The multidimensional technique maps the irrational transformation basis coefficients to integers and results in a considerable reduction in hardware and power consumption and a significant improvement in image reconstruction quality
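The algebraic-integer idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's folded two-level mapping: it assumes the simpler, well-known 4-tap Daubechies low-pass filter, whose numerators have the form a + b*sqrt(3), and the names `D4_LOWPASS_AI`, `lowpass_ai`, and `final_reconstruction` are hypothetical. Integer samples are filtered using integer arithmetic only, and the irrational factors are applied once at the end.

```python
import math

# Daubechies-4 low-pass numerators stored as integer pairs (a, b) meaning
# a + b*sqrt(3). The common scale factor 1/(4*sqrt(2)) is deferred to the
# final reconstruction step (illustrative assumption, not the thesis design).
D4_LOWPASS_AI = [(1, 1), (3, 1), (3, -1), (1, -1)]


def lowpass_ai(samples):
    """Return the exact integer pair (A, B) representing A + B*sqrt(3)."""
    A = sum(x * a for x, (a, _) in zip(samples, D4_LOWPASS_AI))
    B = sum(x * b for x, (_, b) in zip(samples, D4_LOWPASS_AI))
    return A, B


def final_reconstruction(pair):
    """Apply the irrational factors once, after all integer arithmetic."""
    A, B = pair
    return (A + B * math.sqrt(3)) / (4 * math.sqrt(2))


pair = lowpass_ai([10, 20, 30, 40])
value = final_reconstruction(pair)
```

Because the pair `(A, B)` is computed with integers only, no rounding error accumulates inside the filter; the only floating-point step is the single conversion at the end.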

eCommons@USASK

University of Saskatchewan Research Archive

A low power design for arithmetic and logic unit

Author: NG KAR SIN
Publication date: 30/12/2004

Master's (Master of Engineering)

Numerical models for the large-scale simulation of fault and fracture mechanics

Author: Franceschini Andrea
Publication date: 06/01/2018

The possible activation of pre-existing faults and the generation of new fractures in the subsurface may play a critical role in several fields of great social interest, such as the management and exploitation of groundwater resources, especially in arid areas, hydrocarbon recovery and storage, and the monitoring of seismic activity in the Earth’s crust. The sliding and/or opening of a fault can create preferential leakage paths for the escape of pore fluid, a matter of great concern when storing fluids and hydrocarbons underground. The most challenging effect connected to fault activation is possible earthquake triggering. Many earthquakes associated with the production and injection of fluids have been recently reported. Similar issues also arise in the development of unconventional hydrocarbon reservoirs, which has recently experienced a dramatic increase thanks to the deployment of “fracking” technology, based on the massive generation of fractures through the injection of fluids at high pressures. The use of this technique in densely populated areas has raised a large scientific debate on the possible connected environmental risks. The over-exploitation of fresh aquifers in arid regions has caused the generation of significant ground fissures. In this thesis, a novel formulation based on the use of Lagrange multipliers has been developed for the stable and robust numerical modeling of fault mechanics. A fault or fracture is simulated as a pair of inner surfaces included in a 3D geological formation where Lagrange multipliers are used to prescribe the contact constraints. The standard variational formulation of the contact problem with Lagrange multipliers is modified to take into account the energy dissipated by the frictional work along the activated fault portion. This term is computed by making use of the principle of maximum plastic dissipation, whose application defines the direction of the limiting shear stress vector. The novel approach has been verified against analytical solutions and applied to a number of real-world problems. In particular, we test the novel approach in four cases: (i) mechanics of two adjacent blocks, to investigate the numerical properties of the algorithm; (ii-iii) ground fractures due to groundwater withdrawal, with different geometries; (iv) fault reactivation in an underground reservoir subject to primary production and Underground Gas Storage cycles. The results are analyzed and discussed. In the fourth case, the possible magnitude of the seismic events triggered by fault reactivation is computed, in order to evaluate whether underground human activities may generate seismicity. The application of the fault model to large-scale problems gives rise to a set of sparse discrete systems of linearized equations with a generalized non-symmetric saddle point structure. The second part of this thesis is devoted to the development of efficient algorithms for the iterative solution of this kind of system. We focus on a preconditioning technique, denoted as “constraint preconditioning”, which exploits the native block structure of the Jacobian. The quality and performance of the preconditioner rely on two steps: (i) the preconditioning of the leading block and (ii) the Schur complement computation. In this work, novel preconditioning techniques for the leading block based on a multilevel framework are developed and tested. The main idea behind the multilevel preconditioner is to improve the quality of the factorized approximate inverses by borrowing the scheme of incomplete factorizations, thus introducing some sequentiality into perfectly parallelizable algorithms. The proposed approach is robust, from a theoretical point of view, and very efficient in parallel environments. As to the latter point, i.e. the Schur complement computation, it can be done with the aid of different approximations. The main difference is whether the Jacobian is symmetrized or not. The computation can be founded on the FSAI approximation of the leading block inverse or on a physically-based block diagonal algorithm. The Schur complement must be inverted, thus other possibilities come in. The approximate Schur complement can be inverted through FSAI, if symmetric, or an incomplete factorization, if non-symmetric, but it can also be solved exactly, thanks to a direct solver. The performance of the proposed algorithms is finally investigated and discussed in a set of real-world numerical examples
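The two steps named in the abstract (approximating the leading block, then forming and inverting a Schur complement) can be sketched on a toy saddle-point system. This is only an illustration under strong assumptions: a plain diagonal approximation stands in for the FSAI and multilevel techniques of the thesis, there is a single constraint so the Schur complement is a scalar, and the function name `apply_constraint_preconditioner` is hypothetical.

```python
def apply_constraint_preconditioner(a_diag, b1, b2, c, r1, r2):
    """Solve M z = r for M = [[diag(a_diag), b1], [b2^T, c]].

    a_diag : diagonal approximation of the leading block (stand-in for FSAI)
    b1, b2 : coupling column and row (they differ in the non-symmetric case)
    c      : scalar (2,2) block
    r1, r2 : residual split into primal part (list) and constraint part (scalar)
    """
    # Approximate Schur complement S = c - b2^T diag(a)^-1 b1 (a scalar here).
    schur = c - sum(y * x / d for y, x, d in zip(b2, b1, a_diag))
    # Block elimination: solve for the constraint unknown first ...
    z2 = (r2 - sum(y * r / d for y, r, d in zip(b2, r1, a_diag))) / schur
    # ... then back-substitute for the primal unknowns.
    z1 = [(r - x * z2) / d for r, x, d in zip(r1, b1, a_diag)]
    return z1, z2


z1, z2 = apply_constraint_preconditioner(
    a_diag=[4.0, 2.0], b1=[1.0, 1.0], b2=[1.0, -1.0], c=0.0,
    r1=[1.0, 2.0], r2=3.0,
)
```

In an iterative solver this routine would be called once per iteration on the current residual; the off-diagonal coupling blocks are kept exact, which is the defining feature of a constraint preconditioner.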

Archivio istituzionale della ricerca - Università di Padova

Hardware and algorithm architectures for real-time additive synthesis

Author: Symons Peter Robert
Publication date: 01/01/2005

Additive synthesis is a fundamental computer music synthesis paradigm tracing its origins to the work of Fourier and Helmholtz. Rudimentary implementation linearly combines harmonic sinusoids (or partials) to generate tones whose perceived timbral characteristics are a strong function of the partial amplitude spectrum. Having evolved over time, additive synthesis describes a collection of algorithms each characterised by the time-varying linear combination of basis components to generate temporal evolution of timbre. Basis components include exactly harmonic partials, inharmonic partials with time-varying frequency or non-sinusoidal waveforms each with distinct spectral characteristics. Additive synthesis of polyphonic musical instrument tones requires a large number of independently controlled partials incurring a large computational overhead whose investigation and reduction is a key motivator for this work. The thesis begins with a review of prevalent synthesis techniques setting additive synthesis in context and introducing the spectrum modelling paradigm which provides baseline spectral data to the additive synthesis process obtained from the analysis of natural sounds. We proceed to investigate recursive and phase accumulating digital sinusoidal oscillator algorithms, defining specific metrics to quantify relative performance. The concepts of phase accumulation, table lookup phase-amplitude mapping and interpolated fractional addressing are introduced and developed and shown to underpin an additive synthesis subclass - wavetable lookup synthesis (WLS). WLS performance is simulated against specific metrics and parameter conditions peculiar to computer music requirements. We conclude by presenting processing architectures which accelerate computational throughput of specific WLS operations and the sinusoidal additive synthesis model. 
In particular, we introduce and investigate the concept of phase domain processing and present several “pipeline friendly” arithmetic architectures using this technique which implement the additive synthesis of sinusoidal partials
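The concepts of phase accumulation, table-lookup phase-amplitude mapping, and interpolated fractional addressing can be sketched in a few lines. This is a minimal floating-point illustration, not the thesis's hardware architecture: the table length, the linear interpolation, and the names `lookup` and `additive_synth` are assumptions chosen for clarity.

```python
import math

TABLE_SIZE = 1024  # assumed table length
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def lookup(phase):
    """Map a phase in [0, 1) to an amplitude using linear interpolation."""
    pos = phase * TABLE_SIZE
    whole = int(pos)
    frac = pos - whole                  # fractional part of the table address
    i = whole % TABLE_SIZE
    nxt = (i + 1) % TABLE_SIZE          # wrap around at the end of the table
    return SINE_TABLE[i] + frac * (SINE_TABLE[nxt] - SINE_TABLE[i])


def additive_synth(partials, sample_rate, num_samples):
    """Sum sinusoidal partials given as (frequency_hz, amplitude) pairs."""
    phases = [0.0] * len(partials)      # one phase accumulator per partial
    out = []
    for _ in range(num_samples):
        sample = 0.0
        for k, (freq, amp) in enumerate(partials):
            sample += amp * lookup(phases[k])
            # Phase accumulation: advance by freq/sample_rate and wrap to [0, 1).
            phases[k] = (phases[k] + freq / sample_rate) % 1.0
        out.append(sample)
    return out


tone = additive_synth([(440.0, 1.0), (880.0, 0.5)], 48000, 100)
```

A hardware design would more likely use a fixed-point phase accumulator whose overflow provides the wrap for free; floating point is used here only to keep the sketch short.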

Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

Author: Pourabed Mohammad Ali
Publication date: 08/05/2019

High Efficiency Video Coding (HEVC) is important for image processing, reducing bandwidth, and increasing video quality. There are different methods that can be used to implement HEVC. This thesis focuses on the design and implementation of application-specific accelerators for the IDCT/IDST algorithms of the HEVC standard. These algorithms are parallel in nature, which makes them suitable for execution on heterogeneous multicore platforms. This is done using accelerators, which are required for power-efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used as a template for an accelerator. The CGRA plays a major role in a Heterogeneous Accelerator-Rich Platform (HARP) as it is capable of accelerating non-parallel loops with lower loop counts. This thesis includes various algorithms for the IDCT and IDST with different designs and templates, reaching a unique final architecture. The intended final output is a 4-point IDST together with a 4/8-point IDCT. Another feature added to the hypothesis is the use of different dimensions for the CGRA template in order to obtain a different type of accelerator. The CGRAs are combined in a successive arrangement with Reduced Instruction Set Computer (RISC) cores over the Network-on-Chip (NoC). The aim is to study the performance of the accelerator used for the IDCT and the IDST. This is evaluated through the data movement over the NoC, together with the accelerator's performance in clock cycles, in order to calculate the efficiency of the system. The results show that a 4-point IDST and IDCT can be computed in 56 clock cycles. In addition, the 8-point IDCT can be implemented in 64 cycles. One important factor considered during the study is power and energy consumption. The dynamic power dissipation for the routing of data reached a value of 4.03 mW.
The energy consumption was 1.76 µJ for the 4-point system (IDCT and IDST) and 3.06 µJ for the 8-point IDCT. Processing Elements (PEs) are used for implementing the transform algorithm, and the units were operated at 200 MHz. Finally, these results show that a 1080p image at 30 frames per second can be attained using an FPGA