757 research outputs found
Recommended from our members
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number ofimage
and signal processing application areas such as consumer electronics, instrumentation,
medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA
devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the
work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of
cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM
is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed
Efficient quadratic placement for FPGAs.
Field Programmable Gate Arrays (FPGAs) are widely used in industry because they can implement any digital circuit on site simply by specifying programmable logic and their interconnections. However, this rapid prototyping advantage may be adversely affected because of the long compile time, which is dominated by placement and routing. This issue is of great importance, especially as the logic capacities of FPGAs continue to grow. This thesis focuses on the placement phase of FPGA Computer Aided Design (CAD) flow and presents a fast, high quality, wirelength-driven placement algorithm for FPGAs that is based on the quadratic placement approach. In this thesis, multiple iterations of equation solving process together with a linear wirelength reduction technique are introduced. The proposed algorithm efficiently handles the main problems with the quadratic placement algorithm and produces a fast and high quality placement. Experimental results, using twenty benchmark circuits, show that this algorithm can achieve comparable total wirelength and, on average, 5X faster run time when compared to an existing, state-of-the-art placement tool. This thesis also shows that the proposed algorithm delivers promising preliminary results in minimizing the critical path delay while maintaining high placement quality.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .X86. Source: Masters Abstracts International, Volume: 44-04, page: 1946. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005
A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems
In this paper we present a methodological framework that meets novel
requirements emerging from upcoming types of accelerated and highly
configurable neuromorphic hardware systems. We describe in detail a device with
45 million programmable and dynamic synapses that is currently under
development, and we sketch the conceptual challenges that arise from taking
this platform into operation. More specifically, we aim at the establishment of
this neuromorphic system as a flexible and neuroscientifically valuable
modeling tool that can be used by non-hardware-experts. We consider various
functional aspects to be crucial for this purpose, and we introduce a
consistent workflow with detailed descriptions of all involved modules that
implement the suggested steps: The integration of the hardware interface into
the simulator-independent model description language PyNN; a fully automated
translation between the PyNN domain and appropriate hardware configurations; an
executable specification of the future neuromorphic system that can be
seamlessly integrated into this biology-to-hardware mapping process as a test
bench for all software layers and possible hardware design modifications; an
evaluation scheme that deploys models from a dedicated benchmark library,
compares the results generated by virtual or prototype hardware devices with
reference software simulations and analyzes the differences. The integration of
these components into one hardware-software workflow provides an ecosystem for
ongoing preparative studies that support the hardware design process and
represents the basis for the maturity of the model-to-hardware mapping
software. The functionality and flexibility of the latter is proven with a
variety of experimental results
Recommended from our members
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.In the past two decades, there has been huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications such as medical imaging are computationally intensive, power hungry and requires large amount of memory which cause a high demand for efficient algorithm implementation, low power architecture and acceleration. Recently, some MAAs such as Finite Ridgelet Transform (FRIT) Haar Wavelet Transform (HWT) are became very popular and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms particularly when addressing large problems are becoming very chal-lenging and consume lot of power which leads to a number of issues including mobility, reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimi- sation and awareness at all level of abstractions in the design flow.
The most important achievements of the work presented in this thesis are summarised
here.
Two factorisation methodologies for HWT which are called HWT Factorisation Method1 and (HWTFM1) and HWT Factorasation Method2 (HWTFM2) have been explored to increase number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed. The evaluation of the architectural results have shown that the proposed architectures results have reduced the arithmetics calculation (additions/subtractions) by 33% and 25% respectively compared to direct implementa-tion of HWT and outperformed existing results in place. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using Handel-C language. The FPGAs implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for Finite Radon Trans-form (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient im-plementation on different FPGA devices. The proposed FRIT architecture performance has been evaluated and the results outperformed some other existing architecture in place. Both FRAT and FRIT architectures have been implemented on FPGAs using Handel-C language. The evaluation of both architectures have shown that the obtained results out-performed existing results in place by almost 10% in terms of frequency and area. The proposed architectures are also applied on image data (256 Ā£ 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
Two architectures for cyclic convolution based on systolic array using parallelism and pipelining which can be used as the main building block for the proposed FRIT architec-ture have been proposed. The first proposed architecture is a linear systolic array with pipelining process and the second architecture is a systolic array with parallel process. The second architecture reduces the number of registers by 42% compare to first architec-ture and both architectures outperformed other existing results in place. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4,8,16,32 and word-length (W=8). The implementation results have shown a signifi-cant improvement and outperformed other existing results in place.
Ultimately, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called func-tional level power modelling approach have been presented. The mathematical techniques that form the basis of the proposed power modeling has been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favorably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behavior of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on virous FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% true for all IP core's Dynamic Power (DP) components.Thomas Gerald Gray Charitable Trus
The IPS fidelity scale as a guideline to implement Supported Employment
info:eu-repo/semantics/publishe
CABE : a cloud-based acoustic beamforming emulator for FPGA-based sound source localization
Microphone arrays are gaining in popularity thanks to the availability of low-cost microphones. Applications including sonar, binaural hearing aid devices, acoustic indoor localization techniques and speech recognition are proposed by several research groups and companies. In most of the available implementations, the microphones utilized are assumed to offer an ideal response in a given frequency domain. Several toolboxes and software can be used to obtain a theoretical response of a microphone array with a given beamforming algorithm. However, a tool facilitating the design of a microphone array taking into account the non-ideal characteristics could not be found. Moreover, generating packages facilitating the implementation on Field Programmable Gate Arrays has, to our knowledge, not been carried out yet. Visualizing the responses in 2D and 3D also poses an engineering challenge. To alleviate these shortcomings, a scalable Cloud-based Acoustic Beamforming Emulator (CABE) is proposed. The non-ideal characteristics of microphones are considered during the computations and results are validated with acoustic data captured from microphones. It is also possible to generate hardware description language packages containing delay tables facilitating the implementation of Delay-and-Sum beamformers in embedded hardware. Truncation error analysis can also be carried out for fixed-point signal processing. The effects of disabling a given group of microphones within the microphone array can also be calculated. Results and packages can be visualized with a dedicated client application. Users can create and configure several parameters of an emulation, including sound source placement, the shape of the microphone array and the required signal processing flow. Depending on the user configuration, 2D and 3D graphs showing the beamforming results, waterfall diagrams and performance metrics can be generated by the client application. The emulations are also validated with captured data from existing microphone arrays.</jats:p
Leading the Blind:Automated Transistor-Level Modeling for FPGA Architects
The design and development of innovative FPGA architectures hinge on the flexibility of its toolchain. Retargetable toolchains, like the Verilog-to-Routing (VTR) flow, have been developed to enable the testing of new FPGAs by mapping circuits onto easily-described and possibly theoretical architectures. However, in reality, the difficulty extends beyond having CAD tools that support the architectural changes: it is equally important for FPGA architects to be able to produce reliable delay and area models for these tools. In addition to having acute architectural intuitions, designing and optimizing the circuit at the transistor-level requires architects to have, as well, a particular set of electrical engineering skills and expertise. The process is also painstaking and time-consuming, rendering the comparison of a variety of architectures or the exploration of a wide design space quite complicated and even impossible in practice. In this work, we present a novel approach to model the delay and area of FPGA architectures with various structures and characteristics, quickly and with acceptable accuracy. Abstracting from the user the transistor-level design and optimization that normally accompany the model- ing process, this approach, called FPRESSO, can be used by any architect without prerequisites. We take inspiration from the way a standard-cell flow performs large-scale transistor-size optimization and apply the same concepts to FPGAs, only at a coarser granularity. Skilled designers prepare for FPRESSO a set of locally optimized libraries of basic parameterizable components with a variety of drive strengths. Then, inexperienced users specify arbitrary FPGA architectures as interconnects of these basic components. The architecture is globally optimized, within minutes, through a standard logic synthesis tool, by choosing the most fitting version of each cell and adding buffers wherever appropriate. The resulting delay and area characteristics are automatically returned, in a format suitable for the VTR flow. A correct modeling of any architecture requires not only an optimization of the logic components, but also a proper modeling of the wires connecting these components. This does not only include measuring the length of the wires to determine their respective resistance and capacitance, but also, minimizing their length to reduce the wireload effect on the overall performance. To that end, FPRESSO features an automatic and generic wire modeling approach based on a simulated annealing floorplanning algorithm, to estimate the wires between the different components of the FPGA architecture. To evaluate the results of FPRESSO and confirm the validity of its modeled architectures, we use it to explore a wide range of FPGA architectures. First, we repeat a known study that helped set the standards on the optimal Look-Up-Table (LUT) and cluster size for conventional FPGAs. We show, by comparing with the results of the study, that modeling in FPRESSO preserves the very same trends and conclusions, with significantly less effort. We then extend the search space to cover fracturable LUTs and sparse crossbars, and show how FPRESSO makes the exploration of a huge search space not only possible but easy, efficient, and affordable, for any class of VTR users
- ā¦