11 research outputs found

    Novel sparse OBC based distributed arithmetic architecture for matrix transforms

    Get PDF
    Inner product (IP) forms the basis of a number of signal processing algorithms and applications such as transforms, filters, communication systems etc. Distributed arithmetic (DA) provides an effective methodology to implement IP of vectors and matrices using a simple combination of memory elements, adders and shifters instead of lumped multipliers. This bit level rearrangement results in much higher computational efficiencies and yields compact designs highly suited for high performance resource constrained applications. Offset binary coding (OBC) is an effective technique to further optimize the DA, and allows us to reduce the memory requirements by a factor of two, with minimum additional computational complexity. This makes OBC-DA attractive for applications that are both resource and memory constrained. In addition, sparse matrix factorization techniques can be exploited to further reduce the size of the DA-ROMs. In this paper, the design and implementation of a novel OBC based DA is demonstrated using a generic architecture for implementing discrete orthogonal transforms (DOTs). Implementation is performed on the Xilinx Virtex-II Pro field programmable gate array (FPGA), and a detailed comparison between conventional and OBC based DA is presented to highlight the trade offs in various design metrics including performance, area and power

    Run-Time Reconfigurable FFT Engine

    Get PDF
    This paper develops a system level architecture for implementing a cost-efficient, FPGA-based realtime FFT engine. This approach considers both the hardware cost (in terms of FPGA resource requirements), and performance (in terms of throughput). These two dimensions are optimized based on using run time reconfiguration, double buffering technique and the hardware virtualization to reuse the available processing components. The system employs sixteen reconfigurable parallel FFT cores. Each core represents a 16 complex point parallel FFT processor, running in continuous realtime FFT engine. The architecture support transform length of 256 complex points, as a demonstrator to the idea design, using fixed-point arithmetic and has been developed using radix-4 architecture. The parallel Booth technique for realizing the complex multiplier (required in the basic butterfly operation) is chosen. That is to save a lot of hardware compared to other techniques. The simulation results that have been performed using VHDL modeling language and ModelSim software shows that the full design can be implemented using single FPGA platform requiring about 50,000 Slices

    Efficient reconfigurable architectures for 3D medical image compression

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci

    FPGA implementation of a baseband processor for FBMC transmission

    Get PDF
    Work has begun on the latest generations of mobile networks, 5G. These will demand a more efficient use of the frequency spectrum, imposed by new use case scenarios and an increasing number of users.To comply with these requirements FBMC has been proposed as one of the possible waveforms to be used. Its main benefits include a higher spectral efficiency, a better performance in high speed scenarios and an improved robustness to mobility.This project intends to provide an implementation of a baseband processor for FMBC transmission. After being properly validated this implementation will then serve as the subject of further study, by analysing its performance in different metrics.This implementation will be done using an FPGA, as they offer a good development platform, considering they provide a good compromise between flexibility and processing power

    Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA

    Get PDF
    In the past two decades, there has been huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications such as medical imaging are computationally intensive, power hungry and requires large amount of memory which cause a high demand for efficient algorithm implementation, low power architecture and acceleration. Recently, some MAAs such as Finite Ridgelet Transform (FRIT) Haar Wavelet Transform (HWT) are became very popular and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms particularly when addressing large problems are becoming very chal-lenging and consume lot of power which leads to a number of issues including mobility, reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimi- sation and awareness at all level of abstractions in the design flow. The most important achievements of the work presented in this thesis are summarised here. Two factorisation methodologies for HWT which are called HWT Factorisation Method1 and (HWTFM1) and HWT Factorasation Method2 (HWTFM2) have been explored to increase number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed. The evaluation of the architectural results have shown that the proposed architectures results have reduced the arithmetics calculation (additions/subtractions) by 33% and 25% respectively compared to direct implementa-tion of HWT and outperformed existing results in place. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using Handel-C language. The FPGAs implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for Finite Radon Trans-form (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient im-plementation on different FPGA devices. The proposed FRIT architecture performance has been evaluated and the results outperformed some other existing architecture in place. Both FRAT and FRIT architectures have been implemented on FPGAs using Handel-C language. The evaluation of both architectures have shown that the obtained results out-performed existing results in place by almost 10% in terms of frequency and area. The proposed architectures are also applied on image data (256 £ 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes. Two architectures for cyclic convolution based on systolic array using parallelism and pipelining which can be used as the main building block for the proposed FRIT architec-ture have been proposed. The first proposed architecture is a linear systolic array with pipelining process and the second architecture is a systolic array with parallel process. The second architecture reduces the number of registers by 42% compare to first architec-ture and both architectures outperformed other existing results in place. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4,8,16,32 and word-length (W=8). The implementation results have shown a signifi-cant improvement and outperformed other existing results in place. Ultimately, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called func-tional level power modelling approach have been presented. The mathematical techniques that form the basis of the proposed power modeling has been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favorably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behavior of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on virous FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% true for all IP core's Dynamic Power (DP) components.EThOS - Electronic Theses Online ServiceThomas Gerald Gray Charitable TrustGBUnited Kingdo

    Rapid Digital Architecture Design of Computationally Complex Algorithms

    Get PDF
    Traditional digital design techniques hardly keep up with the rising abundance of programmable circuitry found on recent Field-Programmable Gate Arrays. Therefore, the novel Rapid Data Type-Agnostic Digital Design Methodology (RDAM) elevates the design perspective of digital design engineers away from the register-transfer level to the algorithmic level. It is founded on the capabilities of High-Level Synthesis tools. By consequently working with data type-agnostic source codes, the RDAM brings significant simplifications to the fixed-point conversion of algorithms and the design of complex-valued architectures. Signal processing applications from the field of Compressed Sensing illustrate the efficacy of the RDAM in the context of multi-user wireless communications. For instance, a complex-valued digital architecture of Orthogonal Matching Pursuit with rank-1 updating has successfully been implemented and tested

    Topics in Adaptive Optics

    Get PDF
    Advances in adaptive optics technology and applications move forward at a rapid pace. The basic idea of wavefront compensation in real-time has been around since the mid 1970s. The first widely used application of adaptive optics was for compensating atmospheric turbulence effects in astronomical imaging and laser beam propagation. While some topics have been researched and reported for years, even decades, new applications and advances in the supporting technologies occur almost daily. This book brings together 11 original chapters related to adaptive optics, written by an international group of invited authors. Topics include atmospheric turbulence characterization, astronomy with large telescopes, image post-processing, high power laser distortion compensation, adaptive optics and the human eye, wavefront sensors, and deformable mirrors

    Efficient Modelling and Simulation Methodology for the Design of Heterogeneous Mixed-Signal Systems on Chip

    Get PDF
    Systems on Chip (SoCs) and Systems in Package (SiPs) are key parts of a continuously broadening range of products, from chip cards and mobile phones to cars. Besides an increasing amount of digital hardware and software for data processing and storage, they integrate more and more analogue/RF circuits, sensors, and actuators to interact with their (analogue) environment. This trend towards more complex and heterogeneous systems with more intertwined functionalities is made possible by the continuous advances in the manufacturing technologies and pushed by market demand for new products and product variants. Therefore, the reuse and retargeting of existing component designs becomes more and more important. However, all these factors make the design process increasingly complex and multidisciplinary. Nowadays, the design of the individual components is usually well understood and optimised through the usage of a diversity of CAD/EDA tools, design languages, and data formats. These are based on applying specific modelling/abstraction concepts, description formalisms (also called Models of Computation (MoCs)) and analysis/simulation methods. The designer has to bridge the gaps between tools and methodologies using manual conversion of models and proprietary tool couplings/integrations, which is error-prone and time-consuming. A common design methodology and platform to manage, exchange, and collaboratively develop models of different formats and of different levels of abstraction is missing. The verification of the overall system is a big problem, as it requires the availability of compatible models for each component at the right level of abstraction to achieve satisfying results with respect to the system functionality and test coverage, but at the same time acceptable simulation performance in terms of accuracy and speed. Thus, the big challenge is the parallel integration of these very different part design processes. Therefore, the designers need a common design and simulation platform to create and refine an executable specification of the overall system (a virtual prototype) on a high level of abstraction, which supports different MoCs. This makes possible the exploration of different architecture options, estimation of the performance, validation of re-used parts, verification of the interfaces between heterogeneous components and interoperability with other systems as well as the assessment of the impacts of the future working environment and the manufacturing technologies used to realise the system. For embedded Analogue and Mixed-Signal (AMS) systems, the C++-based SystemC with its AMS extensions, to which recent standardisation the author contributed, is currently establishing itself as such a platform. This thesis describes the author's contribution to solve the modelling and simulation challenges mentioned above in three thematic phases. In the first phase, the prototype of a web-based platform to collect models from different domains and levels of abstraction together with their associated structural and semantical meta information has been developed and is called ModelLib. This work included the implementation of a hierarchical access control mechanism, which is able to protect the Intellectual Property (IP) constituted by the model at different levels of detail. The use cases developed for this tool show how it can support the AMS SoC design process by fostering the reuse and collaborative development of models for tasks like architecture exploration, system validation, and creation of more and more elaborated models of the system. The experiences from the ModelLib development delivered insight into which aspects need to be especially addressed throughout the development of models to make them reusable: mainly flexibility, documentation, and validation. This was the starting point for the development of an efficient modelling methodology for the top-down design and bottom-up verification of RF Systems based on the systematic usage of behavioural models in the second phase. One outcome is the developed library of well documented, parameterisable, and pin-accurate VHDL-AMS models of typical analogue/digital/RF components of a transceiver. The models offer the designer two sets of parameters: one based on the performance specifications and one based on the device parameters back-annotated from the transistor-level implementation. The abstraction level used for the description of the respective analogue/digital/RF component behaviour has been chosen to achieve a good trade-off between accuracy, fidelity, and simulation performance. The pin-accurate model interfaces facilitate the integration of transistor-level models for the validation of the behavioural models or the verification of a component implementation in the system context. These properties make the models suitable for different design tasks such as architecture exploration or overall system validation. This is demonstrated on a model of a binary Frequency-Shift Keying (FSK) transmitter parameterised to meet very different target specifications. This project showed also the limits in terms of abstraction and simulation performance of the "classical" AMS Hardware Description Languages (HDLs). Therefore, the third and last phase was dedicated to further raise the abstraction level for the description of complex and heterogeneous AMS SoCs and thus enable their efficient simulation using different synchronised MoCs. This work uses the C++-based simulation framework SystemC with its AMS extensions. New modelling capabilities going beyond the standardised SystemC AMS extensions have been introduced to describe energy conserving multi-domain systems in a formal and consistent way at a high level of abstraction. To this end, all constants, variables, and parameters of the system model, which represent a physical quantity, can now declare their dimension and associated system of units as an intrinsic part of their data type. Assignments to them need to contain besides the value also the correct measurement unit. This allows a much more precise but still compact definition of the models' interfaces and equations. Thus, the C++ compiler can check the correct assembly of the components and the coherency of the equations by means of dimensional analysis. The implementation is based on the Boost.Units library, which employs template metaprogramming techniques. A dedicated filter for the measurement units data types has been implemented to simplify the compiler messages and thus facilitate the localisation of unit errors. To ensure the reusability of models despite precisely defined interfaces, their interfaces and behaviours need to be parametrisable in a well-defined manner. The enabling implementation techniques for this have been demonstrated with the developed library of generic block diagram component models for the Timed Data Flow (TDF) MoC of the SystemC AMS extensions. These techniques are also the key to integrate a new MoC based on the bond graph formalism into the SystemC AMS extensions. Bond graphs facilitate the unified description of the energy conserving parts of heterogeneous systems with the help of a small set of modelling primitives parametrisable to the physical domain. The resulting models have a simulation performance comparable to an equivalent signal flow model
    corecore