5,798 research outputs found

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Efficient FPGA implementation and power modelling of image and signal processing IP cores

    Get PDF
    Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number ofimage and signal processing application areas such as consumer electronics, instrumentation, medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area. A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Rate-distortion Balanced Data Compression for Wireless Sensor Networks

    Get PDF
    This paper presents a data compression algorithm with error bound guarantee for wireless sensor networks (WSNs) using compressing neural networks. The proposed algorithm minimizes data congestion and reduces energy consumption by exploring spatio-temporal correlations among data samples. The adaptive rate-distortion feature balances the compressed data size (data rate) with the required error bound guarantee (distortion level). This compression relieves the strain on energy and bandwidth resources while collecting WSN data within tolerable error margins, thereby increasing the scale of WSNs. The algorithm is evaluated using real-world datasets and compared with conventional methods for temporal and spatial data compression. The experimental validation reveals that the proposed algorithm outperforms several existing WSN data compression methods in terms of compression efficiency and signal reconstruction. Moreover, an energy analysis shows that compressing the data can reduce the energy expenditure, and hence expand the service lifespan by several folds.Comment: arXiv admin note: text overlap with arXiv:1408.294

    Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA

    Get PDF
    In the past two decades, there has been huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications such as medical imaging are computationally intensive, power hungry and requires large amount of memory which cause a high demand for efficient algorithm implementation, low power architecture and acceleration. Recently, some MAAs such as Finite Ridgelet Transform (FRIT) Haar Wavelet Transform (HWT) are became very popular and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms particularly when addressing large problems are becoming very chal-lenging and consume lot of power which leads to a number of issues including mobility, reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimi- sation and awareness at all level of abstractions in the design flow. The most important achievements of the work presented in this thesis are summarised here. Two factorisation methodologies for HWT which are called HWT Factorisation Method1 and (HWTFM1) and HWT Factorasation Method2 (HWTFM2) have been explored to increase number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed. The evaluation of the architectural results have shown that the proposed architectures results have reduced the arithmetics calculation (additions/subtractions) by 33% and 25% respectively compared to direct implementa-tion of HWT and outperformed existing results in place. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using Handel-C language. The FPGAs implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for Finite Radon Trans-form (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient im-plementation on different FPGA devices. The proposed FRIT architecture performance has been evaluated and the results outperformed some other existing architecture in place. Both FRAT and FRIT architectures have been implemented on FPGAs using Handel-C language. The evaluation of both architectures have shown that the obtained results out-performed existing results in place by almost 10% in terms of frequency and area. The proposed architectures are also applied on image data (256 £ 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes. Two architectures for cyclic convolution based on systolic array using parallelism and pipelining which can be used as the main building block for the proposed FRIT architec-ture have been proposed. The first proposed architecture is a linear systolic array with pipelining process and the second architecture is a systolic array with parallel process. The second architecture reduces the number of registers by 42% compare to first architec-ture and both architectures outperformed other existing results in place. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4,8,16,32 and word-length (W=8). The implementation results have shown a signifi-cant improvement and outperformed other existing results in place. Ultimately, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called func-tional level power modelling approach have been presented. The mathematical techniques that form the basis of the proposed power modeling has been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favorably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behavior of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on virous FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% true for all IP core's Dynamic Power (DP) components.EThOS - Electronic Theses Online ServiceThomas Gerald Gray Charitable TrustGBUnited Kingdo
    corecore