1,694 research outputs found
Recommended from our members
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image
and signal processing application areas such as consumer electronics, instrumentation,
medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA
devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the
work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of
cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining, has been performed. Initial results are very promising and indicate significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM
is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.
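The Distributed Arithmetic (DA) idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook bit-serial, lookup-table form of DA for an inner product with fixed coefficients and signed two's-complement inputs. The function names `build_lut` and `da_inner_product` are hypothetical, and the offset binary coding and sparse factorisation variants from the thesis are not shown.

```python
def build_lut(coeffs):
    """Precompute, for every K-bit address, the sum of the selected coefficients."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, xs, width=8):
    """Inner product of fixed coeffs with signed `width`-bit inputs, without multipliers.

    Assumption: every x fits in `width`-bit two's complement.
    """
    lut = build_lut(coeffs)
    mask = (1 << width) - 1
    words = [x & mask for x in xs]  # two's-complement bit patterns
    acc = 0
    for b in range(width):
        # Gather bit b of every input into one LUT address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> b) & 1) << i
        term = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == width - 1 else term
    return acc


coeffs = [3, -1, 4, 2]
xs = [5, -7, 100, -128]
print(da_inner_product(coeffs, xs))  # 166
```

The multiplications are replaced by one table lookup, one shift and one addition per input bit, which is why the technique maps well onto FPGA lookup tables.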
Reconfigurable hardware for color space conversion
Color space conversion (CSC) is an important application in image and video processing systems. CSC has been implemented in software and various kinds of hardware. Hardware implementations can achieve a higher performance compared to software-only solutions. Application specific integrated circuits (ASICs) are efficient and have good performance. However, they lack the programmability of devices such as field programmable gate arrays (FPGAs). This thesis studies the performance vs. flexibility tradeoffs in the migration of an existing CSC design from an ASIC to an FPGA. The existing ASIC is used within a commercial color-printing pipeline. Performance is critical in this application. However, the flexibility of FPGAs is desirable for faster time to market and also the ability to reuse one physical device across multiple functions. This thesis investigates whether the reprogrammability of FPGAs can be used to reallocate idle resources and studies the suitability of FPGAs for image processing applications. In the ASIC design, two major conversion units that are never used at the same time are identified. The FPGA-based implementation instantiates only one of these two units at a time, thus saving area. Reconfiguring the FPGA switches which of the two units is instantiated. The goal is to configure the device and process an entire page within one second. The FPGA implementation is approximately a factor of three slower than the ASIC design, but fast enough to process one page per second. In the current setup, however, the configuration time alone is very high: it exceeds the total time allotted for both configuration and processing. Other methods of configuration seem promising for reducing this time. Evaluation of the performance of the implementation and the reconfiguration time is presented. Methods to improve the performance and reduce the time and area for reconfiguration are discussed.
The VolumePro Real-Time Ray-Casting System
This paper describes VolumePro, the world’s first single-chip real-time volume rendering system for consumer PCs. VolumePro implements ray-casting with parallel slice-by-slice processing. Our discussion of the architecture focuses mainly on the rendering pipeline and the memory organization. VolumePro has hardware for gradient estimation, classification, and per-sample Phong illumination. The system does not perform any pre-processing and makes parameter adjustments and changes to the volume data immediately visible. We describe several advanced features of VolumePro, such as gradient magnitude modulation of opacity and illumination, supersampling, cropping and cut planes. The system renders 500 million interpolated, Phong illuminated, composited samples per second. This is sufficient to render volumes with up to 16 million voxels (e.g., 256³) at 30 frames per second.
Engineering and Applied Science
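The throughput claim in the VolumePro abstract can be checked with a few lines of arithmetic. This is a rough sketch that assumes one sample per voxel per frame (no supersampling); the function names are hypothetical.

```python
def required_sample_rate(voxels, frames_per_second):
    """Samples per second needed when every voxel yields one sample per frame."""
    return voxels * frames_per_second


def achievable_frame_rate(samples_per_second, voxels):
    """Frame rate sustained by a given sample throughput."""
    return samples_per_second / voxels


VOXELS = 256 ** 3  # 16,777,216 voxels, the "16 million" of the abstract

print(required_sample_rate(VOXELS, 30))                      # 503316480
print(round(achievable_frame_rate(500_000_000, VOXELS), 1))  # 29.8
```

Under this assumption the quoted 500 million samples per second corresponds to just under 30 frames per second for a 256³ volume, so the figures in the abstract are consistent once rounding is taken into account.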
High-speed Opto-electronic Pre-processing of Polar Mellin Transform for Shift, Scale and Rotation Invariant Image Recognition at Record-Breaking Speeds
Space situational awareness demands efficient monitoring of terrestrial sites
and celestial bodies, necessitating advanced target recognition systems.
Current target recognition systems exhibit limited operational speed due to
challenges in handling substantial image data. While machine learning has
improved this scenario, high-resolution images remain a concern. Optical
correlators, relying on analog processes, provide a potential alternative but
are hindered by material limitations. Recent advancements in hybrid
opto-electronic correlators (HOC) have addressed such limitations, additionally
achieving shift, scale, and rotation invariant (SSRI) target recognition
through use of the polar Mellin transform (PMT). However, there are currently
no techniques for obtaining the PMT at speeds fast enough to take advantage of
the inherent speed of the HOC. To that end, we demonstrate an optoelectronic
PMT pre-processor that can operate at record-breaking millisecond frame rates
using commercially available components for use in an automated SSRI HOC image
recognition system for space situational awareness.
Comment: Conferenc
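The abstract does not define the polar Mellin transform. The sketch below assumes the common formulation in which the PMT is a log-polar resampling of the Fourier magnitude: taking the magnitude removes the dependence on shift, and the log-polar mapping turns scaling and rotation into plain translations. Only the coordinate mapping is shown, with hypothetical helper names; this is not the opto-electronic pre-processor described in the paper.

```python
import math


def log_polar(x, y):
    """Map a Cartesian point (away from the origin) to (log radius, angle)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)


def scale_rotate(x, y, scale, angle):
    """Scale a point about the origin and rotate it by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (x * c - y * s), scale * (x * s + y * c)


# Scaling by 2 and rotating by 0.5 rad moves the point by a fixed offset
# (log 2, 0.5) in log-polar coordinates, whatever the starting point is.
rho0, theta0 = log_polar(3.0, 4.0)
rho1, theta1 = log_polar(*scale_rotate(3.0, 4.0, 2.0, 0.5))
print(rho1 - rho0, theta1 - theta0)
```

Because the transformed image differs from the original only by a translation in this domain, a shift-invariant correlator can then match the two.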
A VHDL design for hardware assistance of fractal image compression
Fractal image compression schemes have several unusual and useful attributes, including resolution independence, high compression ratios, good image quality, and rapid decompression. Despite this, one major difficulty has prevented their widespread adoption: the extremely high computational complexity of compression. Fractal image compression algorithms represent an image as a series of contractive transformations, each of which maps a large domain block to a smaller range block. Given only this set of transformations, it is possible to reconstruct an approximation of the original image by iteratively applying the transformations to an arbitrary image. Compression consists of partitioning the image into range blocks and finding a suitable transformation of a domain block to represent each one. This search for transformations must generally be done using a brute force approach, comparing successive domain blocks until a suitable match is found. Some algorithmic improvements have been found, but none are adequate to reduce the required compression time to something reasonable for many uses. This thesis presents a new ASIC design which performs a large number of the required comparisons in parallel, yielding a substantial speedup over a program on a general-purpose computer system. This ASIC is designed in VHDL, which may be synthesized to many different target architectures. The design has considerable flexibility which makes it applicable to different images and applications. The design is based around a pipeline of units that each compare one range block with a series of domain blocks which are fed through the pipeline. Comparisons are made to minimize the mean square error (MSE) of a transform given a linear mapping of the intensity values. This is, by far, the most common minimization strategy used in the literature. 
The speedup provided by this design over a sequential processor is estimated to be about 1,000 times for 256 x 256 images divided into 8 x 8 blocks, given similar implementation technologies.
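The comparison step described in the abstract, fitting a linear mapping of intensity values that minimises the MSE between a range block and a candidate domain block, can be sketched as follows. This is a software reference under stated assumptions, not the VHDL design: blocks are flat lists of equal length (the domain block is taken as already downsampled to the range-block size), the function names are hypothetical, and the clamping and quantisation of the scale factor that a real coder needs are omitted.

```python
def best_affine_match(range_block, domain_block):
    """Least-squares scale s and offset o so that s*d + o approximates r; returns (s, o, mse)."""
    n = len(range_block)
    sum_d = sum(domain_block)
    sum_r = sum(range_block)
    sum_dd = sum(d * d for d in domain_block)
    sum_dr = sum(d * r for d, r in zip(domain_block, range_block))
    denom = n * sum_dd - sum_d * sum_d
    # A flat domain block carries no contrast, so only the offset is fitted.
    s = 0.0 if denom == 0 else (n * sum_dr - sum_d * sum_r) / denom
    o = (sum_r - s * sum_d) / n
    mse = sum((s * d + o - r) ** 2
              for d, r in zip(domain_block, range_block)) / n
    return s, o, mse


def search(range_block, domain_blocks):
    """Brute-force search: index and parameters of the lowest-MSE domain block."""
    candidates = ((i,) + best_affine_match(range_block, d)
                  for i, d in enumerate(domain_blocks))
    return min(candidates, key=lambda c: c[3])


domains = [[1, 2, 3, 4], [4, 1, 3, 2], [5, 5, 5, 5]]
target = [0.5 * d + 10 for d in domains[1]]
print(search(target, domains))  # best match is index 1
```

Each call to `best_affine_match` is independent of the others, which is what makes the search a natural fit for a pipeline of parallel comparison units.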
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Some of these applications, such as medical imaging, are computationally intensive, power hungry and require large amounts of memory, which creates a high demand for efficient algorithm implementation, low power architectures and acceleration. Recently, some MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is becoming very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow.
The most important achievements of the work presented in this thesis are summarised here.
Two factorisation methodologies for the HWT, called HWT Factorisation Method 1 (HWTFM1) and HWT Factorisation Method 2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed. The evaluation has shown that the proposed architectures reduce the number of arithmetic operations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of the HWT, and outperform existing results. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using the Handel-C language. The FPGA implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for the FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated, and the results outperformed some other existing architectures. Both the FRAT and FRIT architectures have been implemented on FPGAs using the Handel-C language. The evaluation of both architectures has shown that the obtained results outperformed existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
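For orientation, the Haar Wavelet Transform at the centre of this work can be sketched in its plain, unnormalised form, which needs only additions and subtractions. This is the direct transform, not the HWTFM1 or HWTFM2 factorisations proposed in the thesis; the function names are hypothetical and the input length is assumed to be a power of two.

```python
def haar_level(signal):
    """One decomposition level: pairwise sums (approximation) and differences (detail)."""
    sums = [a + b for a, b in zip(signal[0::2], signal[1::2])]
    diffs = [a - b for a, b in zip(signal[0::2], signal[1::2])]
    return sums, diffs


def haar_full(signal):
    """Full unnormalised Haar decomposition of a power-of-two-length signal."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, detail = haar_level(approx)
        details = detail + details  # coarser details go in front
    return approx + details


print(haar_full([1, 2, 3, 4]))  # [10, -4, -1, -1]
```

Smooth regions of a signal yield zero or near-zero detail coefficients, which is the property that edge detection and compression applications exploit.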
Two architectures for cyclic convolution, based on systolic arrays using parallelism and pipelining, have been proposed; they can be used as the main building block of the proposed FRIT architecture. The first proposed architecture is a linear systolic array with pipelined processing and the second is a systolic array with parallel processing. The second architecture reduces the number of registers by 42% compared to the first architecture, and both architectures outperformed other existing results. The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes N = 4, 8, 16, 32 and word length W = 8. The implementation results have shown a significant improvement and outperformed other existing results.
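The operation these architectures compute can be stated as a short reference model. The sketch below is a direct O(N²) software definition of cyclic convolution, useful for checking hardware output; it does not model the systolic arrays themselves, and the function name is hypothetical.

```python
def cyclic_convolution(x, h):
    """y[i] = sum over k of x[k] * h[(i - k) mod N], for sequences of equal length N."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


x = [1, 2, 3, 4]
print(cyclic_convolution(x, [1, 0, 0, 0]))  # [1, 2, 3, 4]
print(cyclic_convolution(x, [0, 1, 0, 0]))  # [4, 1, 2, 3]
```

Convolving with a shifted unit impulse rotates the input, which makes a quick sanity test for any implementation at the vector sizes listed above.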
Finally, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power modelling have been validated on a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool makes it possible to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the accuracy of the proposed model is almost 99% for the Dynamic Power (DP) components of all IP cores.
Thomas Gerald Gray Charitable Trus
Hardware accelerated real-time Linux video anonymizer
Master's dissertation in Industrial Electronics and Computers Engineering.
Embedded Systems are currently present in a wide range of everyday equipment, from TV-boxes,
televisions and routers to the indispensable smartphone.
The Linux operating system, with its "one-size-fits-all" distribution philosophy, has become a
viable alternative, providing extensive support for hardware, debugging techniques and network
communication protocols, among other functionalities, which have become the standard set of
requirements in most modern embedded systems.
This operating system is appealing due to its open-source philosophy, which provides the
user with a vast set of software libraries that enable faster development in a given domain
and easier integration of complex software.
Machine Learning algorithms are developed to execute tasks autonomously, i.e., without
human supervision, and are present in the most varied technologies, from the image focus system
of a smartphone to the lane-boundary detection system of an autonomous driving system.
When compiled for embedded platforms, these algorithms demand processing effort and
resources, such as memory footprint, that in most cases far exceed what is available to the
application. The components that need the most processing power must therefore be implemented
in hardware to ensure that the timing requirements are met.
This dissertation proposes the creation of a video anonymization system that acquires,
processes, and manipulates the frames in order to guarantee anonymity, even during transmission.
Its implementation includes Object Detection techniques, making use of a combination
of hardware acceleration technologies: parallelization and execution in specialized hardware.
The proposed implementation is constrained both in time and in resource consumption at
the hardware and software levels.
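The abstract does not say how detected regions are obscured. As an illustration only, the sketch below assumes pixelation of a bounding box supplied by an object detector, on a greyscale frame stored as a list of rows. The function name, the box format (half-open `x0, y0, x1, y1`) and the block size are all hypothetical.

```python
def pixelate_region(frame, box, block=4):
    """Return a copy of `frame` with the boxed region replaced by block averages."""
    x0, y0, x1, y1 = box  # half-open: columns x0..x1-1, rows y0..y1-1
    out = [row[:] for row in frame]
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            # Every pixel of the tile takes the tile's integer mean intensity.
            mean = sum(frame[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


frame = [
    [0, 10, 1, 1],
    [20, 30, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
print(pixelate_region(frame, (0, 0, 2, 2), block=2)[0])  # [15, 15, 1, 1]
```

Each tile is processed independently of the others, so this step suits the kind of parallel or hardware-assisted execution the dissertation describes.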