INTEND OF LUT/MUX COMPLEXS BY USING FPGA MODUS OPERANDI
To reduce area and improve the performance of logic circuits, a Lookup Table (LUT) is combined with a multiplexer-based methodology. This architecture yields a new MUX-LUT structure built around comparators and logic circuits. The implementation is better suited to reducing both complex logic block area and routing area while maintaining mapping depth. Interconnections are increasingly the dominant contributor to delay, area and energy consumption in Complementary Metal-Oxide Semiconductor (CMOS) digital circuits. The proposed implementation overcomes several limitations of previously published quaternary implementations, such as the need for special features in the CMOS process or power-hungry current-mode cells. A 512-bit quaternary Lookup Table is used to support a high level of operations in the FPGA. The proposed architecture is planned to be implemented in Xilinx 14.3, and its output current, output voltage and area will be analyzed.
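The abstract does not describe the internal organisation of the MUX-LUT. The following minimal Python sketch rests on a stated assumption: an n-input LUT is modelled as a tree of radix:1 multiplexers that selects one of radix**n stored configuration values. Use radix=2 for a conventional binary LUT and radix=4 for a quaternary one. The names `mux` and `lut_eval` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def mux(select: int, data: Sequence[int]) -> int:
    """An r:1 multiplexer: route data[select] to the output."""
    return data[select]


def lut_eval(config: Sequence[int], inputs: Sequence[int], radix: int = 2) -> int:
    """Evaluate an n-input LUT modelled (hypothetically) as a tree of radix:1 muxes.

    config holds radix**n stored values; inputs[0] is the most significant select
    digit, so the selected entry is config[sum(d * radix**(n - 1 - i))].
    """
    n = len(inputs)
    if len(config) != radix ** n:
        raise ValueError("config must hold radix**n entries")
    level = list(config)
    # Each mux stage consumes one select digit (least significant first) and
    # collapses every group of `radix` adjacent values into a single value.
    for digit in reversed(inputs):
        if not 0 <= digit < radix:
            raise ValueError("input digit out of range for this radix")
        level = [mux(digit, level[i:i + radix]) for i in range(0, len(level), radix)]
    return level[0]
```

For example, a 2-input binary XOR stores `[0, 1, 1, 0]`, and `lut_eval([0, 1, 1, 0], [1, 0])` selects entry 2, which is 1. The same code evaluates a quaternary LUT by passing `radix=4`.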
Programmable flexible cores for SoC applications
Master's thesis. Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200
Interconnect yield analysis and fault tolerance for field programmable gate arrays
Imperial Users only.
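Development of a system to aid the mobility of visually impaired people through the implementation of the Census transform on reconfigurable architectures for distance estimation using stereo vision (Desenvolvimento de um sistema para auxílio à locomoção de deficientes visuais através da implementação em arquiteturas reconfiguráveis da transformada Census para estimação de distância usando visão estéreo)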
Undergraduate monograph, Universidade de Brasília, Faculdade UnB Gama, 2013.
This work proposes a system to help visually impaired people, improving their independence and quality of life. The system computes correspondences and the disparity map between two stereoscopic images, using the Census transform and the Hamming distance to estimate the frontal distance to obstacles. It consists of a pair of cameras and an FPGA (Field Programmable Gate Array) that accelerates the execution of the algorithms involved. A VHDL code generator was built to speed up development of the hardware architectures for different image sizes and mask sizes of 3x3, 5x5, 7x7, 9x9 and 11x11 pixels. All the hardware architectures were synthesized, and a scalability analysis in terms of hardware resource consumption was carried out. Synthesis results demonstrate that the architectures map efficiently onto commercial FPGA devices, reaching an operating frequency of approximately 180 MHz. Two ROMs, one for each image, were instantiated to emulate the pixel streams from the left and right cameras, and behavioural simulations were performed to verify the logic of the architectures. The same technique was implemented in both hardware and software. Numerical comparisons between the two demonstrate the effectiveness of the proposed architectures and the quality of the resulting disparity maps. A speedup of 211 times was achieved for computing the disparity map, compared with a software implementation on a conventional Intel Core i7 desktop running at 3.4 GHz. The basic Census transform adds noise to the correspondence computation between the images, and the modified transforms improve the results.
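The following minimal Python sketch shows the Census transform and Hamming-distance matching described above. It is a software model under stated assumptions: grayscale images are lists of rows, the window is square, and disparity is chosen winner-take-all on per-pixel Hamming costs with no cost aggregation. It is not the authors' VHDL generator. The function names, the bit order and the `neighbour < centre` comparison are illustrative choices.

```python
from typing import List

Image = List[List[int]]


def census_transform(img: Image, win: int = 3) -> List[List[int]]:
    """Encode each pixel as a bit string: bit = 1 where a neighbour in the
    win x win window is darker than the centre. Border pixels, where the
    window does not fit, are left as 0."""
    r = win // 2
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            centre = img[y][x]
            bits = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre is not compared with itself
                    bits = (bits << 1) | (1 if img[y + dy][x + dx] < centre else 0)
            out[y][x] = bits
    return out


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def disparity_map(left: Image, right: Image, max_disp: int, win: int = 3) -> List[List[int]]:
    """Winner-take-all: for each left pixel choose the shift d in [0, max_disp]
    that minimises the Hamming distance to the right-image code at column x - d."""
    cl, cr = census_transform(left, win), census_transform(right, win)
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_d, best_cost = 0, None
            for d in range(min(max_disp, x) + 1):
                cost = hamming(cl[y][x], cr[y][x - d])
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```

Per-pixel costs like these are noisy. That is consistent with the abstract's remark that the basic Census transform adds noise, which is why practical systems aggregate costs over a window or use modified transforms.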
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Some of these applications, such as medical imaging, are computationally intensive and power hungry and require large amounts of memory. This creates a high demand for efficient algorithm implementation, low-power architectures and acceleration. Recently, MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular. They suit a number of image processing applications, such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), and medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms is becoming very challenging, particularly for large problems, and consumes a lot of power. This leads to a number of issues, including reduced mobility and reliability concerns. To overcome the computational problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow.
The most important achievements of the work presented in this thesis are summarised here.
Two factorisation methodologies for the HWT, called HWT Factorisation Method 1 (HWTFM1) and HWT Factorisation Method 2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed. The evaluation shows that the proposed architectures reduce the arithmetic operations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of the HWT, and outperform existing results. The proposed HWTFM2 is implemented on advanced and low-power FPGA devices using the Handel-C language. The FPGA implementation results outperform other existing results in terms of area and maximum frequency. A novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. It is integrated with the developed HWT architecture to build an optimised architecture for the FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated, and the results outperform some existing architectures. Both the FRAT and FRIT architectures have been implemented on FPGAs using Handel-C. The evaluation shows that both architectures outperform existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256), and their Peak Signal to Noise Ratio (PSNR) is evaluated to assess quality.
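The abstract does not describe the HWTFM1 and HWTFM2 factorisations themselves. This minimal Python sketch therefore shows only a direct, unnormalised multilevel Haar transform built from pairwise additions and subtractions. These are the operations whose count the proposed architectures reduce. The omission of the usual 1/2 or 1/sqrt(2) scaling, and the function names, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def haar_level(x: Sequence[int]) -> Tuple[List[int], List[int]]:
    """One unnormalised Haar level: pairwise sums (approximation) and differences (detail)."""
    if len(x) % 2:
        raise ValueError("length must be even")
    sums = [x[i] + x[i + 1] for i in range(0, len(x), 2)]
    diffs = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return sums, diffs


def haar_1d(x: Sequence[int]) -> List[int]:
    """Full decomposition of a power-of-two-length signal using only adds/subtracts.

    Output layout: [final sum, coarsest detail, ..., finest details].
    """
    approx: List[int] = list(x)
    details: List[int] = []
    while len(approx) > 1:
        approx, d = haar_level(approx)
        details = d + details  # coarser details go in front of finer ones
    return approx + details
```

For `[1, 2, 3, 4]` the first level gives sums `[3, 7]` and differences `[-1, -1]`, and the second level gives `[10]` and `[-4]`. The result is `[10, -4, -1, -1]`. A 2-D transform of an image such as 256 × 256 would apply the same steps to rows and then to columns.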
Two systolic-array architectures for cyclic convolution, using parallelism and pipelining, have been proposed. Either can serve as the main building block for the proposed FRIT architecture. The first is a linear systolic array with pipelining, and the second is a systolic array with parallel processing. The second architecture uses 42% fewer registers than the first, and both architectures outperform other existing results. The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes N = 4, 8, 16 and 32 and word length W = 8. The implementation results show a significant improvement over other existing results.
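As a reference for what these arrays compute, the following Python sketch gives the cyclic convolution definition and a step-by-step model of N multiply-accumulate processing elements (PEs). It makes a simplifying assumption: each input sample is broadcast to all PEs while the coefficients circulate. It is therefore a functional model, not a cycle-accurate reproduction of either systolic array in the thesis. The function names are hypothetical.

```python
from typing import List, Sequence


def cyclic_convolution(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """Reference definition: y[n] = sum_k x[k] * h[(n - k) mod N]."""
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length N")
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


def cyclic_convolution_pe_array(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """Model of N multiply-accumulate PEs (hypothetical broadcast-input variant).

    At step t every PE receives x[t]; PE i multiplies it by the coefficient
    h[(i - t) mod N] held in a circulating register and accumulates.
    After N steps PE i holds y[i].
    """
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length N")
    acc = [0] * n
    coeff = list(h)  # invariant: coeff[i] == h[(i - t) mod N] at step t
    for t in range(n):
        for i in range(n):
            acc[i] += x[t] * coeff[i]
        coeff = [coeff[-1]] + coeff[:-1]  # rotate right by one position
    return acc
```

With `h = [0, 1, 0, 0]` the convolution rotates the input by one position, so `[1, 2, 3, 4]` becomes `[4, 1, 2, 3]`.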
Finally, an in-depth evaluation is presented of a high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach. The mathematical techniques that form the basis of the proposed power model have been validated on a range of custom IP cores. The proposed power model is scalable and platform independent, and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool shows how different custom IP cores behave in terms of power consumption and accuracy under different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the proposed model is almost 99% accurate for the Dynamic Power (DP) components of all IP cores.
Thomas Gerald Gray Charitable Trust
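The abstract does not state the form of the functional level power model. This minimal sketch rests on a loudly stated assumption: a core's dynamic power is modelled as a linear function of a single input parameter, fitted by ordinary least squares. The parameter is a hypothetical `activity`, for example the average input toggle rate. `relative_accuracy` is likewise a hypothetical way to express a figure such as 99%.

```python
from typing import Sequence, Tuple


def fit_linear_power_model(activity: Sequence[float], power: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of the hypothetical model P = c0 + c1 * activity."""
    n = len(activity)
    if n != len(power) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_a = sum(activity) / n
    mean_p = sum(power) / n
    sxx = sum((a - mean_a) ** 2 for a in activity)
    if sxx == 0:
        raise ValueError("activity values must not all be equal")
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(activity, power))
    c1 = sxy / sxx  # slope: power per unit of activity
    c0 = mean_p - c1 * mean_a  # intercept: activity-independent power
    return c0, c1


def relative_accuracy(predicted: float, measured: float) -> float:
    """Accuracy as 1 - |relative error| (an assumed definition)."""
    return 1.0 - abs(predicted - measured) / abs(measured)
```

Characterisation measurements taken at a few activity levels fix `c0` and `c1`. The fitted model then predicts power at other operating points without re-running low-level power analysis.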
Digital Circuit Design Using Floating Gate Transistors
Floating gate (flash) transistors are used exclusively for memory applications today, including SD cards of various form factors, USB flash drives and SSDs. In this thesis, we explore the use of flash transistors to implement digital logic circuits. Because the threshold voltage of a flash transistor can be modified at a fine granularity during programming, our flash-based digital circuit design approach offers several advantages. First, speed binning at the factory can be controlled precisely. Second, an IC can be re-programmed in the field to counteract effects such as aging, which has become a significant problem in recent years, particularly for mission-critical applications. Third, a regular MOSFET has a single threshold voltage, whereas a flash transistor can have multiple threshold voltage levels, so each device can encode more symbols than a regular MOSFET. This allows us to implement multi-valued logic functions natively. In this thesis, we evaluate different flash-based digital circuit design approaches and compare their performance with a traditional CMOS standard cell-based design approach. We begin by evaluating our design approach at the cell level to optimize the design's delay, power, energy and physical area characteristics. On these metrics, the flash-based approach is shown to outperform the CMOS standard cell approach. We then present the performance of our design approach at the block level. We describe a synthesis flow that decomposes a circuit block into a network of interconnected flash-based circuit cells. We also describe techniques to optimize the resulting network using don't cares.
Our optimization approach differs from other techniques that use don't cares in four ways. It a) targets a flash-based design flow, b) optimizes clusters of logic nodes at once instead of one node at a time, c) attempts to reduce the number of cubes rather than the number of literals in each cube, and d) operates on the post-technology-mapped netlist. Working on the mapped netlist directly improves result quality compared with the pre-technology-mapping logic optimization typically done in the literature. We present the resulting network characteristics (delay, power, energy and physical area) and compare them with a standard cell-based realization of the same block obtained using commercial tools, demonstrating significant improvements in all the design metrics. We also study flash-based FPGA designs (both static and dynamic) and present the tradeoffs between delay, power dissipation and energy consumption of the various designs. Our work differs from previously proposed flash-based FPGAs in that we embed the flash transistors that store the configuration bits directly within the logic and interconnect fabrics. We also describe in detail how the configuration bits are programmed for all the proposed designs.
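To illustrate why don't cares can reduce the number of cubes (point c above), the following Python sketch performs an exhaustive minimum-cube-cover search. It only works for very small functions and is not the thesis's optimization algorithm, which works on clusters of nodes in a mapped netlist. A cube is written as a string over `0`, `1` and `-`. The function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Set


def covers(cube: str, minterm: str) -> bool:
    """A cube covers a minterm if every non-'-' position matches."""
    return all(c == "-" or c == m for c, m in zip(cube, minterm))


def min_cube_cover(n: int, on: Set[str], dc: Set[str]) -> List[str]:
    """Smallest set of cubes covering every ON minterm and no OFF minterm.

    Brute force over all 3**n cubes, so only usable for tiny n.
    """
    minterms = ["".join(bits) for bits in product("01", repeat=n)]
    off = [m for m in minterms if m not in on and m not in dc]
    # Candidate cubes: those that touch no OFF minterm (they may absorb don't cares).
    cands = ["".join(c) for c in product("01-", repeat=n)]
    cands = [c for c in cands if not any(covers(c, m) for m in off)]
    for k in range(len(cands) + 1):
        for combo in combinations(cands, k):
            if all(any(covers(c, m) for c in combo) for m in on):
                return list(combo)
    raise ValueError("no cover exists (ON-set must not overlap the OFF-set)")
```

For a 2-input NAND (ON-set `{00, 01, 10}`), the minimum cover needs two cubes. If input `11` is declared a don't care, the single cube `--` suffices.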
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas, such as consumer electronics, instrumentation, medical data processing and avionics. They are chosen for their reasonable energy consumption, high performance, security, short design turnaround time and reconfigurability. Low-power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power, leading to issues including reduced mobility, reliability concerns and increased design cost. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels of the design flow.
The key achievements of the work presented in this thesis are summarised here.
Behavioural level optimisation strategies have been used to implement matrix products and inner products using mathematical techniques such as Distributed Arithmetic (DA) and its variations, including offset binary coding, sparse factorisation and novel vector-level transformations. The impact of these algorithmic and arithmetic transformations has been tested on applications including the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate they have been shown to clearly outperform comparable existing implementations.
At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied to the design and optimisation of a number of cores. These include colour space conversion, the finite Radon transform, the finite ridgelet transform and circular convolution. A pioneering study has been performed of the influence of supply voltage scaling on FPGA-based designs, used in conjunction with performance-enhancing strategies such as parallelism and pipelining. Initial results are very promising and indicate significant potential for future research in this area.
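The following minimal Python sketch shows plain Distributed Arithmetic for an inner product y = sum(A_i * x_i) with constant coefficients. It precomputes all 2**K partial sums of the coefficients, then processes the inputs one bit position at a time, replacing multipliers with a lookup, a shift and an add. It assumes `bits`-wide two's-complement inputs. The offset binary coding and sparse factorisation variants mentioned in the abstract are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence


def build_da_lut(coeffs: Sequence[int]) -> List[int]:
    """Precompute all 2**K partial sums: entry addr = sum of coeffs[i] with bit i of addr set."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1) for addr in range(1 << k)]


def da_inner_product(coeffs: Sequence[int], xs: Sequence[int], bits: int) -> int:
    """Bit-serial DA evaluation of sum(coeffs[i] * xs[i]) for `bits`-wide two's-complement xs."""
    if len(coeffs) != len(xs):
        raise ValueError("coeffs and xs must have the same length")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= x <= hi for x in xs):
        raise ValueError("every x must fit in `bits` two's-complement bits")
    lut = build_da_lut(coeffs)  # in hardware this is the ROM/LUT; no multipliers needed
    acc = 0
    for b in range(bits):
        # Bit b of every input forms the LUT address (Python's >> is arithmetic,
        # so this yields the two's-complement bit for negative values too).
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        partial = lut[addr] << b
        # The sign bit carries weight -2**(bits - 1), so its partial sum is subtracted.
        acc += -partial if b == bits - 1 else partial
    return acc
```

For example, `da_inner_product([3, -2, 5], [1, -3, 2], bits=4)` accumulates partial sums 1, 10, -8 and then subtracts -16 for the sign bit, giving 19. This equals 3·1 + (-2)(-3) + 5·2. The LUT grows as 2**K, which is why variants such as offset binary coding are used to shrink it.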
A key contribution of this work is the development of a novel high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable and platform independent, and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.
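FLPAM's model equations are not given in the abstract. As background for the supply voltage scaling study, the standard first-order CMOS dynamic power relation and the classic parallelism trade-off can be written as follows. The derivation assumes ideal two-way parallelism with no multiplexing or routing overhead, and a scaled supply $kV_{DD}$ with $0 < k < 1$ that still meets timing at half the clock rate:

```latex
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
\qquad
P_{\text{par}} = \alpha \, (2C) \, (k V_{DD})^{2} \, \frac{f}{2} = k^{2} \, P_{\text{dyn}}
```

Throughput is unchanged, because two units each run at $f/2$. With an illustrative $k = 0.8$, dynamic power falls to $0.64\,P_{\text{dyn}}$ at the cost of roughly doubled area. On a real FPGA the achievable $k$ and the overheads must be characterised per device.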