6 research outputs found

    Implementation of Separable & Steerable Gaussian Smoothers on an FPGA

    Smoothing filters are used extensively for noise removal and image restoration. Directional filters are widely used in computer vision and image processing tasks such as motion analysis, edge detection, line parameter estimation and texture analysis. Tuning a filter to every possible position and orientation in real time is impractical because of the computation required. An efficient alternative is to design a few basis filters and express the output of a directional filter as a weighted sum of the basis filter outputs. Directional filters with this property are called steerable filters. This thesis focuses on implementing the proposed computationally efficient separable and steerable Gaussian smoothers on a Xilinx Virtex-II Pro FPGA platform. Field Programmable Gate Arrays (FPGAs) consist of a collection of logic blocks, including lookup tables, flip-flops and some amount of random access memory, wired together by an array of interconnects. The proposed technique [2] is implemented on FPGA hardware, taking advantage of parallelism and pipelining.
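The "weighted sum of basis filter outputs" idea can be illustrated with a minimal sketch. It uses the textbook first-derivative-of-Gaussian pair as the steerable basis, which is an assumption chosen for illustration and not necessarily the smoother proposed in the thesis. The function names, `sigma` and `radius` are hypothetical, and the code is a software model, not an FPGA design.

```python
import numpy as np

def gaussian_1d(sigma, radius):
    """Return sample positions and a normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return x, g / g.sum()

def separable_filter(image, col_kernel, row_kernel):
    """Apply row_kernel along each row (axis 1), then col_kernel along each column (axis 0)."""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, row_kernel, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, col_kernel, mode="same"), 0, tmp)

def steered_derivative(image, theta, sigma=1.5, radius=4):
    """Directional derivative at angle theta, built from two separable basis filters."""
    x, g = gaussian_1d(sigma, radius)
    dg = -x / sigma**2 * g                    # first derivative of the Gaussian
    gx = separable_filter(image, g, dg)       # basis response along x (columns)
    gy = separable_filter(image, dg, g)       # basis response along y (rows)
    # Steering: the response at any angle is a weighted sum of the two basis outputs.
    return np.cos(theta) * gx + np.sin(theta) * gy

# Usage: an image that ramps along x responds at theta = 0 and not at theta = pi/2.
ramp = np.tile(np.arange(16, dtype=float), (16, 1))
along_x = steered_derivative(ramp, 0.0)
along_y = steered_derivative(ramp, np.pi / 2)
```

Only the two basis images are computed by convolution; changing `theta` costs two multiplications and one addition per pixel, which is the property that makes steering attractive for hardware.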

    Efficient FPGA Architectures for Separable Filters and Logarithmic Multipliers and Automation of Fish Feature Extraction Using Gabor Filters

    Convolution and multiplication operations in the filtering process can be optimized by minimizing resource utilization using Field Programmable Gate Arrays (FPGAs) and separable filter kernels. An FPGA architecture for separable convolution is proposed that reduces on-chip resource utilization and external memory bandwidth for a given processing rate of the convolution unit. Integer multiplication can be optimized in terms of resources, operation time and power consumption by converting to the logarithmic domain. To this end, a method that alters the filter weights is proposed and implemented for error reduction. The results show significant error reduction compared to existing methods, thereby optimizing multiplication in terms of the above-mentioned metrics. Underwater video and still images are used by many programs within National Oceanic and Atmospheric Administration (NOAA) Fisheries to identify, classify and quantify living marine resources. These programs use underwater cameras to record video for manual analysis, a process that is labour-intensive, time-consuming and error-prone. An efficient solution to this problem is proposed that uses Gabor filters for feature extraction. The proposed method is implemented to identify two species of fish, Epinephelus morio and Ocyurus chrysurus. The results show a higher rate of detection with a minimal rate of false alarms.
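To make "multiplication in the logarithmic domain" concrete, the sketch below uses Mitchell's classic piecewise-linear approximation of log2. This is a stand-in chosen for illustration; the abstract does not say which logarithmic multiplier the thesis uses, and the weight-altering error-reduction method is not reproduced here. The function name is hypothetical.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers using shifts and adds only.

    Writes a = 2**ka * (1 + ma) and b = 2**kb * (1 + mb) with 0 <= ma, mb < 1,
    approximates log2(1 + m) by m, adds the two logs, and converts back.
    """
    if a == 0 or b == 0:
        return 0
    ka = a.bit_length() - 1            # position of the leading one in a
    kb = b.bit_length() - 1            # position of the leading one in b
    frac_a = a - (1 << ka)             # mantissa of a, scaled by 2**ka
    frac_b = b - (1 << kb)             # mantissa of b, scaled by 2**kb
    scale = 1 << (ka + kb)
    # (ma + mb) scaled by 2**(ka + kb), computed without any multiplication
    frac_sum = (frac_a << kb) + (frac_b << ka)
    if frac_sum < scale:               # ma + mb < 1: no carry into the exponent
        return scale + frac_sum
    return frac_sum << 1               # ma + mb >= 1: exponent increases by one

# Usage: exact for powers of two, an underestimate otherwise.
exact_case = mitchell_multiply(8, 4)     # 32
approx_case = mitchell_multiply(3, 3)    # 8, versus the exact product 9
```

The approximation never overestimates, and its relative error peaks at 1/9 (about 11.1%) when both mantissas equal 0.5. That error is what correction schemes for logarithmic multipliers aim to reduce.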

    FPGA urban traffic control simulation and evaluation platform

    Master's in Electronics and Telecommunications Engineering. The study and development of Urban Traffic Management and Control (UTMC) systems have gained importance not only because of obvious issues such as traffic safety improvement and traffic congestion control and avoidance, but also because of underlying factors such as urban transportation efficiency, air pollution from urban traffic, and emerging concepts such as autonomous vehicle systems. Urban traffic simulations generally run in a software environment, which hinders progress towards the actual implementation of UTMC systems: urban traffic controllers are usually implemented and executed on hardware platforms, so software-based models don't directly support an actual implementation. In this study we explore a novel approach to urban traffic simulation, aimed at eliminating the time and work distance between a UTMC system's design and its eventual implementation, in which a Field Programmable Gate Array (FPGA) executes a simulation model of an urban traffic network. Because using FPGAs implies hardware-based execution, the resulting implementation of each traffic management and control element can be considered not only to behave much like a real-world implementation but also to be an actual prototype. From the simulation viewpoint, FPGAs hold the prospect of execution speeds many times faster than software-based simulations, since FPGA designs can execute a large number of parallel processes. This study shows that an Urban Traffic Control Simulation and Test Platform is possible by implementing a relatively simple urban network model in a low-end FPGA. This result implies that, with further investment of time and resources, a more complex system can be developed that handles large-scale, complex UTMC systems, with the promise of shortening the work distance between the concept and a running real-world implementation.

    Application-Specific Number Representation

    No full text
    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses two major issues. First, automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. Second, generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
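As a rough illustration of what bit-width optimisation means for a fixed-point variable, the sketch below searches for the smallest number of fractional bits that keeps the rounding error on a set of sample values within a tolerance. It is a toy, purely dynamic (sample-driven) search under assumed names and is not a description of how R-Tool works.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def min_frac_bits(samples, max_abs_error: float, limit: int = 32) -> int:
    """Smallest fractional bit width whose worst-case error on samples is within tolerance."""
    for bits in range(limit + 1):
        worst = max(abs(x - from_fixed(to_fixed(x, bits), bits)) for x in samples)
        if worst <= max_abs_error:
            return bits
    raise ValueError("no bit width up to the limit meets the error tolerance")

# Usage: hypothetical sample values observed for one variable.
bits_needed = min_frac_bits([0.1, 0.25, 0.7], max_abs_error=0.01)
```

Each bit removed from a variable shrinks the adders and multipliers that consume it, which is how a search of this kind translates into reduced area on an FPGA.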

    Accelerating the execution of time consuming software applications by configuring special hardware during the program execution on multiprocessor computers

    In contrast to control-flow computer architectures, whose processors can execute every instruction defined by the architecture but each execute only a few instructions at any one time, hardware dataflow architectures are configured by laying out components in space, each capable of executing only the one instruction it is intended for. Computation then amounts to data flowing through the hardware. The main characteristics of this architecture are higher data throughput and reduced power consumption. Although hardware dataflow architectures have existed for decades, technology has only recently made them usable on an equal footing with control-flow computers, so the problem of scheduling jobs across dataflow hardware and conventional processors is becoming increasingly important. Some computationally demanding applications spend most of their execution time in loops that repeat the same operations. If the iterations are mutually independent, or can be transformed into such a form, these applications are suitable for execution on reconfigurable dataflow hardware. This thesis presents available methods for creating schedules for this kind of architecture in order to reduce total execution time, and proposes new ones. Sharing the dataflow hardware among conventional processors in both time and space is proposed. Scheduling jobs on this architecture belongs to the NP problem class and scheduling time counts as overhead, so the algorithms use heuristics and search the possible combinations of jobs only up to an appropriate depth. The results confirm that this architecture can reduce total execution time and reveal the conditions under which acceleration is possible.
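The scheduling problem described above can be pictured with a minimal greedy heuristic: each job goes to whichever resource would finish it earliest. This is an assumed baseline for illustration only, not one of the algorithms from the thesis; the job tuples, the single shared dataflow engine and all names are hypothetical.

```python
def greedy_schedule(jobs, n_cpus):
    """Assign each job to the resource with the earliest finish time.

    jobs: list of (name, cpu_time, dataflow_time), where dataflow_time is None
    for jobs that cannot be accelerated on the dataflow hardware.
    Returns (plan, makespan); plan entries are (name, resource, finish_time).
    """
    cpu_free = [0.0] * n_cpus          # time at which each CPU becomes idle
    dataflow_free = 0.0                # time at which the dataflow engine becomes idle
    plan = []
    for name, cpu_time, dataflow_time in jobs:
        cpu = min(range(n_cpus), key=cpu_free.__getitem__)
        cpu_finish = cpu_free[cpu] + cpu_time
        if dataflow_time is None:
            dataflow_finish = float("inf")
        else:
            dataflow_finish = dataflow_free + dataflow_time
        if dataflow_finish < cpu_finish:
            dataflow_free = dataflow_finish
            plan.append((name, "dataflow", dataflow_finish))
        else:
            cpu_free[cpu] = cpu_finish
            plan.append((name, f"cpu{cpu}", cpu_finish))
    return plan, max(cpu_free + [dataflow_free])

# Usage: two loop-heavy jobs that accelerate well and one that does not.
plan, makespan = greedy_schedule([("a", 10, 2), ("b", 10, 2), ("c", 3, None)], n_cpus=1)
```

A greedy pass like this looks only one job ahead. Searching combinations of jobs to a bounded depth, as the abstract describes, trades extra scheduling time for better schedules.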