21 research outputs found

    HARDWARE ACCELARATED VISUAL TRACKING ALGORITHMS – A Systematic Literature Review

    Hardware Accelarated Visual Tracking Algorithms. A Systematic Literature Review

    Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive, yet real-time performance, high accuracy and low power consumption are essential requirements of such systems. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions: which algorithms have been implemented in hardware, and what modifications have been made to adapt these algorithms to hardware.

    A high-performance hardware architecture of an image matching system based on the optimised SIFT algorithm

    The Scale Invariant Feature Transform (SIFT) is one of the most popular matching algorithms in the field of computer vision. It outperforms many other algorithms because the detected features are fully invariant to image scaling and rotation, and are also shown to be robust to changes in 3D viewpoint, addition of noise, changes in illumination and a substantial range of affine distortion. However, its computational complexity is high, which prevents it from achieving real-time performance. The aim of this project, therefore, is to develop a high-performance image matching system based on the optimised SIFT algorithm to perform real-time feature detection, description and matching. This thesis presents the stages of the development of the system. To reduce the computational complexity, an alternative to the grid layout of standard SIFT is proposed, termed SRI-DAISY (Scale and Rotation Invariant DAISY). SRI-DAISY achieves performance comparable to the standard SIFT descriptor but is more efficient to implement in hardware, in terms of both computational complexity and memory usage. The design takes only 7.57 µs to generate a descriptor at a system frequency of 100 MHz, which is equivalent to approximately 132,100 descriptors per second and is the highest throughput among the existing designs compared. In addition, a novel keypoint matching strategy is presented, which achieves higher precision than the widely applied distance-ratio-based matching and is computationally more efficient. All phases of the SIFT algorithm have been investigated, including feature detection, descriptor generation and descriptor matching. Each individual part of the design is characterised and compared with the software simulation results.
A fully stand-alone image matching system has been developed that consists of a CMOS camera front-end for image capture, a SIFT processing core embedded in a Field Programmable Gate Array (FPGA) device, and a USB back-end for data transfer. Experiments are conducted using real-world images to verify the system performance. The system has been tested by integrating it into two practical applications. The resulting image matching system eliminates the bottlenecks that limit the overall throughput, allowing the system to process images in real time without interruption. The design can be adapted to applications that process higher-resolution images while still achieving real-time performance.
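
The throughput figure quoted in this abstract, and the distance-ratio matching it uses as a baseline, can be illustrated with a short sketch. This is not the thesis's implementation: the function names are hypothetical, the throughput calculation assumes descriptors are produced back to back (one every 7.57 µs), and the 0.8 ratio threshold is an assumed, commonly used value rather than one taken from the abstract. The thesis's own novel matching strategy is not described in the abstract and is not reproduced here.

```python
import math


def descriptors_per_second(latency_us):
    """Throughput if one descriptor is finished every `latency_us` microseconds."""
    return 1.0 / (latency_us * 1e-6)


def cycles_per_descriptor(latency_us, clock_mhz):
    """Clock cycles per descriptor: microseconds times cycles per microsecond."""
    return latency_us * clock_mhz


def ratio_test_match(query, candidates, max_ratio=0.8):
    """Baseline distance-ratio matching (assumed threshold of 0.8).

    Returns the index of the nearest candidate descriptor, or None when the
    nearest and second-nearest distances are too similar to trust the match.
    """
    if len(candidates) < 2:
        return None
    # Sort (distance, index) pairs so the two closest candidates come first.
    ranked = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    (d1, best), (d2, _) = ranked[0], ranked[1]
    return best if d1 < max_ratio * d2 else None


if __name__ == "__main__":
    print(round(descriptors_per_second(7.57)))        # descriptors per second
    print(round(cycles_per_descriptor(7.57, 100.0)))  # cycles at 100 MHz
    print(ratio_test_match((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)]))
    print(ratio_test_match((0.0, 0.0), [(1.0, 0.0), (1.1, 0.0)]))
```

Under these assumptions, 7.57 µs per descriptor corresponds to about 132,100 descriptors per second, or 757 clock cycles per descriptor at 100 MHz. The ratio test accepts the first example (distances 1 and 5) and rejects the second (distances 1 and 1.1) as ambiguous.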

    Recognition of objects to grasp and Neuro-Prosthesis control

    Digital FPGA Circuits Design for Real-Time Video Processing with Reference to Two Application Scenarios

    In the present days of digital revolution, image and video processing has become a ubiquitous task: from mobile devices to special environments, the need for a real-time approach is increasingly evident. Whatever the reason, whether user experience in recreational or internet-based applications or safety-related timeliness in hard real-time scenarios, exploring technologies and techniques that allow this requirement to be satisfied is crucial. General-purpose CPU or GPU software implementations of these applications are quite simple and widespread, but commonly do not allow high performance because of the many layers that separate high-level languages and libraries, which enforce complicated procedures and algorithms, from the base architecture of the CPUs, which offers only limited and basic (although rapidly executed) arithmetic operations. The most widely practised approach nowadays is based on the use of Very-Large-Scale Integrated (VLSI) digital electronic circuits. Field Programmable Gate Arrays (FPGAs) are integrated digital circuits designed to be configured after manufacturing, "in the field". They typically provide lower performance than Application Specific Integrated Circuits (ASICs), but at a lower cost, especially for limited production volumes. In-field programmability itself (and re-programmability, in the vast majority of cases) is also a characteristic feature that makes FPGAs more suitable for applications with changing specifications, where an update of capabilities may be a desirable benefit. Moreover, the time needed to complete the design cycle for FPGA-based circuits (including testing and debugging) is much shorter than the design flow and time-to-market of ASICs.
In this thesis work, we will see (Chapter 1) some common problems and strategies involved in the use of FPGAs and FPGA-based systems for Real Time Image Processing and Real Time Video Processing (in the following also referred to interchangeably by the acronym RTVP); we will then focus, in particular, on two applications. Firstly, Chapter 2 will cover the implementation of a novel algorithm for Visual Search, known as CDVS, which has recently been standardised as part of the MPEG-7 standard. Visual search is an emerging field in mobile applications which is rapidly becoming ubiquitous. However, algorithms for this kind of application typically place heavy demands on computational power and involve complex processing: as a consequence, implementation efficiency is crucial, and this generally results in the need for custom-designed hardware. Chapter 3 will cover the implementation of an algorithm for the compression of hyperspectral images which is bit-true compatible with the CCSDS-123.0 standard algorithm. Hyperspectral images are three-dimensional matrices in which each 2D plane represents the image, as captured by the sensor, in a given spectral band: their size may range from several million pixels up to billions of pixels. Typical scenarios for the use of hyperspectral images include airborne and satellite-borne remote sensing. As a consequence, the major concerns are the limits on both processing power and communication link bandwidth: thus, a proper compression algorithm, as well as the efficiency of its implementation, is crucial. In both cases we will first examine the scope of the work with reference to the current state of the art. We will then see the main characteristics of the proposed implementations and, to conclude, we will consider the primary experimental results.

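    Kodizajn arhitekture i algoritama za lokalizaciju mobilnih robota i detekciju prepreka baziranih na modelu (Co-design of architectures and algorithms for mobile robot localization and model-based obstacle detection)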

    No full text
    This thesis proposes SoPC (System on a Programmable Chip) architectures and algorithms for efficiently embedding vision-based localization and obstacle detection tasks in a navigational pipeline on autonomous mobile robots. The obtained results are equivalent to or better than the state of the art. For localization, an efficient hardware architecture is developed that supports EKF-SLAM's local map management, storing and processing seven-dimensional landmarks in real time. For obstacle detection, a novel method of object recognition is proposed: a detection-by-identification framework based on a single, fixed-size detection window scale. This framework allows adequate algorithmic precision and higher execution speeds on embedded hardware platforms.

    High-performance hardware accelerators for image processing in space applications

    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate is mainly due to the characteristics of the Mars environment. Strong winds frequently blow in the Mars atmosphere. This phenomenon usually alters the lander's descent trajectory, diverting it from the target one. Moreover, the Mars surface is not the best place to perform a safe landing. It is pitted with many closely spaced craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons, a mission failure due to landing in a huge crater, on big stones or on a part of the surface with a steep slope is highly probable. In recent years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are active debris removal and guided landing on Mars. The former aims at finding new methods to remove space debris using unmanned spacecraft. These must be able to autonomously detect a piece of debris, analyse it in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have advanced vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris. The latter aims at increasing the landing point precision (i.e., reducing the landing ellipse) on Mars. Future space missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecraft), enhancing the precision of automatic EDL navigation systems.
For instance, recent space exploration missions, e.g., Spirit, Opportunity, and Curiosity, made use of an EDL procedure that follows a fixed, precomputed descent trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. Comparing this figure with the Mars environment characteristics shows why the mission failure probability remains very high. A very challenging problem is to design an autonomously guided EDL system able to further reduce the landing ellipse, guaranteeing that the lander avoids dangerous areas of the Mars surface (e.g., huge craters or big stones) that could lead to mission failure. The autonomous behaviour of the system is mandatory, since a manually driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from approximately 56 to 100 million km due to the orbit eccentricity, even a signal travelling at the speed of light would need around 3.1 minutes one way in the best case, a delay that is not compatible with the overall duration of the EDL phase. In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the harsh conditions on Mars (and in space in general) are difficult to predict at design time, these algorithms must be able to automatically tune their internal parameters depending on the current conditions. Moreover, real-time performance is another key factor. Since a software implementation of these computationally intensive tasks cannot reach the required performance, these algorithms must be accelerated in hardware. For these reasons, this thesis presents my research work on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has focused on both the algorithms and their hardware implementations.
Concerning the first aspect, I mainly focused my research effort on integrating self-adaptability features into the existing algorithms. Concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high-performance hardware accelerators that substantially outperform the current state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction to the history of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter analyses and evaluates the state-of-the-art approaches and algorithms in depth. Chapter 3 analyses and compares the two technologies used to implement high-performance hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). With this information the reader may understand the main reasons behind the decision of space agencies to use FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even though FPGAs are more sensitive to Single Event Upsets (SEUs, i.e., transient errors induced in hardware components by alpha particles and solar radiation in space). Moreover, this chapter describes in detail the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contributions of my research work: a library of high-performance hardware accelerators for image processing in space applications.
The basic idea behind this library is to offer designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be used directly as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators into IP-core families. The components in the same family share the same functionality and input/output interface. This harmonization of the I/O interface makes it possible to substitute components of the same family inside a complex image processing system without modifying the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high-performance image processing hardware accelerators. This methodology involves the use of different programming and hardware description languages in order to support the designer from algorithm modelling up to hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it uses a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high-performance and self-adaptable image processing functionality.
To prove the benefits of the proposed methodology, each case study concludes with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performance and self-adaptability to the environmental conditions.
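
The signal-delay argument for autonomous landing in this abstract rests on simple arithmetic, which can be checked with a short sketch. The function name is hypothetical, and the only inputs are the Earth-Mars distances quoted in the abstract (56 and 100 million km) and the speed of light in vacuum.

```python
# Speed of light in vacuum, in kilometres per second.
SPEED_OF_LIGHT_KM_S = 299_792.458


def one_way_light_time_minutes(distance_km):
    """Time for a signal at light speed to cover `distance_km`, in minutes."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


if __name__ == "__main__":
    # Distances as quoted in the abstract, in kilometres.
    for label, distance_km in (("closest", 56e6), ("farthest quoted", 100e6)):
        minutes = one_way_light_time_minutes(distance_km)
        print(f"{label}: {minutes:.1f} min one way, {2 * minutes:.1f} min round trip")
```

With these inputs, the one-way delay is roughly 3.1 to 5.6 minutes, and a command round trip takes twice as long.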

    A Dataflow Framework For Developing Flexible Embedded Accelerators A Computer Vision Case Study.

    The focus of this dissertation is the design and implementation of a computing platform that can accelerate data processing in the embedded computation domain. We focus on a heterogeneous computing platform whose hardware implementation can approach the power and area efficiency of specialized designs while remaining flexible across the application domain. Multi-core architectures require parallel programming, which is widely regarded as more challenging than sequential programming. Although shared-memory parallel programs may be fairly easy to write (using OpenMP, for example), they are quite hard to optimize; providing embedded application developers with optimizing tools and programming frameworks is a challenge. Heterogeneous specialized elements make the problem even more difficult. Dataflow is a parallel computation model that relies exclusively on message passing and has some advantages over the parallel programming tools in wide use today: simplicity, graphical representation, and determinism. The dataflow model is also a good match for streaming applications, such as audio, video and image processing, which operate on large sequences of data and are characterized by abundant parallelism and regular memory access patterns. The dataflow model of computation has gained acceptance in the simulation and signal-processing communities. This thesis evaluates the applicability of the dataflow model for implementing domain-specific embedded accelerators for streaming applications.
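
The dataflow properties named in this abstract (message passing only, determinism) can be made concrete with a toy sketch. It is not the framework developed in the dissertation: the `Actor` class, the `run` scheduler and the two-actor graph are all hypothetical. Actors communicate only through FIFO channels and fire when every input holds a token.

```python
from collections import deque


class Actor:
    """A dataflow node that fires only when every input channel holds a token."""

    def __init__(self, func, inputs, output):
        self.func = func
        self.inputs = inputs    # list of FIFO channels (deques) to read from
        self.output = output    # FIFO channel (deque) to write to

    def can_fire(self):
        return all(len(channel) > 0 for channel in self.inputs)

    def fire(self):
        # Consume one token per input and produce one token on the output.
        tokens = [channel.popleft() for channel in self.inputs]
        self.output.append(self.func(*tokens))


def run(actors):
    """Keep firing ready actors, in list order, until none can fire."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


def build_and_run(reverse_schedule=False):
    """Stream two inputs through an add actor followed by a scale actor."""
    a = deque([1, 2, 3])
    b = deque([10, 20, 30])
    summed = deque()
    result = deque()
    actors = [
        Actor(lambda x, y: x + y, [a, b], summed),
        Actor(lambda s: 2 * s, [summed], result),
    ]
    if reverse_schedule:
        actors.reverse()  # change the firing order, not the graph
    run(actors)
    return list(result)


if __name__ == "__main__":
    print(build_and_run())
    print(build_and_run(reverse_schedule=True))
```

Both schedules produce the same output stream, `[22, 44, 66]`, because each actor's result depends only on the tokens it consumes and not on when it fires. That is the sense of determinism illustrated here.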