38 research outputs found

    Real-time architectures for the Scale Invariant Feature Transform algorithm

    Feature extraction in digital image processing is a highly compute-intensive task for a CPU. In order to achieve real-time image throughput, hardware parallelism must be exploited. The speed-up of the system is constrained by the degree of parallelism of the implementation, which is in turn limited by the programmable device size and the power dissipation. In this work, issues related to the synthesis of the Scale-Invariant Feature Transform (SIFT) algorithm on an FPGA, with a target processing rate faster than 50 frames per second for VGA images, are analyzed. In order to increase the speed-up of the algorithm, the work includes the analysis of feasible simplifications of the algorithm for a tracking application, and the results are synthesized on an FPGA. This work has been partially funded by Spanish government projects TEC2015-66878-C3-2-R (MINECO/FEDER, UE) and TEC2015-66878-C3-3-R (MINECO/FEDER, UE).
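    To make the 50 fps constraint concrete, the following back-of-the-envelope sketch in Python estimates the per-pixel cycle budget and the degree of parallelism it implies; the clock frequency and the cycles-per-pixel figure are illustrative assumptions, not values taken from the paper.

        # Rough throughput budget for 50 fps VGA processing on an FPGA.
        # CLOCK_HZ and CYCLES_PER_PIXEL_SEQ are assumptions for illustration only.
        WIDTH, HEIGHT = 640, 480          # VGA resolution
        FPS_TARGET = 50                   # target frame rate from the abstract
        CLOCK_HZ = 100e6                  # assumed FPGA clock (100 MHz)
        CYCLES_PER_PIXEL_SEQ = 40         # assumed cost of a purely sequential datapath

        pixels_per_second = WIDTH * HEIGHT * FPS_TARGET          # ~15.4 Mpixel/s
        budget_cycles_per_pixel = CLOCK_HZ / pixels_per_second   # cycles available per pixel

        # Any deficit between the sequential cost and the budget must be
        # covered by spatial parallelism (replicated pipelines).
        required_parallelism = CYCLES_PER_PIXEL_SEQ / budget_cycles_per_pixel

        print(f"Pixel rate:           {pixels_per_second / 1e6:.1f} Mpixel/s")
        print(f"Cycle budget / pixel: {budget_cycles_per_pixel:.1f}")
        print(f"Parallel pipelines:   {required_parallelism:.1f} (round up)")

    With these assumed numbers about seven parallel pipelines would be needed (6.1 rounded up), which is the kind of trade-off against device size and power dissipation that the abstract refers to.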

    FPGA synthesis of a stereo image matching architecture for autonomous mobile robots

    This paper describes a hardware proposal to speed up the process of image matching in stereo vision systems such as those employed by autonomous mobile robots. The proposal combines a classical window-based matching approach with a preliminary stage in which key points are selected from each image of the stereo pair. In this first step the key point extraction method is based on the SIFT algorithm, so that, in the second step, the window-based matching is applied only to the set of selected key points instead of to the whole images. For images in which key points amount to 1% of the pixels, this method speeds up the matching by four orders of magnitude. The proposal is, on the one hand, an architecture that parallelizes better than the original SIFT and, on the other, a faster technique than a full-image window matching approach. The architecture has been implemented on a low-power Virtex-6 FPGA and achieves an image matching speed above 30 fps. This work has been funded by Spanish government project TEC2015-66878-C3-2-R (MINECO/FEDER, UE).
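    A minimal sketch of the idea behind that speed-up is given below; the function names, window size and sum-of-absolute-differences cost are illustrative assumptions, and the SIFT-based key point detector itself is not reproduced. Restricting window matching to detected key points reduces the work from one window comparison per pixel pair to one per key-point pair, so with 1% key points in each image the number of comparisons drops by a factor of about 0.01 x 0.01 = 1e-4, i.e. four orders of magnitude.

        import numpy as np

        def sad(patch_a, patch_b):
            """Sum of absolute differences between two equally sized patches."""
            return float(np.abs(patch_a.astype(np.int32) - patch_b.astype(np.int32)).sum())

        def match_keypoints(left, right, kp_left, kp_right, win=5):
            """Window-based matching applied only at key-point locations.

            kp_left / kp_right are lists of (row, col) key points, e.g. from a
            SIFT-style detector (not shown here). Returns (i, j) index pairs.
            """
            r = win // 2
            matches = []
            for i, (yl, xl) in enumerate(kp_left):
                patch_l = left[yl - r:yl + r + 1, xl - r:xl + r + 1]
                if patch_l.shape != (win, win):
                    continue  # skip key points too close to the image border
                best_j, best_cost = -1, np.inf
                for j, (yr, xr) in enumerate(kp_right):
                    patch_r = right[yr - r:yr + r + 1, xr - r:xr + r + 1]
                    if patch_r.shape != (win, win):
                        continue
                    cost = sad(patch_l, patch_r)
                    if cost < best_cost:
                        best_j, best_cost = j, cost
                if best_j >= 0:
                    matches.append((i, best_j))
            return matches

    In an actual stereo pipeline the candidate set would additionally be restricted to the corresponding epipolar line (a horizontal band for rectified pairs); that constraint is omitted here to keep the sketch short.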

    CMOS-3D smart imager architectures for feature detection

    This paper reports a multi-layered smart image sensor architecture for feature extraction based on the detection of interest points. The architecture is conceived for 3-D integrated circuit technologies consisting of two layers (tiers) plus memory. The top tier includes sensing and processing circuitry aimed at performing Gaussian filtering and generating Gaussian pyramids in a fully concurrent way. The circuitry in this tier operates in the mixed-signal domain. It embeds in-pixel correlated double sampling, a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel analog-to-digital conversion. This tier can be further split into two for improved resolution: one containing the sensors and another containing a capacitor per sensor plus the mixed-signal processing circuitry. The bottom tier embeds digital circuitry in charge of the calculation of the Harris, Hessian and difference-of-Gaussians detectors. The overall system can hence be configured by the user to detect interest points using whichever of these three algorithms is best suited to the application at hand. The paper describes the different kinds of algorithms featured and the circuitry employed at the top and bottom tiers. The Gaussian pyramid is implemented with a switched-capacitor network in less than 50 μs, outperforming more conventional solutions. Xunta de Galicia 10PXIB206037PR; Ministerio de Ciencia e Innovación TEC2009-12686, IPT-2011-1625-430000; Office of Naval Research N00014111031.
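    As a point of reference for what the bottom tier computes, the sketch below builds a small Gaussian stack in software, forms difference-of-Gaussians images and keeps strong local maxima as interest points. It is a plain software illustration with assumed sigma values and threshold, not a model of the switched-capacitor network or of the digital tier.

        import numpy as np
        from scipy.ndimage import gaussian_filter, maximum_filter

        def dog_interest_points(image, sigmas=(1.0, 1.6, 2.56, 4.1), threshold=0.02):
            """Interest points as strong local maxima of difference-of-Gaussians.

            sigmas and threshold are illustrative choices, not values from the paper.
            Returns a list of (row, col, scale_index) triples.
            """
            img = image.astype(np.float32) / 255.0
            # Gaussian stack: the same resolution blurred at increasing sigma.
            gaussians = [gaussian_filter(img, s) for s in sigmas]
            # Difference-of-Gaussians approximates the scale-normalised Laplacian.
            dogs = [g2 - g1 for g1, g2 in zip(gaussians[:-1], gaussians[1:])]

            points = []
            for k, dog in enumerate(dogs):
                local_max = (dog == maximum_filter(dog, size=3)) & (dog > threshold)
                for y, x in np.argwhere(local_max):
                    points.append((int(y), int(x), k))
            return points

    A full SIFT-style detector would also compare each candidate against its neighbours in the adjacent DoG levels and work on sub-sampled octaves; both refinements are omitted to keep the sketch short.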

    Image Feature Extraction Acceleration

    Image feature extraction is instrumental for most of the best-performing algorithms in computer vision. However, it is also expensive in terms of computational and memory resources for embedded systems because of the need to deal with individual pixels at the earliest processing levels. In this regard, conventional system architectures do not exploit the potential for parallelism and distributed memory from the very beginning of the processing chain. Raw pixel values provided by the front-end image sensor are squeezed into a high-speed interface with the rest of the system components, and only then, after deserializing this massive dataflow, is parallelism, if any, exploited. This chapter introduces a rather different approach from an architectural point of view. We present two Application-Specific Integrated Circuits (ASICs) in which the 2-D array of photo-sensitive devices featured by regular imagers is combined with distributed memory supporting concurrent processing. Custom circuitry is added per pixel in order to accelerate image feature extraction right at the focal plane. Specifically, the proposed sensing-processing chips aim at the acceleration of two flagship algorithms within the computer vision community: the Viola-Jones face detection algorithm and the Scale Invariant Feature Transform (SIFT). Experimental results prove the feasibility and benefits of this architectural solution. Ministerio de Economía y Competitividad TEC2012-38921-C02, IPT-2011-1625-430000, IPC-20111009; Junta de Andalucía TIC 2338-2013; Xunta de Galicia EM2013/038; Office of Naval Research (USA) N00014141035.

    A Novel Hardware Architecture for Real Time Extraction of Local Features

    Doctoral dissertation, Department of Electrical and Computer Engineering, Seoul National University, August 2016 (advisor: 이혁재). The rapid advance and spread of computing performance has extended the reach of computer technology from desktops to smartphones, smart TVs, cars and beyond. In this changed environment users are ready to adopt more innovative functionality than before, and computer vision technology has accordingly moved towards commercial deployment. Widely applicable computer vision techniques such as object recognition and tracking or 3D reconstruction require image matching, that is, finding the same pixel locations across different images. Among the related work, the Scale-Invariant Feature Transform (SIFT) algorithm was proposed to provide stable matching under image scaling and rotation, and the Affine Invariant Extension of SIFT (ASIFT) was later proposed to be robust to camera viewpoint changes as well. SIFT and ASIFT offer highly reliable matching but demand a large amount of computation, so acceleration with specifically designed hardware is being studied to reach real-time operation.
    This dissertation proposes an ASIFT hardware architecture capable of real-time operation (30 frames/sec). As a first step, a SIFT hardware architecture that computes SIFT features in real time is proposed. Because SIFT is widely used, many accelerator architectures have been studied; most existing SIFT hardware meets the real-time requirement, but only by using an excessive amount of on-chip memory, which greatly increases hardware cost. To address this issue, SIFT architectures that combine on-chip and external memory have been proposed, but their frequent external memory accesses degrade performance because of external memory latency. To solve this problem, this dissertation proposes reusing the data read from external memory and reducing the amount of data stored there through down-sampling and removal of less significant bits. The proposed SIFT hardware stores the Gaussian images in external memory, in which case reading the local patches needed for descriptor generation causes many external memory accesses. To reduce them, a scheme that reuses the data overlapping between different local patches is presented, together with the corresponding hardware structure. In addition, down-sampling and less-significant-bit removal reduce the data volume of the Gaussian images themselves, while the resulting accuracy loss of the SIFT algorithm is kept to a minimum. As a result, the proposed design uses only 10.93% of the on-chip memory of state-of-the-art SIFT hardware and runs at 30 frames/sec (fps) with 3,300 key points.
    For fast ASIFT computation, the affine transform hardware that supplies affine-transformed images to the SIFT hardware must deliver data without delay. With the conventional affine transform computation, however, the affine transform hardware reads the source image from external memory at non-contiguous addresses, which incurs external memory latency and prevents the affine transform module from feeding the SIFT hardware with enough data. To solve this, the dissertation exploits the rotation invariance of SIFT features to reformulate the affine transform so that, for every affine transform taken by the ASIFT algorithm, the input image is accessed at contiguous external memory addresses, greatly reducing unnecessary external memory latency. The proposed formulation first scales the source image and then skews it, and the scaled image data are reused across different affine transforms, which reduces both the scaling workload and the external memory traffic. The resulting speed-up of the affine transform hardware lets it feed the SIFT hardware without stalls and ultimately raises the operating speed of the ASIFT hardware through higher utilization.
    As a result, the ASIFT hardware proposed in this dissertation operates with high utilization and executes the ASIFT algorithm at 30 fps on images in which 2,500 key points are detected.
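    To give a feel for the external-memory bandwidth levers described above, the sketch below estimates how storing the Gaussian images down-sampled and with fewer bits shrinks the data that must cross the external memory interface. All image dimensions, scale counts, bit widths and reduction factors are illustrative assumptions, not figures from the dissertation, and the local-patch reuse scheme is not modelled.

        # Illustrative estimate of external-memory traffic for stored Gaussian images.
        # Every parameter is an assumption chosen for the example only.
        WIDTH, HEIGHT = 640, 480      # assumed input resolution
        SCALES_PER_OCTAVE = 5         # assumed Gaussian images per octave
        OCTAVES = 4                   # assumed octave count (each octave half-sized)
        FULL_BITS = 16                # assumed precision of a stored Gaussian sample
        REDUCED_BITS = 8              # after dropping less significant bits
        DOWNSAMPLE = 2                # store Gaussian images down-sampled 2x per axis

        def octave_pixels(octave):
            """Pixels of one image in the given octave (each octave halves both axes)."""
            return (WIDTH >> octave) * (HEIGHT >> octave)

        baseline_bits = sum(octave_pixels(o) * SCALES_PER_OCTAVE * FULL_BITS
                            for o in range(OCTAVES))
        reduced_bits = sum((octave_pixels(o) // (DOWNSAMPLE * DOWNSAMPLE))
                           * SCALES_PER_OCTAVE * REDUCED_BITS
                           for o in range(OCTAVES))

        print(f"Baseline per frame: {baseline_bits / 8 / 1e6:.2f} MB")
        print(f"Reduced per frame:  {reduced_bits / 8 / 1e6:.2f} MB")
        print(f"Reduction factor:   {baseline_bits / reduced_bits:.1f}x")

    With these assumptions the stored volume shrinks by a factor of eight (4x from down-sampling, 2x from the narrower samples); the patch-overlap reuse discussed in the dissertation would further cut the number of reads during descriptor generation.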

    A high-performance hardware architecture of an image matching system based on the optimised SIFT algorithm

    The Scale Invariant Feature Transform (SIFT) is one of the most popular matching algorithms in the field of computer vision. It prevails over many other algorithms because the features it detects are fully invariant to image scaling and rotation, and have also been shown to be robust to changes in 3D viewpoint, addition of noise, changes in illumination and a substantial range of affine distortion. However, its computational complexity is high, which prevents real-time operation. The aim of this project, therefore, is to develop a high-performance image matching system based on the optimised SIFT algorithm to perform real-time feature detection, description and matching. This thesis presents the stages of the development of the system. To reduce the computational complexity, an alternative to the grid layout of standard SIFT is proposed, termed SRI-DAISY (Scale and Rotation Invariant DAISY). SRI-DAISY achieves performance comparable to the standard SIFT descriptor but is more efficient to implement in hardware, in terms of both computational complexity and memory usage. The design takes only 7.57 µs to generate a descriptor at a system frequency of 100 MHz, which is equivalent to approximately 132,100 descriptors per second and is the highest throughput among existing designs. In addition, a novel keypoint matching strategy is presented that achieves higher precision than the widely applied distance-ratio-based matching and is computationally more efficient. All phases of the SIFT algorithm have been investigated, including feature detection, descriptor generation and descriptor matching. Each individual part of the design is characterised and compared with software simulation results. A fully stand-alone image matching system has been developed, consisting of a CMOS camera front-end for image capture, a SIFT processing core embedded in a Field Programmable Gate Array (FPGA) device, and a USB back-end for data transfer. Experiments with real-world images verify the system performance, and the system has been tested by integrating it into two practical applications. The resulting image matching system eliminates the bottlenecks that limit its overall throughput, allowing it to process images in real time without interruption. The design can be modified to adapt to applications processing higher-resolution images while still achieving real-time operation.
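    For context on the baseline that the thesis improves upon, the sketch below implements the widely used distance-ratio ("ratio test") matching between two descriptor sets. This is the conventional strategy mentioned in the abstract, not the thesis's own matching method, and the 0.8 threshold is the commonly quoted default rather than a value taken from this work.

        import numpy as np

        def ratio_test_match(desc_a, desc_b, ratio=0.8):
            """Match descriptors with the distance-ratio test.

            desc_a: (Na, D) array and desc_b: (Nb, D) array of descriptors.
            A match (i, j) is accepted only if the nearest neighbour j of
            descriptor i is clearly closer than the second-nearest neighbour.
            """
            matches = []
            if len(desc_b) < 2:
                return matches
            for i, d in enumerate(desc_a):
                dists = np.linalg.norm(desc_b - d, axis=1)
                nearest, second = np.argsort(dists)[:2]
                if dists[nearest] < ratio * dists[second]:
                    matches.append((i, int(nearest)))
            return matches

    In hardware the ratio test is costly because it requires the two smallest distances over all candidates for every query descriptor; the thesis's alternative strategy targets exactly this cost, but its details are not reproduced here.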

    Digital FPGA Circuits Design for Real-Time Video Processing with Reference to Two Application Scenarios

    In the present days of the digital revolution, image and video processing has become a ubiquitous task: from mobile devices to specialised environments, the need for a real-time approach becomes more evident every day. Whether the reason is user experience in recreational or internet-based applications or safety-related timeliness in hard real-time scenarios, exploring technologies and techniques that allow this requirement to be satisfied is a crucial point. General-purpose CPU or GPU software implementations of these applications are quite simple and widespread, but commonly do not allow high performance because of the heavy layering that separates high-level languages and libraries, which implement complicated procedures and algorithms, from the base architecture of the CPU, which offers only limited and basic (although rapidly executed) arithmetic operations. The most practised approach nowadays is based on the use of Very-Large-Scale Integration (VLSI) digital electronic circuits. Field Programmable Gate Arrays (FPGAs) are integrated digital circuits designed to be configured after manufacturing, "in the field". They typically provide lower performance than Application-Specific Integrated Circuits (ASICs), but at a lower cost, especially when dealing with limited production volumes. On-the-field programmability (and re-programmability, in the vast majority of cases) is also a characteristic feature that makes FPGAs more suitable for applications with changing specifications, where an update of capabilities may be a desirable benefit. Moreover, the time needed to complete the design cycle for FPGA-based circuits (including testing and debugging) is much shorter than the design flow and time-to-market of ASICs. In this thesis work we will see (Chapter 1) some common problems and strategies involved in the use of FPGAs and FPGA-based systems for Real-Time Image Processing and Real-Time Video Processing (in the following also indicated interchangeably with the acronym RTVP); we will then focus, in particular, on two applications. Firstly, Chapter 2 covers the implementation of a novel algorithm for visual search, known as CDVS, which has recently been standardised as part of the MPEG-7 standard. Visual search is an emerging field in mobile applications that is rapidly becoming ubiquitous. However, algorithms for this kind of application typically lean heavily on computational power and complex processing; as a consequence, implementation efficiency is a crucial point, and this generally results in the need for custom-designed hardware. Chapter 3 covers the implementation of an algorithm for the compression of hyperspectral images which is bit-true compatible with the CCSDS-123.0 standard algorithm. Hyperspectral images are three-dimensional matrices in which each 2D plane represents the image, as captured by the sensor, in a given spectral band: their size may range from several million pixels up to billions of pixels. Typical scenarios of use of hyperspectral images include airborne and satellite-borne remote sensing. As a consequence, the major concerns are the limited availability of both processing power and communication-link bandwidth; thus, a proper compression algorithm, as well as the efficiency of its implementation, is crucial. In both cases we first examine the scope of the work with reference to the current state of the art; we then present the main characteristics of the proposed implementations and, to conclude, consider the primary experimental results.
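    To illustrate why on-board compression of hyperspectral cubes matters, the small sketch below compares the raw data volume of one cube with what a downlink could carry over the same acquisition period. The cube dimensions, bit depth, acquisition period and link rate are all assumed values for the example, not figures from the thesis or from any particular mission, and the CCSDS-123.0 algorithm itself is not reproduced.

        # Illustrative data-volume estimate for a hyperspectral image cube.
        # All dimensions and rates below are assumptions for the example only.
        SAMPLES, LINES, BANDS = 640, 512, 224   # assumed cube size (x, y, spectral)
        BITS_PER_SAMPLE = 16                    # assumed sensor bit depth
        CUBE_PERIOD_S = 10.0                    # assumed time to acquire one cube
        DOWNLINK_MBPS = 50.0                    # assumed usable downlink rate

        raw_megabits = SAMPLES * LINES * BANDS * BITS_PER_SAMPLE / 1e6
        link_budget_megabits = DOWNLINK_MBPS * CUBE_PERIOD_S

        print(f"Raw cube size:      {raw_megabits / 8:.0f} MB ({raw_megabits:.0f} Mbit)")
        print(f"Downlink budget:    {link_budget_megabits / 8:.0f} MB per cube period")
        print(f"Compression needed: {raw_megabits / link_budget_megabits:.1f}x to keep up")

    Under these assumptions the raw cube is roughly 147 MB while the link can carry about 62 MB in the same time, so a compression ratio above 2x is already required just to keep pace with acquisition, before any margin for protocol overhead or retransmissions.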

    Co-design of architecture and algorithms for mobile robot localization and model-based obstacle detection

    This thesis proposes SoPC (System on a Programmable Chip) architectures for the efficient embedding of vision-based localization and obstacle detection tasks in the navigational pipeline of autonomous mobile robots. The results obtained are equivalent to or better than the state of the art. For localization, an efficient hardware architecture is developed that supports EKF-SLAM's local map management with seven-dimensional landmarks in real time. For obstacle detection, a novel object recognition method is proposed: a detection-by-identification framework based on a single detection-window scale. This framework achieves adequate algorithmic precision and execution speed on embedded hardware platforms.
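    As background for the localization part, the sketch below shows the generic EKF measurement-update step that an EKF-SLAM datapath must evaluate for every observed landmark. The state layout, the seven-dimensional landmark parameterization and the measurement model used in the thesis are not reproduced; the Jacobian H is simply taken as an input here.

        import numpy as np

        def ekf_update(x, P, z, h_x, H, R):
            """One EKF measurement update for a single landmark observation.

            x   : state mean (robot pose plus stacked landmark parameters)
            P   : state covariance
            z   : measurement of one landmark
            h_x : predicted measurement h(x)
            H   : Jacobian of h at x (supplied by the measurement model)
            R   : measurement noise covariance
            Returns the updated (x, P).
            """
            y = z - h_x                          # innovation
            S = H @ P @ H.T + R                  # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
            x_new = x + K @ y
            P_new = (np.eye(len(x)) - K @ H) @ P
            return x_new, P_new

    The cost of this step is dominated by the dense matrix products involving P, whose size grows with the number of landmarks in the local map; managing and updating that local map in real time is the part the thesis maps onto dedicated hardware.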