217 research outputs found
Development Of A High Performance Mosaicing And Super-Resolution Algorithm
In this dissertation, a high-performance mosaicing and super-resolution algorithm is described. The scale invariant feature transform (SIFT)-based mosaicing algorithm builds an initial mosaic which is iteratively updated by the robust super resolution algorithm to achieve the final high-resolution mosaic. Two different types of datasets are used for testing: high altitude balloon data and unmanned aerial vehicle data. To evaluate our algorithm, five performance metrics are employed: mean square error, peak signal to noise ratio, singular value decomposition, slope of reciprocal singular value curve, and cumulative probability of blur detection. Extensive testing shows that the proposed algorithm is effective in improving the captured aerial data and the performance metrics are accurate in quantifying the evaluation of the algorithm
Towards 3D Matching of Point Clouds Derived from Oblique and Nadir Airborne Imagery
Because of the low-expense high-efficient image collection process and the rich 3D and texture information presented in the images, a combined use of 2D airborne nadir and oblique images to reconstruct 3D geometric scene has a promising market for future commercial usage like urban planning or first responders. The methodology introduced in this thesis provides a feasible way towards fully automated 3D city modeling from oblique and nadir airborne imagery.
In this thesis, the difficulty of matching 2D images with large disparity is avoided by grouping the images first and applying the 3D registration afterward. The procedure starts with the extraction of point clouds using a modified version of the RIT 3D Extraction Workflow. Then the point clouds are refined by noise removal and surface smoothing processes. Since the point clouds extracted from different image groups use independent coordinate systems, there are translation, rotation and scale differences existing. To figure out these differences, 3D keypoints and their features are extracted. For each pair of point clouds, an initial alignment and a more accurate registration are applied in succession. The final transform matrix presents the parameters describing the translation, rotation and scale requirements.
The methodology presented in the thesis has been shown to behave well for test data. The robustness of this method is discussed by adding artificial noise to the test data. For Pictometry oblique aerial imagery, the initial alignment provides a rough alignment result, which contains a larger offset compared to that of test data because of the low quality of the point clouds themselves, but it can be further refined through the final optimization. The accuracy of the final registration result is evaluated by comparing it to the result obtained from manual selection of matched points.
Using the method introduced, point clouds extracted from different image groups could be combined with each other to build a more complete point cloud, or be used as a complement to existing point clouds extracted from other sources. This research will both improve the state of the art of 3D city modeling and inspire new ideas in related fields
Registration and categorization of camera captured documents
Camera captured document image analysis concerns with processing of documents captured with hand-held sensors, smart phones, or other capturing devices using advanced image processing, computer vision, pattern recognition, and machine learning techniques. As there is no constrained capturing in the real world, the captured documents suffer from illumination variation, viewpoint variation, highly variable scale/resolution, background clutter, occlusion, and non-rigid deformations e.g., folds and crumples. Document registration is a problem where the image of a template document whose layout is known is registered with a test document image. Literature in camera captured document mosaicing addressed the registration of captured documents with the assumption of considerable amount of single chunk overlapping content. These methods cannot be directly applied to registration of forms, bills, and other commercial documents where the fixed content is distributed into tiny portions across the document. On the other hand, most of the existing document image registration methods work with scanned documents under affine transformation. Literature in document image retrieval addressed categorization of documents based on text, figures, etc.
However, the scalability of existing document categorization methodologies based on logo identification is very limited. This dissertation focuses on two problems (i) registration of captured documents where the overlapping content is distributed into tiny portions across the documents and (ii) categorization of captured documents into predefined logo classes that scale to large datasets using local invariant features. A novel methodology is proposed for the registration of user defined Regions Of Interest (ROI) using corresponding local features from their neighborhood. The methodology enhances prior approaches in point pattern based registration, like RANdom SAmple Consensus (RANSAC) and Thin Plate Spline-Robust Point Matching (TPS-RPM), to enable registration of cell phone and camera captured documents under non-rigid transformations. Three novel aspects are embedded into the methodology: (i) histogram based uniformly transformed correspondence estimation, (ii) clustering of points located near the ROI to select only close by regions for matching, and (iii) validation of the registration in RANSAC and TPS-RPM algorithms. Experimental results on a dataset of 480 images captured using iPhone 3GS and Logitech webcam Pro 9000 have shown an average registration accuracy of 92.75% using Scale Invariant Feature Transform (SIFT).
Robust local features for logo identification are determined empirically by comparisons among SIFT, Speeded-Up Robust Features (SURF), Hessian-Affine, Harris-Affine, and Maximally Stable Extremal Regions (MSER). Two different matching methods are presented for categorization: matching all features extracted from the query document as a single set and a segment-wise matching of query document features using segmentation achieved by grouping area under intersecting dense local affine covariant regions. The later approach not only gives an approximate location of predicted logo classes in the query document but also helps to increase the prediction accuracies. In order to facilitate scalability to large data sets, inverted indexing of logo class features has been incorporated in both approaches. Experimental results on a dataset of real camera captured documents have shown a peak 13.25% increase in the F–measure accuracy using the later approach as compared to the former
Recommended from our members
Foveated object recognition by corner search
textHere we describe a gray scale object recognition system based on foveated corner finding, the computation of sequential fixation points, and elements of Lowe’s SIFT transform. The system achieves rotational, transformational, and limited scale invariant object recognition that produces recognition decisions using data extracted from sequential fixation points. It is broken into two logical steps. The first is to develop principles of foveated visual search and automated fixation selection to accomplish corner search. The result is a new algorithm for finding corners which is also a corner-based algorithm for aiming computed foveated visual fixations. In the algorithm, long saccades move the fovea to previously unexplored areas of the image, while short saccades improve the accuracy of putative corner locations. The system is tested on two natural scenes. As an interesting comparison study we compare fixations generated by the algorithm with those of subjects viewing the same images, whose eye movements are being recorded by an eyetracker. The comparison of fixation patterns is made using an information-theoretic measure. Results show that the algorithm is a good locator of corners, but does not correlate particularly well with human visual fixations. The second step is to use the corners located, which meet certain goodness criteria, as keypoints in a modified version of the SIFT algorithm. Two scales are implemented. This implementation creates a database of SIFT features of known objects. To recognize an unknown object, a corner is located and a feature vector created. The feature vector is compared with those in the database of known objects. The process is continued for each corner in the unknown object until enough information has been accumulated to reach a decision. The system was tested on 78 gray scale objects, hand tools and airplanes, and shown to perform well.Electrical and Computer Engineerin
Reconstruction of 3D Information about Vehicles Passing in front of a Surveillance Camera
Tato diplomová práce se zabĂ˝vá 3D rekonstrukcĂ vozidel projĂĹľdÄ›jĂcĂch pĹ™ed dohledovou kamerou. V práci je nejprve pĹ™edstavena kalibrace dohledovĂ© kamery a souvislost automatickĂ© kalibrace s 3D informacemi o sledovanĂ© dopravÄ›. Dále jsou pĹ™edstaveny algoritmy Structure from Motion a SLAM, spoleÄŤnÄ› s metodami pro odhad optickĂ©ho toku. Za účelem prozkoumánĂ chovánĂ pro snĂmky projĂĹľdÄ›jĂcĂch vozidel jsou provedeny experimenty s vĂ˝poÄŤtem korespondencĂ a algoritmem Structure from Motion. NáslednÄ› je postup algoritmu Structure from Motion upraven. SIFT pĹ™Ăznaky jsou nahrazeny algoritmem DeepMatching za účelem zĂskánĂ hustĂ˝ch bodovĂ˝ch korespondencĂ pro následnou fázi rekonstrukce. RekonstruovanĂ© modely jsou dále zpĹ™esnÄ›ny aplikovánĂm dodateÄŤnĂ˝ch omezenĂ, která jsou specifická pro rekonstrukci projĂĹľdÄ›jĂcĂch vozidel. ZĂskanĂ© modely jsou potĂ© vyhodnoceny. VeškerĂ© zjištÄ›nĂ© poznatky a informace o rekonstrukci vozidel jsou pak vyuĹľity k navrĹľenĂ dalšĂch modifikacĂ, kterĂ© by vedly k vytvoĹ™enĂ zcela vlastnĂho rekonstrukÄŤnĂho postupu, specializovanĂ©ho pĹ™Ămo pro 3D rekonstrukci projĂĹľdÄ›jĂcĂch vozidel.This master's thesis focuses on 3D reconstruction of vehicles passing in front of a traffic surveillance camera. Calibration process of surveillance camera is first introduced and the relation of automatic calibration with 3D information about observed traffic is described. Furthermore, Structure from Motion, SLAM, and optical flow algorithms are presented. A set of experiments with feature matching and the Structure from Motion algorithm is carried out to examine results on images of passing vehicles. Afterwards, the Structure from Motion pipeline is modified. Instead of using SIFT features, DeepMatching algorithm is utilized to obtain quasi-dense point correspondences for the subsequent reconstruction phase. Afterwards, reconstructed models are refined by applying additional constraints specific to the vehicle reconstruction task. The resultant models are then evaluated. Lastly, observations and acquired information about the process of vehicle reconstruction are utilized to form proposals for prospective design of an entirely custom pipeline that would be specialized for 3D reconstruction of passing vehicles.
A multisensor SLAM for dense maps of large scale environments under poor lighting conditions
This thesis describes the development and implementation of a multisensor large scale autonomous mapping system for surveying tasks in underground mines. The hazardous nature of the underground mining industry has resulted in a push towards autonomous solutions to the most dangerous operations, including surveying tasks. Many existing autonomous mapping techniques rely on approaches to the Simultaneous Localization and Mapping (SLAM) problem which are not suited to the extreme characteristics of active underground mining environments. Our proposed multisensor system has been designed from the outset to address the unique challenges associated with underground SLAM. The robustness, self-containment and portability of the system maximize the potential applications.The multisensor mapping solution proposed as a result of this work is based on a fusion of omnidirectional bearing-only vision-based localization and 3D laser point cloud registration. By combining these two SLAM techniques it is possible to achieve some of the advantages of both approaches – the real-time attributes of vision-based SLAM and the dense, high precision maps obtained through 3D lasers. The result is a viable autonomous mapping solution suitable for application in challenging underground mining environments.A further improvement to the robustness of the proposed multisensor SLAM system is a consequence of incorporating colour information into vision-based localization. Underground mining environments are often dominated by dynamic sources of illumination which can cause inconsistent feature motion during localization. Colour information is utilized to identify and remove features resulting from illumination artefacts and to improve the monochrome based feature matching between frames.Finally, the proposed multisensor mapping system is implemented and evaluated in both above ground and underground scenarios. The resulting large scale maps contained a maximum offset error of ±30mm for mapping tasks with lengths over 100m
An Image Processing Approach Toward a Visual Intra-Cortical Stimulator
Abstract Visual impairment may be caused by various factors varying from trauma, birth-defects, and diseases. Until today there are no viable medical treatments for this condition; hence bio-medical approaches are being employed to overcome that. The Cortivision team has been working on an intra-cortical implant that can bypass the retina and optic nerve and directly stimulate the visual cortex. In this work we aimed to implement a modular, reusable, and parameterizable object recognition system that tends to ``simplify'' video data prior to stimulation; hence opening new horizons for partial vision restoration, navigational and even recognition abilities.
We identified the Scale Invariant Feature Transform (SIFT) algorithm as being a robust candidate for our application's needs. A multithreaded software prototype of the SIFT and Lucas-Kanade tracker was implemented to ensure proper overall operation. The feature extractor, difference of Gaussians (DoG) part of the SIFT, being the most computationally expensive, was migrated to an FPGA implementation due to the real-time restrictions that is not achievable on a host machine. The VHDL implementation is highly parameterizable for different application needs and tradeoffs. We introduced a novel architecture employing the sub-kernel trick to reduce resource usage compared to preexisting architectures while still being comparably accurate to a software floating point implementation. In order to alleviate transmission bottlenecks, the system also includes a new parallel Huffman encoder design that is capable of performing lossless compression of both images and scale space image pyramids taking into account spatial and scale data correlations during the predictor phase. The encoder was able to achieve compression ratios of 27.3% on the Caltech-256 data-set. Furthermore, a new camera and fiducial markers setup based on image processing was proposed in order to target the phosphene map estimation problem which affects the quality of the final stimulation that is perceived by the patient.----------RÉSUMÉ Introduction et objectifs La déficience visuelle, qui est définie par la perte totale ou partielle de la vision, n'est actuellement pas médicalement traitable. Des approches biomédicales modernes sont utilisées pour stimuler électriquement la vision; ces approches peuvent être divisées en trois groupes principaux: le premier ciblant les implants rétiniens Humayun et al. (2003), Kim et al. (2004), Chow et al. (2004); Palanker et al. (2005), Toledo et al. (2005); Yanai et al. (2007), Winter et al. (2007); Zrenner et al. (2011), le deuxième ciblant les implants du nerf optique Veraart et al. (2003), Sakaguchi et al. (2009), et le troisième ciblant les implants intra-corticaux Doljanu et Sawan (2007); Coulombe et al. (2007); Srivastava et al. (2007). L’inconvénient principal des deux premiers groupes, c'est qu'ils ne sont pas suffisamment génériques pour surmonter la majorité des maladies de déficience visuelle, car ils dépendent du fait que le patient doit avoir un nerf optique intact et/ou une rétine partiellement opérationnelle ; ce qui n'est pas le cas pour le troisième groupe. L'équipe du Laboratoire Polystim Neurotechnologies travaille actuellement sur un implant intra-cortical qui stimule directement le cortex visuel primaire (région V1) ; le nom du projet global est Cortivision. Le système utilise une caméra, un module de traitement d'image, un transmetteur RF (radiofréquence) et un stimulateur implantable. Cette méthode est robuste et générique car elle contourne l'oeil et le nerf optique. Un des défis majeurs est le traitement d'image nécessaire pour «simplifier» les données antérieures à la stimulation, l'extraction de l’information utile en écartant les données superflues. Les pixels qui sont capturés par la caméra n'ont pas de correspondance un-à -un sur le cortex visuel comme dans une image rectangulaire, ils sont plutôt mis en correspondance avec une carte complexe de «phosphènes» Coulombe et al. (2007); Srivastava et al. (2007). Les phosphènes sont des points lumineux qui apparaissent dans le champ de vision du patient quand le cerveau est stimulé électriquement. Ces points changent en terme de taille, de luminosité et d’emplacement en fonction de la façon dont la stimulation électrique est effectuée (c'est à dire un changement dans la fréquence, la tension, la durée, etc. ...) et même par le placement physique des électrodes dans le cortex visuel. Les approches actuelles visent à stimuler des images de phosphènes monochromes à basse résolution. Sachant cela, nous nous attendons plutôt à une vision de faible qualité qui rend des activités comme naviguer, interpréter des objets, ou encore lire, difficile pour le patient. Ceci est principalement dû à la complexité de l’étalonnage de la carte phosphène et sa correspondance, et aussi à la non-trivialité de savoir comment simplifier les données à partir des images qui viennent de la camera de façon qu’on conserve seulement les données pertinentes. La Figure 1.1 est un exemple qui démontre la non-trivialité de transformer une image grise en stimulation phosphène
High-performance hardware accelerators for image processing in space applications
Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable.
In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars.
The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris.
The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high.
A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase.
In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions.
Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware.
For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations.
The thesis is organized in four main chapters.
Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms.
Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions.
Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation.
Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions
Vision-based retargeting for endoscopic navigation
Endoscopy is a standard procedure for visualising the human gastrointestinal tract. With the advances in biophotonics, imaging techniques such as narrow band imaging, confocal laser endomicroscopy, and optical coherence tomography can be combined with normal endoscopy for assisting the early diagnosis of diseases, such as cancer. In the past decade, optical biopsy has emerged to be an effective tool for tissue analysis, allowing in vivo and in situ assessment of pathological sites with real-time feature-enhanced microscopic images. However, the non-invasive nature of optical biopsy leads to an intra-examination retargeting problem, which is associated with the difficulty of re-localising a biopsied site consistently throughout the whole examination. In addition to intra-examination retargeting, retargeting of a pathological site is even more challenging across examinations, due to tissue deformation and changing tissue morphologies and appearances. The purpose of this thesis is to address both the intra- and inter-examination retargeting problems associated with optical biopsy. We propose a novel vision-based framework for intra-examination retargeting. The proposed framework is based on combining visual tracking and detection with online learning of the appearance of the biopsied site. Furthermore, a novel cascaded detection approach based on random forests and structured support vector machines is developed to achieve efficient retargeting. To cater for reliable inter-examination retargeting, the solution provided in this thesis is achieved by solving an image retrieval problem, for which an online scene association approach is proposed to summarise an endoscopic video collected in the first examination into distinctive scenes. A hashing-based approach is then used to learn the intrinsic representations of these scenes, such that retargeting can be achieved in subsequent examinations by retrieving the relevant images using the learnt representations. For performance evaluation of the proposed frameworks, extensive phantom, ex vivo and in vivo experiments have been conducted, with results demonstrating the robustness and potential clinical values of the methods proposed.Open Acces
- …