
    Automatic Image Registration in Infrared-Visible Videos using Polygon Vertices

    In this paper, an automatic method is proposed to perform image registration between visible and infrared pairs of video sequences containing multiple targets. In multimodal image analysis, such as image fusion systems, color and IR sensors are placed close to each other and capture the same scene simultaneously, but the videos are not properly aligned by default because of different fields of view, image capture parameters, working principles and other camera specifications. Because the scenes are usually not planar, alignment must be performed continuously by extracting relevant common information. In this paper, we approximate the shape of the targets by polygons and use an affine transformation to align the two video sequences. After background subtraction, keypoints on the contours of the foreground blobs are detected using the DCE (Discrete Curve Evolution) technique. These keypoints are then described by the local shape at each vertex of the obtained polygon. The keypoints are matched based on the convexity of the polygon's vertices and the Euclidean distance between them. Only good matches for each local shape polygon in a frame are kept. To achieve a global affine transformation that maximises the overlap of infrared and visible foreground pixels, the matched keypoints of each local shape polygon are stored temporally in a buffer for a few frames. The transformation matrix is evaluated at each frame using the temporal buffer, and the best matrix is selected based on an overlap-ratio criterion. Our experimental results demonstrate that this method provides highly accurate registered images and that it outperforms a previous related method.
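    As a rough illustration of the final alignment step described above, the sketch below estimates an affine matrix from buffered keypoint matches with OpenCV and scores each candidate by the foreground overlap ratio. The buffer layout and function names are assumptions for the example, not the authors' implementation.

```python
import cv2
import numpy as np

def overlap_ratio(ir_mask, vis_mask, M):
    """Warp the IR foreground mask with affine M and measure the
    intersection-over-union with the visible foreground mask."""
    h, w = vis_mask.shape
    warped = cv2.warpAffine(ir_mask, M, (w, h), flags=cv2.INTER_NEAREST)
    inter = np.logical_and(warped > 0, vis_mask > 0).sum()
    union = np.logical_or(warped > 0, vis_mask > 0).sum()
    return inter / union if union else 0.0

def best_affine(buffered_matches, ir_mask, vis_mask):
    """buffered_matches: list of (ir_pts, vis_pts) arrays of matched
    keypoints accumulated over the last few frames (hypothetical layout)."""
    best_M, best_score = None, -1.0
    for ir_pts, vis_pts in buffered_matches:
        M, _ = cv2.estimateAffine2D(
            np.asarray(ir_pts, np.float32),
            np.asarray(vis_pts, np.float32),
            method=cv2.RANSAC)
        if M is None:  # degenerate match set, skip this frame's candidate
            continue
        score = overlap_ratio(ir_mask, vis_mask, M)
        if score > best_score:
            best_M, best_score = M, score
    return best_M, best_score
```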

    3D photogrammetric data modeling and optimization for multipurpose analysis and representation of Cultural Heritage assets

    This research deals with the issues concerning the processing, management, and representation, for further dissemination, of the large amount of 3D data that can today be acquired and stored with modern geomatic techniques of 3D metric survey. In particular, this thesis focuses on the optimization process applied to 3D photogrammetric data of Cultural Heritage assets. Modern geomatic techniques enable the acquisition and storage of large amounts of data with high metric and radiometric accuracy and precision, also in the very close range field, and the processing of very detailed 3D textured models. Nowadays, the photogrammetric pipeline has well-established potential and is considered one of the principal techniques for producing detailed 3D textured models at low cost. The potential offered by high-resolution textured 3D models is well known today, and such representations are a powerful tool for many multidisciplinary purposes, at different scales and resolutions, from documentation, conservation and restoration to visualization and education. For example, their sub-millimetric precision makes them suitable for scientific studies of geometry and materials (e.g. for structural and static tests, for planning restoration activities or for historical sources); their high fidelity to the real object and their navigability make them optimal for web-based visualization and dissemination applications. Thanks to improvements in new visualization standards, they can easily be used as a visualization interface linking different kinds of information in a highly intuitive way. Furthermore, many museums today look for more interactive exhibitions that may heighten visitors' emotions, and many recent applications make use of 3D content (e.g. in virtual or augmented reality applications and through virtual museums). What all of these applications have in common is the difficulty of managing the large amount of data that has to be represented and navigated. Indeed, reality-based models have very heavy file sizes (up to tens of GB), which makes them difficult to handle on common and portable devices, publish on the internet or manage in real-time applications. Even though recent advances produce ever more sophisticated and capable hardware and internet standards, empowering the ability to easily handle, visualize and share such content, other research aims to define a common pipeline for the generation and optimization of 3D models with a reduced number of polygons that are nevertheless able to satisfy detailed radiometric and geometric requirements.
This thesis is set in this scenario and focuses on the 3D modeling process of photogrammetric data aimed at easy sharing and visualization. In particular, this research tested a 3D model optimization process that aims at the generation of low-polygon models with very small file sizes, processed starting from the data of high-polygon ones, which nevertheless offer a level of detail comparable to the original models. To do this, several tools borrowed from the game industry and game engines have been used. For this test, three case studies were chosen: a modern sculpture by a contemporary Italian artist, a Roman marble statue preserved in the Civic Archaeological Museum of Torino, and the frieze of the Augustus arch preserved in the city of Susa (Piedmont, Italy). All the test cases were surveyed by means of close-range photogrammetric acquisition, and three highly detailed 3D models were generated through a Structure from Motion and image matching pipeline. On the final high-polygon models, different optimization and decimation tools were tested with the aim of evaluating the quality of the information that can be extracted from the final optimized models in comparison with the original high-polygon ones. This study showed how tools borrowed from computer graphics offer great potential in the Cultural Heritage field as well. This approach may, in fact, meet the needs of multipurpose and multiscale studies, using different levels of optimization, and the procedure could be applied to different kinds of objects with a variety of sizes and shapes, also on multiscale and multisensor data, such as buildings, architectural complexes, data from UAV surveys and so on.
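    As a minimal stand-in for the decimation step, the sketch below reduces a high-polygon photogrammetric mesh to a low-polygon one with Open3D's quadric edge-collapse simplification. The thesis itself uses tools borrowed from the game industry; the target triangle count here is an arbitrary placeholder.

```python
import open3d as o3d

def decimate(path_in, path_out, target_tris=50_000):
    """Produce a low-polygon version of a high-polygon mesh via
    quadric edge-collapse decimation (target count is a placeholder)."""
    mesh = o3d.io.read_triangle_mesh(path_in)
    low = mesh.simplify_quadric_decimation(
        target_number_of_triangles=target_tris)
    low.compute_vertex_normals()  # recompute shading normals after collapse
    o3d.io.write_triangle_mesh(path_out, low)
    print(f"{len(mesh.triangles)} -> {len(low.triangles)} triangles")
    return low
```

    In a typical game-industry pipeline the detail lost here would be baked from the high-polygon model into normal or displacement maps for the low-polygon model, so the optimized asset still reads visually like the original.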

    3D head motion, point-of-regard and encoded gaze fixations in real scenes: next-generation portable video-based monocular eye tracking

    Portable eye trackers allow us to see where a subject is looking when performing a natural task with free head and body movements. These eye trackers include headgear containing a camera directed at one of the subject's eyes (the eye camera) and another camera (the scene camera) positioned above the same eye and directed along the subject's line-of-sight. The output video includes the scene video with a crosshair depicting where the subject is looking -- the point-of-regard (POR) -- that is updated for each frame. This video may be the desired final result, or it may be further analyzed to obtain more specific information about the subject's visual strategies. A list of the calculated POR positions in the scene video can also be analyzed. The goals of this project are to expand the information that we can obtain from a portable video-based monocular eye tracker and to minimize the amount of user interaction required to obtain and analyze this information. This work includes offline processing of both the eye and scene videos to obtain robust 2D PORs in scene video frames, identify gaze fixations from these PORs, obtain 3D head motion, and ray trace fixations through volumes-of-interest (VOIs) to determine what is being fixated, when and where (the 3D POR). To avoid the redundancy of ray tracing a 2D POR in every video frame and to group these POR data meaningfully, a fixation-identification algorithm is employed to simplify the long list of 2D POR data into gaze fixations. In order to ray trace these fixations, the 3D motion -- position and orientation over time -- of the scene camera is computed. This camera motion is determined via an iterative structure and motion recovery algorithm that requires a calibrated camera and knowledge of the 3D location of at least four points in the scene (which can be selected from premeasured VOI vertices). The subject's 3D head motion is obtained directly from this camera motion. For the final stage of the algorithm, the 3D locations and dimensions of VOIs in the scene are required. This VOI information in world coordinates is converted to camera coordinates for ray tracing. A representative 2D POR position for each fixation is converted from image coordinates to the same camera coordinate system. Then, a ray is traced from the camera center through this position to determine which (if any) VOI is being fixated and where it is being fixated -- the 3D POR in the world. Results are presented for various real scenes. Novel visualizations of portable eye tracker data created using the results of our algorithm are also presented.
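    A minimal sketch of the final ray-tracing step, assuming axis-aligned VOIs for simplicity: a fixation ray is built from the 2D POR using the camera intrinsics and tested against a volume-of-interest with a standard slab test. The intrinsics and box coordinates below are hypothetical.

```python
import numpy as np

def ray_hits_voi(origin, direction, box_min, box_max):
    """Slab test: entry distance of the ray into an axis-aligned VOI,
    or None on a miss."""
    d = np.where(direction == 0, 1e-12, direction)  # avoid divide-by-zero
    t1 = (box_min - origin) / d
    t2 = (box_max - origin) / d
    t_near = np.minimum(t1, t2).max()
    t_far = np.maximum(t1, t2).min()
    if t_near <= t_far and t_far >= 0:
        return max(t_near, 0.0)
    return None

# Fixation ray from a 2D POR (u, v) in camera coordinates: dir = K^-1 [u v 1]^T
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])  # hypothetical intrinsics
direction = np.linalg.inv(K) @ np.array([330.0, 250.0, 1.0])
hit = ray_hits_voi(np.zeros(3), direction,
                   np.array([-0.5, -0.5, 4.0]), np.array([0.5, 0.5, 6.0]))
```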

    Garment texturing using Kinect V2.0

    This thesis describes three new garment retexturing methods for FitsMe virtual fitting room applications using data from the Microsoft Kinect II RGB-D camera. The first method introduced is an automatic technique for garment retexturing using a single RGB-D image and infrared information obtained from Kinect II. First, the garment is segmented out from the image using GrabCut or depth segmentation. Then, texture domain coordinates are computed for each pixel belonging to the garment using normalized 3D information. Afterwards, shading is applied to the new colors from the texture image. The second method proposed in this work addresses 2D-to-3D garment retexturing, where a segmented garment on a mannequin or person is matched to a new source garment and retextured, resulting in augmented images in which the new source garment is transferred to the mannequin or person. The problem is divided into garment boundary matching, based on point set registration using Gaussian mixture models, followed by interpolation of the inner points using surface topology extracted through geodesic paths, which leads to a more realistic result than standard approaches. The final contribution of this thesis is another novel method, used for increasing the texture quality of a 3D model of a garment by using the same Kinect frame sequence that was used in the model creation. First, a structured mesh must be created from the 3D model, so the 3D model is wrapped to a base model with defined seams and a texture map. Afterwards, frames are matched to the newly created model, and through ray casting the color values of the Kinect frames are mapped to the UV map of the 3D model.
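    For the segmentation step of the first method, a minimal sketch using OpenCV's GrabCut is shown below. The rectangle initialization is an assumption for the example; the depth-based alternative mentioned above is not covered.

```python
import cv2
import numpy as np

def segment_garment(img_bgr, rect):
    """Segment the garment inside a bounding rect = (x, y, w, h) with
    GrabCut; returns a binary foreground mask."""
    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    # Keep both definite and probable foreground labels
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0)
    return fg.astype(np.uint8)
```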

    Digital 3D documentation of cultural heritage sites based on terrestrial laser scanning


    Computationally efficient deformable 3D object tracking with a monocular RGB camera

    Monocular RGB cameras are present in most settings and devices, including embedded environments like robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications. However, monocular images do not carry depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene from a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene. Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite their great results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like the embedded or onboard systems present in robotics and automotive applications, among others. This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and the onboard systems used in the automotive industry. The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.
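    As one illustrative building block for rigid multi-body pose tracking from a monocular image, the sketch below recovers a single rigid part's rotation and translation from 2D-3D landmark correspondences with a standard PnP solve in OpenCV. It is not the thesis' tracker; the zero-distortion camera is an assumption for the example.

```python
import cv2
import numpy as np

def rigid_part_pose(model_pts, image_pts, K):
    """Rotation/translation of one rigid part from 2D-3D landmark
    correspondences (at least four, more recommended); a standard
    PnP step, not the thesis' full tracking pipeline."""
    dist = np.zeros(5)  # assume an undistorted camera for the sketch
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_pts, np.float64),
        np.asarray(image_pts, np.float64),
        K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # axis-angle -> 3x3 rotation matrix
    return R, tvec
```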

    Methods for Real-time Visualization and Interaction with Landforms

    This thesis presents methods to enrich data modeling and analysis in the geoscience domain with a particular focus on geomorphological applications. First, a short overview of the relevant characteristics of the remote sensing data used, and of the basics of its processing and visualization, is provided. Then, two new methods for the visualization of vector-based maps on digital elevation models (DEMs) are presented. The first method uses a texture-based approach that generates a texture from the input maps at runtime, taking into account the current viewpoint. In contrast, the second method utilizes the stencil buffer to create a mask in image space that is then used to render the map on top of the DEM. A particular challenge in this context is posed by the view-dependent level-of-detail representation of the terrain geometry. After suitable visualization methods for vector-based maps have been investigated, two landform mapping tools for the interactive generation of such maps are presented. The user can carry out the mapping directly on the textured digital elevation model and thus benefit from the 3D visualization of the relief. Additionally, semi-automatic image segmentation techniques are applied in order to reduce the amount of user interaction required and thus make the mapping process more efficient and convenient. The challenge in the adaptation of the methods lies in the transfer of the algorithms to the quadtree representation of the data and in the application of out-of-core and hierarchical methods to ensure interactive performance. Although high-resolution remote sensing data are often available today, their effective resolution at steep slopes is rather low due to the oblique acquisition angle. For this reason, remote sensing data are suitable only to a limited extent for visualization as well as landform mapping purposes. To provide an easy way to supply additional imagery, an algorithm for registering uncalibrated photos to a textured digital elevation model is presented. A particular challenge in registering the images is posed by large variations in the photos concerning resolution, lighting conditions, seasonal changes, etc. The registered photos can be used to increase the visual quality of the textured DEM, in particular at steep slopes. To this end, a method is presented that combines several georegistered photos into textures for the DEM. The difficulty in this compositing process is to create a consistent appearance and avoid visible seams between the photos. In addition, the photos also provide valuable means to improve landform mapping. To this end, an extension of the landform mapping methods is presented that allows the utilization of the registered photos during mapping. In this way, a detailed and exact mapping becomes feasible even at steep slopes.
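    A hedged sketch of the photo-to-DEM registration idea: match features between the photo and a rendered view of the textured DEM, look up the terrain point behind each matched render pixel, and solve for the photo's camera pose. The `dem_3d_lookup` helper and the intrinsics guess are hypothetical stand-ins for data the real pipeline would provide, not the thesis' algorithm.

```python
import cv2
import numpy as np

def register_photo(photo_gray, dem_render_gray, dem_3d_lookup, K_guess):
    """Estimate an uncalibrated photo's pose against a rendered DEM view.
    dem_3d_lookup(pt) is a hypothetical helper returning the 3D terrain
    point behind a pixel of the render."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(photo_gray, None)
    kp2, des2 = sift.detectAndCompute(dem_render_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    # Lowe's ratio test to keep only distinctive matches
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    obj = np.float64([dem_3d_lookup(kp2[m.trainIdx].pt) for m in good])
    img = np.float64([kp1[m.queryIdx].pt for m in good])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K_guess, None)
    return (rvec, tvec) if ok else None
```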

    Mutual segmentation of objects of interest in multispectral stereo image sequences

    The automated video surveillance systems currently deployed around the world are still quite far, in terms of capabilities, from the ones that have inspired countless science fiction works over the past few years. One of the reasons behind this lag in development is the lack of low-level tools that allow raw image data to be processed directly in the field. This preprocessing is used to reduce the amount of information transferred to centralized servers that have to interpret the captured visual content for further use. The identification of objects of interest in raw images based on motion is an example of a preprocessing step that might be required by a large system. However, in a surveillance context, the preprocessing method can seldom rely on an appearance or shape model to recognize these objects, since their exact nature cannot be known in advance. This complicates the elaboration of low-level image processing methods.
    In this thesis, we present different methods that detect and segment objects of interest from video sequences in a fully unsupervised fashion. We first explore monocular video segmentation approaches based on background subtraction. These approaches are based on the idea that the background of an observed scene can be modeled over time, and that any drastic variation in appearance that is not predicted by the model actually reveals the presence of an intruding object. The main challenge that must be met by background subtraction methods is that their model should be able to adapt to dynamic changes in scene conditions. The designed methods must also remain sensitive to the emergence of new objects of interest despite this increased robustness to predictable dynamic scene behaviors. We propose two methods that introduce different modeling techniques to improve background appearance description in an illumination-invariant way, and that analyze local background persistence to improve the detection of temporarily stationary objects. We also introduce new feedback mechanisms used to adjust the hyperparameters of our methods based on the observed dynamics of the scene and the quality of the generated output.
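    For orientation, the snippet below runs a stock background subtractor (OpenCV's MOG2) over a video. It illustrates only the general background subtraction loop, not the illumination-invariant models or feedback mechanisms proposed in the thesis; the input file name is a placeholder.

```python
import cv2

# Stock Gaussian-mixture background model; parameters are OpenCV defaults
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)

cap = cv2.VideoCapture("surveillance.avi")  # hypothetical input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)      # 255 = foreground, 127 = shadow
    fg_mask = cv2.medianBlur(fg_mask, 5)   # suppress speckle noise
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(30) == 27:              # Esc quits
        break
cap.release()
```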

    Use of ERTS-1 data: Summary report of work on ten tasks

    The author has identified the following significant results. Depth mappings for a portion of Lake Michigan and for the Little Bahama Bank test site have been verified by use of navigation charts and on-site visits. A thirteen-category recognition map of Yellowstone Park has been prepared. Model calculations of atmospheric effects for various altitudes have been prepared. Radar, SLAR, and ERTS-1 data for flooded areas of Monroe County, Michigan are being studied. Water bodies can be reliably recognized and mapped using maximum likelihood processing of ERTS-1 digital data. Wetland mapping has been accomplished by slicing of a single band and/or ratio processing of two bands for a single observation date. Both analog and digital processing have been used to map the Lake Ontario basin using ERTS-1 data. Operating characteristic curves were developed for the proportion estimation algorithm to determine its performance in the measurement of surface water area. The signal in band MSS-5 was related to the sediment content of waters by a modelling approach and by relating surface measurements of water to processed ERTS data. Radiance anomalies in ERTS-1 data could be associated with the presence of oil on water in San Francisco Bay, but the anomalies were of the same order as those caused by variations in sediment concentration and tidal flushing.
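    For reference, a generic Gaussian maximum likelihood classifier of the kind alluded to for water body mapping can be sketched as follows. This is a textbook formulation with equal class priors, not the original ERTS-1 processing chain.

```python
import numpy as np

def train_ml(classes):
    """classes: dict name -> (n_pixels, n_bands) training samples.
    Fits one multivariate Gaussian per land cover class."""
    stats = {}
    for name, X in classes.items():
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        stats[name] = (mu, np.linalg.inv(cov), np.log(np.linalg.det(cov)))
    return stats

def classify_ml(pixels, stats):
    """pixels: (n, n_bands) radiances. Assigns each pixel the class with
    the highest Gaussian log-likelihood (equal priors assumed)."""
    names = list(stats)
    scores = np.empty((len(pixels), len(names)))
    for j, name in enumerate(names):
        mu, icov, logdet = stats[name]
        d = pixels - mu
        scores[:, j] = -0.5 * (logdet + np.einsum('ni,ij,nj->n', d, icov, d))
    return np.array(names)[scores.argmax(axis=1)]
```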