Automatic Image Registration in Infrared-Visible Videos using Polygon Vertices
In this paper, an automatic method is proposed to perform image registration
between visible and infrared pairs of video sequences containing multiple targets. In
multimodal image analysis, such as image fusion systems, color and IR sensors are
placed close to each other and capture the same scene simultaneously, but the
videos are not properly aligned by default because of differing fields of view,
image capture parameters, working principles and other camera specifications.
Because the scenes are usually not planar, alignment needs to be performed
continuously by extracting relevant common information. In this paper, we
approximate the shape of the targets by polygons and use an affine transformation
to align the two video sequences. After background subtraction, keypoints
on the contours of the foreground blobs are detected using the DCE (Discrete Curve
Evolution) technique. These keypoints are then described by the local shape at
each vertex of the obtained polygon. The keypoints are matched based on the
convexity of the polygon's vertices and the Euclidean distance between them. Only good
matches for each local shape polygon in a frame are kept. To achieve a global
affine transformation that maximizes the overlap of infrared and visible
foreground pixels, the matched keypoints of each local shape polygon are stored
temporally in a buffer over several frames. The transformation matrix is evaluated at
each frame using the temporal buffer, and the best matrix is selected based on
an overlap-ratio criterion. Our experimental results demonstrate that this
method can provide highly accurate registered images and that it outperforms a
previous related method.
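The selection loop described above — fit an affine transform to the buffered keypoint matches and keep the matrix with the best foreground overlap ratio — can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation; the least-squares affine fit, nearest-neighbour warp and intersection-over-union overlap ratio are standard stand-ins.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (Nx2 arrays)."""
    A = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    # Solve A @ M.T ~= dst for the 2x3 affine matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T                                     # shape (2, 3)

def warp_mask(mask, M):
    """Nearest-neighbour warp of a binary mask by affine M (2x3)."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys, np.ones_like(xs)])
    wx, wy = np.round(M @ pts).astype(int)
    out = np.zeros_like(mask)
    keep = (wx >= 0) & (wx < w) & (wy >= 0) & (wy < h)
    out[wy[keep], wx[keep]] = 1
    return out

def overlap_ratio(a, b):
    """Intersection-over-union of two binary foreground masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def best_affine(buffered_matches, ir_mask, vis_mask):
    """Evaluate one candidate matrix per buffered match set, keep the best."""
    best, best_score = None, -1.0
    for src, dst in buffered_matches:
        M = fit_affine(src, dst)
        score = overlap_ratio(warp_mask(ir_mask, M), vis_mask)
        if score > best_score:
            best, best_score = M, score
    return best, best_score
```

In practice the buffer would hold matches from the last few frames, and a new matrix would only replace the current one when its overlap ratio is higher.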
3D photogrammetric data modeling and optimization for multipurpose analysis and representation of Cultural Heritage assets
This research addresses the issues of processing, managing and representing,
for further dissemination, the large amount of 3D data that can be acquired and stored
today with modern geomatic techniques of 3D metric survey. In particular, this thesis focuses
on the optimization process applied to 3D photogrammetric data of Cultural Heritage
assets.
Modern geomatic techniques enable the acquisition and storage of large amounts of data,
with high metric and radiometric accuracy and precision, also in the very close range
field, and the processing of very detailed 3D textured models. Nowadays, the photogrammetric
pipeline has well-established potential and is considered one of the principal
techniques for producing detailed 3D textured models at low cost.
The potential offered by high-resolution textured 3D models is now well known,
and such representations are a powerful tool for many multidisciplinary purposes, at
different scales and resolutions, from documentation, conservation and restoration to
visualization and education. For example, their sub-millimetric precision makes them
suitable for scientific studies of geometry and materials (e.g., for structural and
static tests, for planning restoration activities, or for historical sources); their high fidelity
to the real object and their navigability make them optimal for web-based visualization
and dissemination applications. Thanks to improvements in new visualization
standards, they can easily be used as visualization interfaces linking different kinds of
information in a highly intuitive way. Furthermore, many museums today look for more
interactive exhibitions that may heighten visitors' emotions, and many recent
applications make use of 3D content (e.g., in virtual or augmented reality applications and
through virtual museums).
What all of these applications have to deal with is the difficulty of managing the
large amount of data that has to be represented and navigated.
Indeed, reality-based models have very large file sizes (up to tens of GB), which makes them
difficult to handle on common and portable devices, to publish on the internet, or to
manage in real-time applications. Even though recent advances produce ever more
sophisticated and capable hardware and internet standards, empowering the ability to
easily handle, visualize and share such content, other research aims at defining a common
pipeline for the generation and optimization of 3D models with a reduced number of
polygons that are nevertheless able to satisfy detailed radiometric and geometric requirements.
This thesis is set in this scenario and focuses on the 3D modeling process of
photogrammetric data aimed at easy sharing and visualization. In particular, this
research tested a 3D model optimization process that aims at generating Low
Poly models, with very small file sizes, starting from the data of High
Poly ones, while nevertheless offering a level of detail comparable to the original models. To
do this, several tools borrowed from the game industry and game engines have been used.
For this test, three case studies were chosen: a modern sculpture by a contemporary
Italian artist; a Roman marble statue preserved in the Civic Archaeological Museum of
Torino; and the frieze of the Augustus arch preserved in the city of Susa (Piedmont,
Italy). All the test cases were surveyed by means of close-range photogrammetric
acquisition, and three highly detailed 3D models were generated through a
Structure from Motion and image matching pipeline. On the final High Poly models,
different optimization and decimation tools were tested with the aim of
evaluating the quality of the information that can be extracted from the final optimized
models in comparison to the original High Poly ones. This study showed how
tools borrowed from Computer Graphics offer great potential in the Cultural
Heritage field as well. This approach may meet the needs of multipurpose and
multiscale studies, using different levels of optimization, and the procedure could be
applied to different kinds of objects, with a variety of sizes and shapes, also on
multiscale and multisensor data, such as buildings, architectural complexes, data from
UAV surveys and so on.
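Decimation of the kind tested here can be illustrated with a minimal vertex-clustering scheme: vertices falling in the same grid cell are collapsed into their centroid, and faces that degenerate are dropped. This is a toy sketch for intuition only, not the game-industry tools actually used in the thesis.

```python
import numpy as np

def vertex_cluster_decimate(vertices, faces, cell_size):
    """Collapse all vertices in the same grid cell into their centroid.

    vertices: (N, 3) float array; faces: (M, 3) int array of vertex indices.
    Returns the decimated (vertices, faces); degenerate faces are removed.
    """
    # Quantize each vertex to an integer grid cell.
    cells = np.floor(vertices / cell_size).astype(int)
    _, cluster_id, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True)
    # New vertex position = centroid of its cluster.
    new_verts = np.zeros((len(counts), 3))
    np.add.at(new_verts, cluster_id, vertices)
    new_verts /= counts[:, None]
    # Remap face indices and drop faces collapsed to a line or point.
    new_faces = cluster_id[faces]
    ok = ((new_faces[:, 0] != new_faces[:, 1]) &
          (new_faces[:, 1] != new_faces[:, 2]) &
          (new_faces[:, 0] != new_faces[:, 2]))
    return new_verts, new_faces[ok]
```

Larger `cell_size` gives a lower polygon count; production tools (quadric edge collapse, normal/texture baking) preserve detail far better than this uniform scheme.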
3D head motion, point-of-regard and encoded gaze fixations in real scenes: next-generation portable video-based monocular eye tracking
Portable eye trackers allow us to see where a subject is looking when performing a natural task with free head and body movements. These eye trackers include headgear containing a camera directed at one of the subject's eyes (the eye camera) and another camera (the scene camera) positioned above the same eye directed along the subject's line-of-sight. The output video includes the scene video with a crosshair depicting where the subject is looking -- the point-of-regard (POR) -- that is updated for each frame. This video may be the desired final result or it may be further analyzed to obtain more specific information about the subject's visual strategies. A list of the calculated POR positions in the scene video can also be analyzed. The goals of this project are to expand the information that we can obtain from a portable video-based monocular eye tracker and to minimize the amount of user interaction required to obtain and analyze this information. This work includes offline processing of both the eye and scene videos to obtain robust 2D PORs in scene video frames, identify gaze fixations from these PORs, obtain 3D head motion and ray trace fixations through volumes-of-interest (VOIs) to determine what is being fixated, when and where (the 3D POR). To avoid the redundancy of ray tracing a 2D POR in every video frame and to group these POR data meaningfully, a fixation-identification algorithm is employed to simplify the long list of 2D POR data into gaze fixations. In order to ray trace these fixations, the 3D motion -- position and orientation over time -- of the scene camera is computed. This camera motion is determined via an iterative structure and motion recovery algorithm that requires a calibrated camera and knowledge of the 3D location of at least four points in the scene (which can be selected from premeasured VOI vertices). The subject's 3D head motion is obtained directly from this camera motion.
For the final stage of the algorithm, the 3D locations and dimensions of VOIs in the scene are required. This VOI information in world coordinates is converted to camera coordinates for ray tracing. A representative 2D POR position for each fixation is converted from image coordinates to the same camera coordinate system. Then, a ray is traced from the camera center through this position to determine which (if any) VOI is being fixated and where it is being fixated -- the 3D POR in the world. Results are presented for various real scenes. Novel visualizations of portable eye tracker data created using the results of our algorithm are also presented.
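Grouping the frame-by-frame 2D PORs into gaze fixations, as described above, is commonly done with a dispersion-threshold (I-DT) scheme; a minimal sketch follows. This is an illustrative stand-in for the fixation-identification step, not necessarily the exact algorithm used in this work.

```python
import numpy as np

def idt_fixations(por, max_dispersion, min_samples):
    """Dispersion-threshold (I-DT) fixation identification.

    por: (N, 2) array of 2D point-of-regard positions, one row per frame.
    Returns a list of (start, end, centroid) tuples; `end` is exclusive.
    """
    fixations = []
    i, n = 0, len(por)
    while i + min_samples <= n:
        j = i + min_samples
        # Dispersion = (max x - min x) + (max y - min y) over the window.
        disp = (por[i:j].max(0) - por[i:j].min(0)).sum()
        if disp <= max_dispersion:
            # Grow the window while dispersion stays under the threshold.
            while j < n:
                w = por[i:j + 1]
                if (w.max(0) - w.min(0)).sum() > max_dispersion:
                    break
                j += 1
            fixations.append((i, j, por[i:j].mean(0)))
            i = j
        else:
            i += 1
    return fixations
```

Each returned fixation then needs only a single representative POR to ray trace, instead of one ray per frame.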
Garment texturing using Kinect V2.0
This thesis describes three new garment retexturing methods for FitsMe virtual fitting room applications,
using data from the Microsoft Kinect II RGB-D camera.
The first method introduced is an automatic technique for garment retexturing using
a single RGB-D image and infrared information obtained from the Kinect II. First, the garment
is segmented out from the image using GrabCut or depth segmentation. Then, texture domain
coordinates are computed for each pixel belonging to the garment using normalized 3D information.
Afterwards, shading is applied to the new colors from the texture image.
The second method proposed in this work addresses 2D-to-3D garment retexturing, where a segmented
garment of a mannequin or person is matched to a new source garment and retextured,
resulting in augmented images in which the new source garment is transferred to the mannequin
or person. The problem is divided into garment boundary matching, based on point set registration
using Gaussian mixture models, and interpolation of inner points using surface
topology extracted through geodesic paths, which leads to a more realistic result than standard
approaches.
The final contribution of this thesis is another novel method for
increasing the texture quality of a 3D model of a garment, using the same Kinect frame
sequence that was used in the model creation. First, a structured mesh must be created
from the 3D model, so the 3D model is wrapped to a base model with defined seams and a
texture map. Afterwards, frames are matched to the newly created model and, through ray
casting, the color values of the Kinect frames are mapped to the UV map of the 3D model.
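The texture-coordinate step of the first method — mapping each garment pixel's normalized 3D position into the texture domain and then shading the new colors — can be sketched as below. The planar normalization and Lambertian-style shading are assumptions made for illustration; the thesis computes its coordinates from the full normalized 3D information.

```python
import numpy as np

def retexture(points, normals, texture, light_dir=(0.0, 0.0, 1.0)):
    """Assign texture colors to garment points via normalized coordinates.

    points:  (N, 3) 3D positions of garment pixels (from the depth camera).
    normals: (N, 3) unit surface normals used for shading.
    texture: (H, W, 3) texture image.
    Returns (N, 3) shaded colors.
    """
    h, w, _ = texture.shape
    # Normalize the garment's x/y extent to [0, 1] texture coordinates.
    lo, hi = points.min(0), points.max(0)
    uv = (points[:, :2] - lo[:2]) / np.maximum(hi[:2] - lo[:2], 1e-9)
    cols = np.clip((uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip((uv[:, 1] * (h - 1)).astype(int), 0, h - 1)
    colors = texture[rows, cols].astype(float)
    # Simple diffuse shading so the new texture keeps the garment's folds.
    shade = np.clip(normals @ np.asarray(light_dir), 0.0, 1.0)
    return colors * shade[:, None]
```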
Computationally efficient deformable 3D object tracking with a monocular RGB camera
Monocular RGB cameras are present in most scopes and devices, including embedded environments like robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications. However, monocular images do not have depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene using a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene. Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite the great results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others. This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry.
The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields, like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.
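Rigid multi-body pose tracking of the kind described ultimately reduces to estimating rigid transforms between corresponding 3D point sets for each body part; a standard Kabsch/Procrustes solver is sketched below. This is an illustrative building block under that assumption, not the thesis method itself.

```python
import numpy as np

def kabsch(src, dst):
    """Best-fit rotation R and translation t with R @ p + t ~= q.

    src, dst: (N, 3) corresponding 3D points. Returns (R, t).
    """
    src_c, dst_c = src.mean(0), dst.mean(0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection to keep a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Being a closed-form SVD of a 3x3 matrix, this per-part solve is cheap enough for the computationally constrained setups the thesis targets.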
Methods for Real-time Visualization and Interaction with Landforms
This thesis presents methods to enrich data modeling and analysis in the geoscience domain with a particular focus on geomorphological applications. First, a short overview of the relevant characteristics of the used remote sensing data and basics of its processing and visualization are provided. Then, two new methods for the visualization of vector-based maps on digital elevation models (DEMs) are presented. The first method uses a texture-based approach that generates a texture from the input maps at runtime taking into account the current viewpoint. In contrast to that, the second method utilizes the stencil buffer to create a mask in image space that is then used to render the map on top of the DEM. A particular challenge in this context is posed by the view-dependent level-of-detail representation of the terrain geometry. After suitable visualization methods for vector-based maps have been investigated, two landform mapping tools for the interactive generation of such maps are presented. The user can carry out the mapping directly on the textured digital elevation model and thus benefit from the 3D visualization of the relief. Additionally, semi-automatic image segmentation techniques are applied in order to reduce the amount of user interaction required and thus make the mapping process more efficient and convenient. The challenge in the adaptation of the methods lies in the transfer of the algorithms to the quadtree representation of the data and in the application of out-of-core and hierarchical methods to ensure interactive performance. Although high-resolution remote sensing data are often available today, their effective resolution at steep slopes is rather low due to the oblique acquisition angle. For this reason, remote sensing data are suitable to only a limited extent for visualization as well as landform mapping purposes.
To provide an easy way to supply additional imagery, an algorithm for registering uncalibrated photos to a textured digital elevation model is presented. A particular challenge in registering the images is posed by large variations in the photos concerning resolution, lighting conditions, seasonal changes, etc. The registered photos can be used to increase the visual quality of the textured DEM, in particular at steep slopes. To this end, a method is presented that combines several georegistered photos into textures for the DEM. The difficulty in this compositing process is to create a consistent appearance and avoid visible seams between the photos. In addition, the photos also provide valuable means to improve landform mapping. To this end, an extension of the landform mapping methods is presented that allows the utilization of the registered photos during mapping. This way, a detailed and exact mapping becomes feasible even at steep slopes.
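Seam avoidance in compositing steps like the one above is often handled by distance-based feathering: each photo is weighted by its distance to the border of its own valid region, so contributions fade out smoothly where photos meet. The sketch below is an illustrative stand-in under that assumption, not the compositing method of the thesis.

```python
import numpy as np

def border_distance(mask):
    """Approximate L1 distance of each True pixel to the mask border."""
    dist = np.zeros(mask.shape, dtype=float)
    cur = mask.copy()
    while cur.any():
        dist += cur
        # Erode: keep pixels whose 4-neighbours are all inside the mask.
        p = np.pad(cur, 1, constant_values=False)
        cur = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
               & p[1:-1, :-2] & p[1:-1, 2:])
    return dist

def composite(photos, masks):
    """Feathered blend of georegistered photos into one texture.

    photos: list of (H, W) float images; masks: list of (H, W) bool arrays
    marking where each photo has valid data.
    """
    weights = [border_distance(m) for m in masks]
    total = np.sum(weights, axis=0)
    out = np.zeros(photos[0].shape, dtype=float)
    for img, w in zip(photos, weights):
        out += img * w
    # Normalize; pixels covered by no photo keep zero.
    return np.divide(out, total, out=out, where=total > 0)
```

Near a photo's border its weight approaches zero, so the transition to the neighbouring photo is gradual rather than a hard seam.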
Mutual segmentation of objects of interest in multispectral stereo image sequences
The automated video surveillance systems currently deployed around the world are still quite far, in terms of capabilities, from the ones that have inspired countless science fiction works over the past few years. One of the reasons behind this lag in development is the lack of low-level tools that allow raw image data to be processed directly in the field.
This preprocessing is used to reduce the amount of information transferred to centralized servers that have to interpret the captured visual content for further use. The identification of objects of interest in raw images based on motion is an example of a preprocessing step that might be required by a large system. However, in a surveillance context, the preprocessing method can seldom rely on an appearance or shape model to recognize these objects, since their exact nature cannot be known in advance. This complicates the elaboration of low-level image processing methods.
In this thesis, we present different methods that detect and segment objects of interest from video sequences in a fully unsupervised fashion. We first explore monocular video segmentation approaches based on background subtraction. These approaches are based on the idea that the background of an observed scene can be modeled over time, and that any drastic variation in appearance that is not predicted by the model actually reveals the presence of an intruding object. The main challenge that must be met by background subtraction methods is that their model should be able to adapt to dynamic changes in scene conditions. The designed methods must also remain sensitive to the emergence of new objects of interest despite this increased robustness to predictable dynamic scene behaviors. We propose two methods that introduce different modeling techniques to improve background appearance description in an illumination-invariant way, and that analyze local background persistence to improve the detection of temporarily stationary objects. We also introduce new feedback mechanisms used to adjust the hyperparameters of our methods based on the observed dynamics of the scene and the quality of the generated output.
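The background-subtraction idea with a feedback-adjusted hyperparameter can be sketched with a minimal running-average model. This is a textbook baseline for illustration; the thesis methods use far richer appearance models and feedback loops.

```python
import numpy as np

class BackgroundSubtractor:
    """Running-average background model with simple threshold feedback.

    Pixels differing from the model by more than `threshold` are foreground;
    `alpha` controls how fast the model adapts to gradual scene changes.
    """

    def __init__(self, first_frame, alpha=0.05, threshold=30.0):
        self.model = first_frame.astype(float)
        self.alpha = alpha
        self.threshold = threshold

    def apply(self, frame):
        frame = frame.astype(float)
        fg = np.abs(frame - self.model) > self.threshold
        # Update the model only where the scene looks like background,
        # so temporarily stationary objects are not absorbed immediately.
        self.model = np.where(
            fg, self.model,
            (1 - self.alpha) * self.model + self.alpha * frame)
        # Feedback: if almost everything is flagged, the threshold is
        # likely too tight for the scene's dynamics; relax it slightly.
        if fg.mean() > 0.5:
            self.threshold *= 1.1
        return fg
```

A real system would also feed detection quality back into `alpha`, as the thesis does for its hyperparameters.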
Use of ERTS-1 data: Summary report of work on ten tasks
The author has identified the following significant results. Depth mappings for a portion of Lake Michigan and at the Little Bahama Bank test site have been verified by use of navigation charts and on-site visits. A thirteen-category recognition map of Yellowstone Park has been prepared. Model calculations of atmospheric effects for various altitudes have been prepared. Radar, SLAR, and ERTS-1 data for flooded areas of Monroe County, Michigan are being studied. Water bodies can be reliably recognized and mapped using maximum likelihood processing of ERTS-1 digital data. Wetland mapping has been accomplished by slicing of a single band and/or ratio processing of two bands for a single observation date. Both analog and digital processing have been used to map the Lake Ontario basin using ERTS-1 data. Operating characteristic curves were developed for the proportion estimation algorithm to determine its performance in the measurement of surface water area. The signal in band MSS-5 was related to the sediment content of waters by a modelling approach and by relating surface measurements of water to processed ERTS data. Radiance anomalies in ERTS-1 data could be associated with the presence of oil on water in San Francisco Bay, but the anomalies were of the same order as those caused by variations in sediment concentration and tidal flushing.
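Maximum likelihood processing of the kind used above for water-body mapping assigns each pixel to the class whose multivariate Gaussian (fitted to training pixels) gives it the highest likelihood. The sketch below illustrates the classifier on hypothetical band statistics, not on actual ERTS-1 data.

```python
import numpy as np

def train(classes):
    """Per-class mean and covariance from training pixels.

    classes: dict name -> (N, B) array of training pixels (B spectral bands).
    """
    return {name: (pix.mean(0), np.cov(pix, rowvar=False))
            for name, pix in classes.items()}

def log_likelihood(x, mean, cov):
    """Log of the multivariate Gaussian density at x (constants dropped)."""
    d = x - mean
    return -0.5 * (np.log(np.linalg.det(cov)) + d @ np.linalg.solve(cov, d))

def classify(pixels, stats):
    """Maximum likelihood class label for each pixel (rows of `pixels`)."""
    names = list(stats)
    scores = np.array([[log_likelihood(p, *stats[n]) for n in names]
                       for p in pixels])
    return [names[i] for i in scores.argmax(1)]
```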