216 research outputs found
Constrained camera motion estimation and 3D reconstruction
The creation of virtual content from visual data is a tedious task that requires considerable skill and expertise. Although most consumers own multiple imaging devices that would, in principle, enable them to perform this task, the processing techniques and tools are still intended for use by trained experts. As more and more capable hardware becomes available, there is a growing need among consumers and professionals alike for new flexible and reliable tools that reduce the amount of time and effort required to create high-quality content.
This thesis describes advances to the state of the art in three areas of computer vision: camera motion estimation, probabilistic 3D reconstruction, and template fitting.
First, a new camera model geared towards stereoscopic input data is introduced, which is subsequently developed into a generalized framework for constrained camera motion estimation. A probabilistic reconstruction method for 3D line segments is then described, which takes global connectivity constraints into account. Finally, a new framework for symmetry-aware template fitting is presented, which allows the creation of high-quality models from low-quality input 3D scans.
Evaluations with a broad range of challenging synthetic and real-world data sets demonstrate that the new constrained camera motion estimation methods provide improved accuracy and flexibility, and that the new constrained 3D reconstruction methods improve on the current state of the art.
Unobtrusive and pervasive video-based eye-gaze tracking
Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.
3D Face Reconstruction: the Road to Forensics
3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector, thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must meet strict requirements, which still leave its possible role in bringing evidence to a lawsuit unclear. An extensive investigation of the constraints, potential, and limits of its application in forensics is still missing. Shedding some light on this matter is the goal of the present survey, which starts by clarifying the relation between forensic applications and biometrics, with a focus on face recognition. It then provides an analysis of the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and discusses the current obstacles that separate 3D face reconstruction from an active role in forensic applications. Finally, it examines the underlying data sets, with their advantages and limitations, while proposing alternatives that could substitute or complement them.
Online learning and fusion of orientation appearance models for robust rigid object tracking
We introduce a robust framework for learning and fusing orientation appearance models based on both texture and depth information for rigid object tracking. Our framework fuses data obtained from a standard visual camera and dense depth maps obtained by low-cost consumer depth cameras such as the Kinect. To combine these two completely different modalities, we propose to use features that do not depend on the data representation: angles. More specifically, our framework combines image gradient orientations as extracted from intensity images with the directions of surface normals computed from dense depth fields. We propose to capture the correlations between the obtained orientation appearance models using a fusion approach motivated by the original Active Appearance Models (AAMs). To incorporate these features in a learning framework, we use a robust kernel based on the Euler representation of angles which does not require off-line training and can be efficiently implemented online. The robustness of learning from orientation appearance models is demonstrated both theoretically and experimentally in this work. This kernel enables us to cope with gross measurement errors and missing data, as well as other typical problems such as illumination changes and occlusions. By combining the proposed models with a particle filter, the proposed framework was used for performing 2D plus 3D rigid object tracking, achieving robust performance in very difficult tracking scenarios including extreme pose variations.
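The idea behind the Euler representation of angles can be sketched in a few lines. The following is a simplified illustration, not the authors' implementation: each orientation angle θ is mapped to the unit vector (cos θ, sin θ), so a comparison between two orientation fields reduces to the mean cosine of their angle differences, which is bounded and therefore robust to gross outliers.

```python
import numpy as np

def gradient_orientations(img):
    """Per-pixel gradient orientation of a grayscale image."""
    gy, gx = np.gradient(img.astype(float))
    return np.arctan2(gy, gx)

def euler_features(theta):
    """Euler representation: map each angle to the unit vector (cos, sin).

    Distances between these features are bounded, which is what makes the
    resulting kernel robust: a completely wrong orientation (e.g. from an
    occluded pixel) contributes at most a fixed error."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def orientation_similarity(theta_a, theta_b):
    """Normalised correlation of two orientation fields.

    Equals the mean of cos(theta_a - theta_b): 1 for identical fields,
    close to 0 for uncorrelated ones."""
    fa, fb = euler_features(theta_a), euler_features(theta_b)
    return float(np.mean(np.sum(fa * fb, axis=-1)))
```

The same mapping applies unchanged to surface-normal directions from depth maps, which is what allows the two modalities to be fused in a common feature space.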
Coping with Data Scarcity in Deep Learning and Applications for Social Good
Recent years have seen an extremely fast evolution of the Computer Vision and Machine Learning fields: many application domains benefit from the newly developed technologies, and industries are investing a growing amount of money in Artificial Intelligence. Convolutional Neural Networks and Deep Learning have substantially contributed to the rise and diffusion of AI-based solutions, creating the potential for many disruptive new businesses.
The effectiveness of Deep Learning models is grounded in the availability of a huge amount of training data. Unfortunately, data collection and labeling are extremely expensive in terms of both time and cost; moreover, they frequently require the collaboration of domain experts.
In the first part of the thesis, I will investigate some methods for reducing the cost of data acquisition for Deep Learning applications in the relatively constrained industrial scenarios related to visual inspection. I will first assess the effectiveness of Deep Neural Networks in comparison with several classical Machine Learning algorithms that require a smaller amount of training data. Next, I will introduce a hardware-based data augmentation approach that takes advantage of a novel illumination setup designed for this purpose and leads to a considerable performance boost. Finally, I will investigate the situation in which acquiring a sufficient number of training samples is not possible, and in particular its most extreme form: zero-shot learning (ZSL), the problem of multi-class classification when no training data is available for some of the classes. Visual features designed for image classification and trained offline have been shown to help ZSL generalize to classes not seen during training. Nevertheless, I will show that recognition performance on unseen classes can be sharply improved by jointly learning an ad hoc semantic embedding (the pre-defined list of present and absent attributes that represents a class) and the visual features, so as to increase the correlation between the two geometric spaces and ease the metric learning process for ZSL.
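The attribute-based ZSL setup described above can be sketched as follows. This is a minimal illustration with made-up class names and attribute vectors, and an identity stand-in for the learned embedding; the thesis learns both the embedding and the visual features, which this sketch does not reproduce.

```python
import numpy as np

# Hypothetical attribute table: each class is described by a fixed list of
# present/absent attributes (e.g. "striped", "winged", "hoofed", "beaked").
CLASS_ATTRIBUTES = {
    "zebra": np.array([1, 0, 1, 0], dtype=float),
    "eagle": np.array([0, 1, 0, 1], dtype=float),
    "horse": np.array([0, 0, 1, 0], dtype=float),
}

def predict_unseen(visual_features, embed_fn, classes=CLASS_ATTRIBUTES):
    """Zero-shot prediction: project visual features into attribute space
    with a model trained only on *seen* classes (embed_fn), then pick the
    unseen class whose attribute vector is most similar to the projection."""
    projected = embed_fn(visual_features)
    scores = {}
    for name, attrs in classes.items():
        # cosine similarity between projected features and class attributes
        denom = np.linalg.norm(projected) * np.linalg.norm(attrs) + 1e-12
        scores[name] = float(projected @ attrs / denom)
    return max(scores, key=scores.get)
```

No training image of any of these classes is needed at test time; only their attribute descriptions are, which is what makes the correlation between the visual and semantic spaces so important.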
In the second part of the thesis, I will present some successful applications of state-of-the-art Computer Vision, Data Analysis and Artificial Intelligence methods. I will illustrate some solutions developed during the 2020 Coronavirus pandemic for monitoring the evolution of the disease and for reducing the spread of the virus. I will describe the first publicly available dataset for the analysis of face-touching behavior, which we annotated and distributed, and I will illustrate an extensive evaluation of several computer vision methods applied to this dataset. Moreover, I will describe the privacy-preserving solution we developed for estimating "Social Distance" and its violations from a single uncalibrated image in unconstrained scenarios. I will conclude the thesis with a Computer Vision solution developed in collaboration with the Egyptian Museum of Turin for digitally unwrapping mummies by analyzing their CT scans, supporting archaeologists during mummy analysis and avoiding the devastating and irreversible process of physically unwrapping the bandages to remove amulets and jewels from the body.
Accelerated volumetric reconstruction from uncalibrated camera views
Although both work with images, computer graphics and computer vision are inverse problems of each other. Computer graphics traditionally starts with input geometric models and produces image sequences, while computer vision starts with input image sequences and produces geometric models. In the last few years, research has converged to bridge the gap between the two fields.
This convergence has produced a new field called Image-based Rendering and Modeling (IBMR). IBMR represents the effort of using geometric information recovered from real images to generate new images, with the hope that the synthesized ones appear photorealistic, while also reducing the time spent on model creation.
In this dissertation, the capturing, geometric and photometric aspects of an IBMR system are studied. A versatile framework was developed that enables the reconstruction of scenes from images acquired with a handheld digital camera. The proposed system targets applications in areas such as Computer Gaming and Virtual Reality from a low-cost perspective. In the spirit of IBMR, the human operator is allowed to provide high-level information, while underlying algorithms perform the low-level computational work. Conforming to the latest architecture trends, we propose a streaming voxel carving method that allows fast GPU-based processing on commodity hardware.
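The core carving step can be sketched in a few lines. This is a hypothetical CPU illustration of plain silhouette-based voxel carving, not the streaming GPU method the dissertation proposes: each voxel is projected into every view, and voxels that fall outside any silhouette are carved away.

```python
import numpy as np

def carve(voxels_xyz, cameras, silhouettes):
    """Minimal silhouette-based voxel carving.

    voxels_xyz  : (N, 3) voxel centres in world coordinates
    cameras     : list of 3x4 projection matrices
    silhouettes : list of binary masks (H, W), True = inside the object
    Returns a boolean mask of the voxels consistent with every silhouette."""
    keep = np.ones(len(voxels_xyz), dtype=bool)
    homog = np.hstack([voxels_xyz, np.ones((len(voxels_xyz), 1))])
    for P, sil in zip(cameras, silhouettes):
        uvw = homog @ P.T                         # project into the image
        uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
        h, w = sil.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & \
                 (uv[:, 1] >= 0) & (uv[:, 1] < h)
        hit = np.zeros(len(voxels_xyz), dtype=bool)
        hit[inside] = sil[uv[inside, 1], uv[inside, 0]]
        keep &= hit                               # carve voxels outside the mask
    return keep
```

A streaming GPU variant would process the voxel grid in slices and evaluate all projections in parallel, but the per-voxel consistency test is the same.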
Shape from Shading through Shape Evolution
In this paper, we address the shape-from-shading problem by training deep networks with synthetic images. Unlike conventional approaches that combine deep learning and synthetic imagery, we propose an approach that does not need any external shape dataset to render synthetic images. Our approach consists of two synergistic processes: the evolution of complex shapes from simple primitives, and the training of a deep network for shape-from-shading. The evolution generates better shapes guided by the network training, while the training improves by using the evolved shapes. We show that our approach achieves state-of-the-art performance on a shape-from-shading benchmark.
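The forward model that such synthetic training data inverts can be sketched as follows. This is a generic Lambertian rendering of a height field under an assumed known light direction, not the paper's rendering pipeline: normals are derived from height gradients, and intensity is the clamped dot product with the light.

```python
import numpy as np

def lambertian_shading(height, light=(0.0, 0.0, 1.0)):
    """Render the Lambertian shading of a height field z(x, y).

    This is the *forward* model of shape-from-shading; a network trained
    on (shading, height) pairs learns to invert exactly this mapping."""
    zy, zx = np.gradient(height.astype(float))
    # Surface normal of z(x, y) is proportional to (-dz/dx, -dz/dy, 1).
    normals = np.stack([-zx, -zy, np.ones_like(height, dtype=float)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    l = np.asarray(light, dtype=float)
    l /= np.linalg.norm(l)
    # Clamp: surfaces facing away from the light render as black.
    return np.clip(normals @ l, 0.0, 1.0)
```

Evolving shapes from simple primitives and re-rendering them with a model like this is what lets the method avoid any external shape dataset.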
The Virtual Video Camera: a system for viewpoint synthesis in arbitrary, dynamic scenes
The Virtual Video Camera project strives to create free-viewpoint video from casually captured multi-view data. Multiple video streams of a dynamic scene are captured with off-the-shelf camcorders, and the user can re-render the scene from novel perspectives. In this thesis the algorithmic core of the Virtual Video Camera is presented. This includes the algorithm for image correspondence estimation as well as the image-based renderer. Furthermore, its application in the context of an actual video production is showcased, and the rendering and image processing pipeline is extended to incorporate depth information.
- …