Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling
This dissertation addresses the problem of employing 3D depth information to solve a number of traditionally challenging computer vision and graphics problems. Humans can perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects, and understand the geometric space and semantic meaning of the visual world. It is therefore important to explore how computer vision systems can use 3D depth information to mimic these abilities. This dissertation employs 3D depth information to solve vision and graphics problems in three areas: scene understanding, image enhancement, and 3D reconstruction and modeling.
In addressing the scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used to segment and recognize different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information and outperforms approaches based on sparse 3D or 2D appearance features.
In addressing the image enhancement problem, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement tasks and achieve high-quality results using simple and robust algorithms.
In addressing the 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions, and wide variations make the reconstruction of 3D petal models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Given a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process to reconstruct occluded regions realistically and maintain correct 3D spatial relations.
The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision and graphics problems. By developing algorithms that run either automatically or with minimal user interaction, this dissertation aims to demonstrate that the 3D depth computed from multiple images contains rich information about the visual world and can therefore be used to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models.
From Image-based Motion Analysis to Free-Viewpoint Video
The problems of capturing real-world scenes with cameras and automatically analyzing the visible motion have traditionally been the focus of computer vision research. The photo-realistic rendition of dynamic real-world scenes, on the other hand, is a problem that has been investigated in the field of computer graphics. In this thesis, we demonstrate that the joint solution to all three of these problems enables the creation of powerful new tools that are beneficial for both research disciplines. Analysis and rendition of real-world scenes with human actors are amongst the most challenging problems. In this thesis we present new algorithmic recipes to attack them. The dissertation consists of three parts. In part I, we present novel solutions to two fundamental problems of human motion analysis. First, we demonstrate a novel hybrid approach for marker-free human motion capture from multiple video streams. Thereafter, a new algorithm for automatic non-intrusive estimation of kinematic body models of arbitrary moving subjects from video is detailed. In part II of the thesis, we demonstrate that a marker-free motion capture approach makes possible the model-based reconstruction of free-viewpoint videos of human actors from only a handful of video streams. The estimated 3D videos enable the photo-realistic real-time rendition of a dynamic scene from arbitrary novel viewpoints. Texture information from video is applied not only to generate a realistic surface appearance but also to improve the precision of the motion estimation scheme. The commitment to a generic body model also allows us to reconstruct a time-varying reflectance description of an actor's body surface, which allows us to realistically render the free-viewpoint videos under arbitrary lighting conditions. A novel method to capture high-speed large-scale motion using regular still cameras and the principle of multi-exposure photography is described in part III.
The fundamental principles underlying the methods in this thesis are applicable not only to humans but to a much larger class of subjects. It is demonstrated that, in conjunction, our proposed algorithmic recipes serve as building blocks for the next generation of immersive 3D visual media.
The development of new algorithms for optically capturing and analyzing motion in dynamic scenes is one of the main research areas in computer vision. While computer vision focuses on extracting information, computer graphics concentrates on the inverse problem, the photo-realistic rendering of moving scenes. In the recent past the two disciplines have steadily converged, because many challenging scientific questions call for a joint solution to the image acquisition, image analysis, and image synthesis problems.
Two of the most difficult problems, which are highly relevant to researchers from both disciplines, are the analysis and the synthesis of dynamic scenes centered on humans. This dissertation presents methods that allow the optical capture of such scenes, the automatic analysis of the motion, and its realistic re-rendering on a computer. It will become clear that integrating algorithms for these three problems into one overall system enables entirely new three-dimensional representations of humans in motion. The dissertation is divided into three parts:
Part I begins with a description of the design and construction of a studio for the time-synchronized capture of multiple video streams. The multi-video sequences recorded in the studio serve as input data for the video-based motion analysis methods and the algorithms for generating three-dimensional videos developed in this dissertation.
Two newly developed methods are then presented that answer two fundamental questions in the optical capture of human motion: the measurement of motion parameters and the generation of kinematic skeleton models. The first method is a hybrid algorithm for marker-free optical measurement of motion parameters from multi-video data. Optical markers can be dispensed with because the motion analysis uses both volume models reconstructed from the image data and easily detected body features. The second method automatically reconstructs a kinematic skeleton model from multi-video data. The algorithm requires neither optical markers in the scene nor a priori information about the body structure, and it applies in the same form to humans, animals, and objects.
The subject of the second part of this thesis is a model-based method for reconstructing three-dimensional videos of humans in motion from only a few time-synchronized video streams. The viewer can play back the computed 3D videos on a computer in real time and interactively take any virtual viewpoint on the action. At the center of our approach is a silhouette-based analysis-by-synthesis algorithm that captures both the shape and the motion of a person without optical markers. Computing time-varying surface textures from the video data ensures that a person has a photo-realistic appearance from any viewing angle. A first algorithmic extension shows that the texture information can also be used to improve the accuracy of the motion estimation. In addition, using a generic body model makes it possible to measure not only dynamic textures but also dynamic reflectance properties of the body surface. Our reflectance model consists of a parametric BRDF for each texel and a dynamic normal map for the entire body surface. In this way, 3D videos can also be rendered realistically under entirely new simulated lighting conditions.
Part III of this thesis describes a novel method for the optical measurement of very fast motion. Until now, optical recordings of high-speed motion have required very expensive special cameras with high frame rates. In contrast, the method described here uses simple digital still cameras and the principle of multi-flash photography. It is shown that this method can measure both the very fast articulated hand motion of the pitcher and the flight parameters of the ball during a baseball pitch. The highly accurately captured parameters make it possible to visualize the measured motion on a computer in entirely new ways.
Although the methods presented in this dissertation primarily serve the analysis and rendering of human motion, the underlying principles are also applicable to many other scenes. Each of the described algorithms primarily solves one specific subproblem, but taken together the methods can be understood as building blocks that will enable the next generation of interactive three-dimensional media.
Constrained camera motion estimation and 3D reconstruction
The creation of virtual content from visual data is a tedious task that requires a high level of skill and expertise. Although most consumers own multiple imaging devices that would in principle enable them to perform this task, the processing techniques and tools are still intended for use by trained experts. As more and more capable hardware becomes available, there is a growing need among consumers and professionals alike for new flexible and reliable tools that reduce the amount of time and effort required to create high-quality content.
This thesis describes advances of the state of the art in three areas of computer vision: camera motion estimation, probabilistic 3D reconstruction, and template fitting.
First, a new camera model geared towards stereoscopic input data is introduced, which is subsequently developed into a generalized framework for constrained camera motion estimation. A probabilistic reconstruction method for 3D line segments is then described, which takes global connectivity constraints into account. Finally, a new framework for symmetry-aware template fitting is presented, which allows the creation of high-quality models from low-quality input 3D scans.
Evaluations with a broad range of challenging synthetic and real-world data sets demonstrate that the new constrained camera motion estimation methods provide improved accuracy and flexibility, and that the new constrained 3D reconstruction methods improve the current state of the art.
Creating virtual content from visual data is tedious and requires a great deal of skill and expertise. Although most consumers own several imaging devices that would in principle allow them to do this, the techniques and tools are still intended for use by trained experts. As ever more capable hardware becomes available, there is growing demand among consumers and professionals alike for new flexible and reliable tools that simplify the creation of high-quality content.
This thesis describes extensions of the state of the art in the following three areas of image processing: camera motion estimation, probabilistic 3D reconstruction, and template fitting.
First, a new camera model designed for processing stereoscopic input data is presented. This model is subsequently extended into a generalized method for constrained camera motion estimation. Next, a probabilistic method for reconstructing 3D line segments is described that takes global connectivity into account as a constraint. Finally, a new method for fitting a template model is presented, in which accounting for the symmetry structure of the template allows high-quality models to be created from low-quality 3D input data.
Evaluations with a broad range of challenging synthetic and real data sets show that the new constrained camera motion estimation methods enable higher accuracy and more flexibility, and that the new constrained 3D reconstruction methods extend the state of the art.
Spatial integration in computer-augmented realities
In contrast to virtual reality, which immerses the user in a wholly computer-generated perceptual environment, augmented reality systems superimpose virtual entities on the user's view of the real world. This concept promises new applications in a wide range of fields, but some challenging issues remain to be resolved. One issue is achieving accurate registration of the virtual and real worlds. Accurate spatial registration is required not only with respect to lateral positioning but also in depth. A limiting problem with existing optical-see-through displays, typically used for augmenting reality, is that they cannot display a full range of depth cues. Most significantly, they are unable to occlude the real background and hence cannot produce interposition depth cueing. Nor can they modify the real-world view in the ways required to produce convincing common illumination effects such as virtual shadows across real surfaces. Also, at present, there are no wholly satisfactory ways of determining suitable common illumination models with which to determine the real-virtual light interactions necessary for producing such depth cues. This thesis establishes that interpositioning is essential for appropriate estimation of depth in augmented realities, and that the presence of shadows provides an important refining cue. It also extends the concept of a transparency alpha-channel to allow optical-see-through systems to display appropriate depth cues. The generalised theory of the approach is described mathematically, and algorithms are developed to automate the generation of display-surface images. Three practical physical display strategies are presented: a transmissive mask, selective lighting using digital projection, and selective reflection using digital micromirror devices. With respect to obtaining a common illumination model, all current approaches either require prior knowledge of the light sources illuminating the real scene or involve inserting some kind of probe into the scene with which to determine real light source position, shape, and intensity. This thesis presents an alternative approach that infers a plausible illumination from a limited view of the scene.
EThOS - Electronic Theses Online Service, United Kingdom
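The occlusion limitation described in this abstract can be illustrated with the standard alpha "over" operator. The sketch below is not the thesis's generalised formulation; it only contrasts a purely additive see-through display with a display that can attenuate the real background, and the function names and the [0, 1] intensity range are assumptions made for illustration.

```python
def additive_see_through(virtual, real):
    """Plain optical see-through: virtual light can only be added to real light.

    Intensities are assumed to lie in [0, 1]; the sum is clamped at 1.
    """
    return min(1.0, real + virtual)


def composite_over(virtual, real, alpha):
    """Standard 'over' operator: alpha = 1 fully occludes the real background.

    A per-pixel mask with transmittance (1 - alpha) is one hypothetical way
    a display could realise the attenuation of the real view.
    """
    return alpha * virtual + (1.0 - alpha) * real


# A dark virtual object (0.2) in front of a bright real background (0.9).
washed_out = additive_see_through(0.2, 0.9)   # background shows through
occluded = composite_over(0.2, 0.9, 1.0)      # virtual object hides it
```

With the additive model the dark virtual object cannot hide the bright background, which is why interposition cues are lost without some way of blocking real light.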
Remote Visual Observation of Real Places Through Virtual Reality Headsets
Virtual Reality has always represented a fascinating yet powerful opportunity that has attracted studies and technology developments, especially since the latest release on the market of powerful high-resolution and wide field-of-view VR headsets. While the great potential of such VR systems is common and accepted knowledge, issues remain related to how to design systems and setups capable of fully exploiting the latest hardware advances.
The aim of the proposed research is to study and understand how to increase the perceived level of realism and sense of presence when remotely observing real places through VR headset displays. The goal is to produce a set of guidelines that direct system designers on how to optimize the display-camera setup to enhance performance, focusing on remote visual observation of real places. The outcome of this investigation represents unique knowledge that is believed to be very beneficial for better VR headset designs and improved remote observation systems.
To achieve the proposed goal, this thesis presents a thorough investigation of the existing literature and previous research, carried out systematically to identify the most important factors governing realism, depth perception, comfort, and sense of presence in VR headset observation. Once identified, these factors are further discussed and assessed through a series of experiments and usability studies, based on a predefined set of research questions.
More specifically, the role of familiarity with the observed place, the role of the environment characteristics shown to the viewer, and the role of the display used for the remote observation of the virtual environment are further investigated. To gain more insights, two usability studies are proposed with the aim of defining guidelines and best practices.
The main outcomes from the two studies demonstrate that test users can experience an enhanced realistic observation when natural features, higher resolution displays, natural illumination, and high image contrast are used in Mobile VR. In terms of comfort, simple scene layouts and relaxing environments are considered ideal to reduce visual fatigue and eye strain. Furthermore, sense of presence increases when observed environments induce strong emotions, and depth perception improves in VR when several monocular cues such as lights and shadows are combined with binocular depth cues.
Based on these results, this investigation then presents a focused evaluation of the outcomes and introduces an innovative eye-adapted High Dynamic Range (HDR) approach, which the author believes to be a great improvement in the context of remote observation when combined with eye-tracked VR headsets. To this end, a third user study is proposed to compare static HDR and eye-adapted HDR observation in VR, to assess whether the latter can improve realism, depth perception, sense of presence, and in certain cases even comfort. Results from this last study confirmed the author's expectations, indicating that eye-adapted HDR and eye tracking should be used to achieve the best visual performance for remote observation in modern VR systems.
Gaze-Based Human-Robot Interaction by the Brunswick Model
We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively using statistical tools that measure how effective the recognition phase is. In this paper we apply this theory to the case where one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where gazing is the social signal to be considered.
Superpixel lattices
Superpixels are small image segments that are used in popular approaches to object
detection and recognition problems. The superpixel approach is motivated by the observation
that pixels within small image segments can usually be attributed the same
label. This allows a superpixel representation to produce discriminative features based
on data-dependent regions of support. The reduced set of image primitives produced
by superpixels can also be exploited to improve the efficiency of subsequent processing
steps. However, it is common for the superpixel representation to have a different graph
structure from the original pixel representation of the image.
The first part of the thesis argues that a number of desirable properties of the
pixel representation should be maintained by superpixels and that this is not possible
with existing methods. We propose a new representation, the superpixel lattice, and
demonstrate its advantages.
The second part of the thesis investigates incorporating a priori information into
superpixel segmentations. We learn a probabilistic model that describes the spatial
density of object boundaries in the image. We demonstrate our approach using road
scene data and show that our algorithm successfully exploits the spatial distribution of
object boundaries to improve the superpixel segmentation.
The third part of the thesis presents a globally optimal solution to our superpixel
lattice problem in either the horizontal or vertical direction. The solution makes use of
a Markov Random Field formulation where the label field is guaranteed to be a set of
ordered layers. We introduce an iterative algorithm that uses this framework to learn
colour distributions across an image in an unsupervised manner.
We conclude that our approach achieves comparable or better performance than
competing methods and that it confers several additional advantages.
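To give a feel for what a globally optimal solution in a single direction can look like, the sketch below finds a minimum-cost top-to-bottom path through a boundary-cost grid by dynamic programming. This is a simpler stand-in technique, not the ordered-layer Markov Random Field formulation of the thesis; the cost grid, the one-column step limit, and the function name are all assumptions made for illustration.

```python
def min_cost_vertical_path(cost):
    """Return (total_cost, columns) of the cheapest top-to-bottom path.

    cost is a list of rows; the path visits one cell per row and may shift
    by at most one column between consecutive rows (an assumed constraint).
    """
    cols = len(cost[0])
    total = [list(cost[0])]   # total[r][c]: best cost of a path ending at (r, c)
    back = []                 # back[r - 1][c]: column chosen in row r - 1

    for r in range(1, len(cost)):
        prev = total[-1]
        row_total, row_back = [], []
        for c in range(cols):
            neighbours = [k for k in (c - 1, c, c + 1) if 0 <= k < cols]
            best = min(neighbours, key=lambda k: prev[k])
            row_total.append(cost[r][c] + prev[best])
            row_back.append(best)
        total.append(row_total)
        back.append(row_back)

    # Pick the cheapest end point, then trace the path back to the top row.
    c = min(range(cols), key=lambda k: total[-1][k])
    best_cost = total[-1][c]
    path = [c]
    for row_back in reversed(back):
        c = row_back[c]
        path.append(c)
    path.reverse()
    return best_cost, path


# Low values mark likely boundary pixels (hypothetical boundary-cost map).
grid = [
    [5, 1, 5],
    [5, 5, 1],
    [5, 1, 5],
]
best_cost, path = min_cost_vertical_path(grid)
```

Because each row depends only on the row above it, the dynamic program considers every admissible path, so the returned path is optimal for this simplified cost model.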
Virtual Reality Games for Motor Rehabilitation
This paper presents a fuzzy-logic-based method to track user satisfaction without the need for devices that monitor users' physiological conditions. User satisfaction is the key to any product's acceptance; computer applications and video games offer a unique opportunity to provide an environment tailored to each user's needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature, which suggests that physiological measurements are needed. We show that it is possible to use a software-only method to estimate user emotion.
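As a rough illustration of how a software-only fuzzy estimate can work, the sketch below maps the desirability of a game event to a "joy" level using triangular membership functions and weighted-average defuzzification. It is not the FLAME model or the paper's implementation; the fuzzy sets, their breakpoints, the output levels, and all function names are hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify_desirability(d):
    """Degree to which an event with desirability d in [-1, 1] fits each set."""
    return {
        "undesirable": triangular(d, -1.5, -1.0, 0.0),
        "neutral": triangular(d, -1.0, 0.0, 1.0),
        "desirable": triangular(d, 0.0, 1.0, 1.5),
    }


def estimate_joy(d):
    """Weighted-average defuzzification over assumed singleton joy levels."""
    joy_level = {"undesirable": 0.0, "neutral": 0.5, "desirable": 1.0}
    membership = fuzzify_desirability(d)
    weight = sum(membership.values())
    if weight == 0.0:
        return 0.5  # no rule fires: fall back to a neutral estimate
    return sum(membership[k] * joy_level[k] for k in membership) / weight


# Hypothetical event: the player picks up a useful item (desirability 0.5).
joy = estimate_joy(0.5)
```

Here a mildly desirable event belongs half to "neutral" and half to "desirable", so the estimate lands midway between the two output levels.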
Near real-time monitoring of buried oil pipeline right-of-way for third-party incursion
Many security systems employing different methods have been proposed to protect buried oil pipelines that transport petroleum products from the well head via the refinery to depots and other receiving stations. Currently there is a security gap in monitoring these buried pipelines in real time and in keeping them protected from third-party interference. This thesis addresses the problem of monitoring these systems by developing an automated image analysis system that uses a low-cost multisensory Unmanned Aerial Vehicle (UAV) to monitor the buried pipeline right-of-way (ROW). The method used in this research is based on identifying threat objects of interest in the video frame sequences of the pipeline right-of-way acquired by the UAV. This is achieved by training the system to recognise objects of interest using trained correlation filters. To determine the geographical location of detected objects, the video frame sequences captured by the UAV platform were ortho-rectified to form ortho-images, which were then mosaicked to form a seamless Digital Surface Model (DSM) covering the test area using a photogrammetry model. The DSM formed from the mosaicking of ortho-images is then merged with a digital globe for geo-referencing of detected objects. Experiments were carried out on test fields located in the United Kingdom and Nigeria, where video and telemetry data were collected and then processed using the techniques created in this research. The results demonstrated that the developed correlation filter was able to detect objects of interest despite the distortions in the object image, because the expected distortion was compensated for using the training images. When compared with the 6 control points in the digital globe, the two-dimensional DSM showed a misalignment error of between 2 and 3 metres.
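The detection step can be pictured with plain normalized cross-correlation between a template and every window of a frame. This is a simplified stand-in: the trained correlation filters used in the thesis are built from multiple training images to tolerate distortion, which this sketch does not attempt. The tiny image, the template, and the function names are assumptions made for illustration.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists, in [-1, 1]."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    return num / den if den > 0 else 0.0  # flat patches carry no signal


def best_match(image, template):
    """Slide the template over the image; return (score, (row, col)) of the peak."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best


# Hypothetical 4x4 frame containing the template pattern at row 1, column 2.
frame = [
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 9],
    [0, 0, 0, 0],
]
pattern = [
    [9, 0],
    [0, 9],
]
score, location = best_match(frame, pattern)
```

In a full pipeline, the pixel location of the correlation peak would then be mapped to ground coordinates through the geo-referenced mosaic described in the abstract.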