Search CORE

37 research outputs found

Human Pose Estimation with Supervoxels

Author: Schick Alexander
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014
Field of study

This thesis investigates how segmentation as a preprocessing step can reduce both the search space as well as complexity of human pose estimation in the context of smart environments. A 3D reconstruction is computed with a voxel carving algorithm. Based on a superpixel algorithm, these voxels are segmented into supervoxels that are then applied to pictorial structures in 3D to efficiently estimate the human pose. Both static and dynamic gesture recognition applications were developed

KITopen

From light rays to 3D models

Author: Donné Simon
Publication venue: Faculty of Engineering and Architecture Ghent University
Publication date: 01/01/2018
Field of study

Ghent University Academic Bibliography

RigidFusion: RGB-D Scene Reconstruction with Rigidly-movie Objects

Author: Li C
Mitra NJ
Niessner M
Wong Y-S
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/05/2021
Field of study

Although surface reconstruction from depth data has made significant advances in the recent years, handling changing environments remains a major challenge. This is unsatisfactory, as humans regularly move objects in their environments. Existing solutions focus on a restricted set of objects (e.g., those detected by semantic classifiers) possibly with template meshes, assume static camera, or mark objects touched by humans as moving. We remove these assumptions by introducing RigidFusion. Our core idea is a novel asynchronous moving-object detection method, combined with a modified volumetric fusion. This is achieved by a model-to-frame TSDF decomposition leveraging free-space carving of tracked depth values of the current frame with respect to the background model during run-time. As output, we produce separate volumetric reconstructions for the background and each moving object in the scene, along with its trajectory over time. Our method does not rely on the object priors (e.g., semantic labels or pre-scanned meshes) and is insensitive to the motion residuals between objects and the camera. In comparison to state-of-the-art methods (e.g., Co-Fusion, MaskFusion), we handle significantly more challenging reconstruction scenarios involving moving camera and improve moving-object detection (26% on the miss-detection ratio), tracking (27% on MOTA), and reconstruction (3% on the reconstruction F1) on the synthetic dataset. Please refer the supplementary and the project website for the video demonstration (geometry.cs.ucl.ac.uk/projects/2021/rigidfusion)

UCL Discovery

Combining Features and Semantics for Low-level Computer Vision

Author: Güney Fatma
Publication venue: Universität Tübingen
Publication date: 01/01/2018
Field of study

Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.Die visuelle Wahrnehmung von Tiefe und Bewegung spielt eine wichtige Rolle bei dem Verständnis und der Navigation in unserer Umwelt. Die 3D Rekonstruktion von Szenen im Freien und die Schätzung der Bewegung von Videokameras sind von größter Bedeutung für Anwendungen, wie das autonome Fahren. Die Erforschung der entsprechenden Probleme des maschinellen Sehens hat in den letzten Jahrzehnten enorme Fortschritte gemacht, jedoch bleiben einige Aspekte heute noch ungelöst. Beispiele hierfür sind reflektierende und texturlose Oberflächen oder große Bewegungen, bei denen herkömmliche lokale Methoden häufig scheitern. Weitere Herausforderungen sind niedrige Bildraten, Verdeckungen, große Verzerrungen und schwierige Lichtverhältnisse. In dieser Arbeit schlagen wir vor nicht-lokale Interaktionen zu modellieren, die semantische und kontextbezogene Informationen nutzen, um diese Herausforderungen zu meistern. Für die binokulare Stereo Schätzung schlagen wir zuallererst vor zusammenhängende Bereiche mit objektklassen-spezifischen Disparitäts Vorschlägen zu regularisieren, die wir mit inversen Grafik Techniken auf der Grundlage einer spärlichen Disparitätsschätzung und semantischen Segmentierung des Bildes erhalten. Die Disparitäts Vorschläge kodieren die Tatsache, dass die Gegenstände bestimmter Kategorien nicht willkürlich geformt sind, sondern typischerweise regelmäßige Strukturen aufweisen. Wir integrieren sie für die komplexe Objektklasse 'Auto' in Form eines nicht-lokalen Regularisierungsterm in ein Superpixel-basiertes grafisches Modell und zeigen die Vorteile vor allem in reflektierenden Bereichen. Zweitens nutzen wir für die 3D-Rekonstruktion die Tatsache, dass mit der Größe der rekonstruierten Fläche auch die Wahrscheinlichkeit steigt, Objekte von ähnlicher Art und Form in der Szene zu enthalten. Dies gilt besonders für Szenen im Freien, in denen Gebäude und Fahrzeuge oft vorkommen, die unter fehlender Textur oder Reflexionen leiden aber ähnlichkeit in der Form aufweisen. Wir nutzen diese ähnlichkeiten zur Lokalisierung von Objekten mit Detektoren und zur gemeinsamen Rekonstruktion indem ein volumetrisches Modell ihrer Form erlernt wird. Dies ermöglicht auftretendes Rauschen zu reduzieren, während fehlende Flächen vervollständigt werden, da Objekte ähnlicher Form von allen Beobachtungen der jeweiligen Kategorie profitieren. Die Evaluierung auf einem neuen, herausfordernden vorstädtischen Datensatz in Anbetracht von LIDAR-Entfernungsdaten zeigt die Vorteile der Modellierung von strukturellen Abhängigkeiten zwischen Objekten. Zuletzt, motiviert durch den Erfolg von Deep Learning Techniken bei der Mustererkennung, präsentieren wir eine Methode zum Erlernen von kontextbezogenen Merkmalen zur Lösung des optischen Flusses mittels diskreter Optimierung. Dazu stellen wir eine effiziente Methode vor um zusätzlich zu einem Lokalen Netzwerk ein Kontext-Netzwerk zu erlernen, das mit Hilfe von erweiterter Faltung auf Patches ein großes rezeptives Feld besitzt. Für das Feature Matching vergleichen wir mit schnellen GPU-Matrixmultiplikation jedes Pixel im Referenzbild mit jedem Pixel im Zielbild. Das aus dem Netzwerk resultierende Matching Kostenvolumen bildet den Datenterm für eine diskrete MAP Inferenz in einem paarweisen Markov Random Field. Eine umfangreiche Evaluierung zeigt die Relevanz des Kontextes für das Feature Matching

Publikationsserver der Universität Tübingen

Single View Modeling and View Synthesis

Author: Liao Miao
Publication venue: UKnowledge
Publication date: 01/01/2011
Field of study

This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments. In this thesis, I try to produce the 3D contents using a single camera, making it as simple as shooting pictures. It requires a new front end capturing device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture the highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences and achieves 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instance, partial surfaces are assembled together to form a complete 3D model by a novel warping algorithm. Inspired by the success of single view 3D modeling, I extended my exploration into 2D-3D video conversion that does not utilize a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos, via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much depth inferring work from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with user labeling work. In this thesis, I developed new algorithms to produce 3D contents from a single camera. Depending on the input data, my algorithm can build high fidelity 3D models for dynamic and deformable objects if depth maps are provided. Otherwise, it can turn the video clips into stereoscopic video

University of Kentucky

Personen-Tracking in Umgebungen mit verdeckenden Objekten basierend auf 3D-Rekonstruktionsdaten

Author: Ober-Gecks Antje
Publication venue
Publication date: 01/01/2023
Field of study

EPub Bayreuth

Automated video-based analysis of human operators in mixed-model assembly work stations

Author: Bauters Karel
Publication venue
Publication date: 25/06/2021
Field of study

Ghent University Academic Bibliography

Material Recognition Meets 3D Reconstruction : Novel Tools for Efficient, Automatic Acquisition Systems

Author: Weinmann Michael
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

For decades, the accurate acquisition of geometry and reflectance properties has represented one of the major objectives in computer vision and computer graphics with many applications in industry, entertainment and cultural heritage. Reproducing even the finest details of surface geometry and surface reflectance has become a ubiquitous prerequisite in visual prototyping, advertisement or digital preservation of objects. However, today's acquisition methods are typically designed for only a rather small range of material types. Furthermore, there is still a lack of accurate reconstruction methods for objects with a more complex surface reflectance behavior beyond diffuse reflectance. In addition to accurate acquisition techniques, the demand for creating large quantities of digital contents also pushes the focus towards fully automatic and highly efficient solutions that allow for masses of objects to be acquired as fast as possible. This thesis is dedicated to the investigation of basic components that allow an efficient, automatic acquisition process. We argue that such an efficient, automatic acquisition can be realized when material recognition "meets" 3D reconstruction and we will demonstrate that reliably recognizing the materials of the considered object allows a more efficient geometry acquisition. Therefore, the main objectives of this thesis are given by the development of novel, robust geometry acquisition techniques for surface materials beyond diffuse surface reflectance, and the development of novel, robust techniques for material recognition. In the context of 3D geometry acquisition, we introduce an improvement of structured light systems, which are capable of robustly acquiring objects ranging from diffuse surface reflectance to even specular surface reflectance with a sufficient diffuse component. We demonstrate that the resolution of the reconstruction can be increased significantly for multi-camera, multi-projector structured light systems by using overlappings of patterns that have been projected under different projector poses. As the reconstructions obtained by applying such triangulation-based techniques still contain high-frequency noise due to inaccurately localized correspondences established for images acquired under different viewpoints, we furthermore introduce a novel geometry acquisition technique that complements the structured light system with additional photometric normals and results in significantly more accurate reconstructions. In addition, we also present a novel method to acquire the 3D shape of mirroring objects with complex surface geometry. The aforementioned investigations on 3D reconstruction are accompanied by the development of novel tools for reliable material recognition which can be used in an initial step to recognize the present surface materials and, hence, to efficiently select the subsequently applied appropriate acquisition techniques based on these classified materials. In the scope of this thesis, we therefore focus on material recognition for scenarios with controlled illumination as given in lab environments as well as scenarios with natural illumination that are given in photographs of typical daily life scenes. Finally, based on the techniques developed in this thesis, we provide novel concepts towards efficient, automatic acquisition systems

bonndoc – Der Publikationsserver der Universität Bonn

Recommended from our members

Detailed and Practical 3D Reconstruction with Advanced Photometric Stereo Modelling

Author: Logothetis Fotios
Publication venue: University of Cambridge
Publication date: 04/05/2019
Field of study

Object 3D reconstruction has always been one of the main objectives of computer vision. After many decades of research, most techniques are still unsuccessful at recovering high resolution surfaces, especially for objects with limited surface texture. Moreover, most shiny materials are particularly hard to reconstruct. Photometric Stereo (PS), which operates by capturing multiple images under changing illumination has traditionally been one of the most successful techniques at recovering a large amount of surface details, by exploiting the relationship between shading and local shape. However, using PS has been highly impractical because most approaches are only applicable in a very controlled lab setting and limited to objects experiencing diffuse reflection. Nevertheless, recent advances in differential modelling have made complicated Photometric Stereo models possible and variational optimisations for these kinds of models show remarkable resilience to real world imperfections such as non-Gaussian noise and other outliers. Thus, a highly accurate, photometric-based reconstruction system is now possible. The contribution of this thesis is threefold. First of all, the Photometric Stereo model is extended in order to be able to deal with arbitrary ambient lighting. This is a step towards acquisition in a non-fully controlled lab setting. Secondly, the need for a priori knowledge of the light source brightness and attenuation characteristics is relaxed as an alternating optimisation procedure is proposed which is able to estimate these parameters. This extension allows for quick acquisition with inexpensive LEDs that exhibit unpredictable illumination characteristics (flickering etc). Finally, a volumetric parameterisation is proposed which allows one to tackle the multi-view Photometric Stereo problem in a similar manner, in a simple unified differential model. This final extension allows for complete object reconstruction merging information from multiple images taken from multiple viewpoints and variable illumination. The theoretical work in this thesis is experimentally evaluated in a number of challenging real world experiments, with data captured by custom-made hardware. In addition, the applicability of the generality of the proposed models is demonstrated by presenting a differential model for the shape of polarisation problem, which leads to a unified optimisation problem, fusing information from both methods. This allows for the acquisition of geometrical information about objects such as semi-transparent glass, hitherto hard to deal with

Apollo (Cambridge)