DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras
We propose DiffuStereo, a novel system using only sparse cameras (8 in this
work) for high-quality 3D human reconstruction. At its core is a novel
diffusion-based stereo module, which introduces diffusion models, a type of
powerful generative models, into the iterative stereo matching network. To this
end, we design a new diffusion kernel and additional stereo constraints to
facilitate stereo matching and depth estimation in the network. We further
present a multi-level stereo network architecture to handle high-resolution (up
to 4k) inputs without requiring an unaffordable memory footprint. Given a set of
sparse-view color images of a human, the proposed multi-level diffusion-based
stereo network can produce highly accurate depth maps, which are then converted
into a high-quality 3D human model through an efficient multi-view fusion
strategy. Overall, our method enables automatic reconstruction of human models
with quality on par with high-end dense-view camera rigs, and this is achieved
using a much more lightweight hardware setup. Experiments show that our method
outperforms state-of-the-art methods by a large margin both qualitatively and
quantitatively.
Comment: Accepted by ECCV202
MRF Stereo Matching with Statistical Estimation of Parameters
For about the last ten years, stereo matching in computer vision has been treated as a combinatorial optimization problem. Assuming that the points in stereo images form a Markov Random Field (MRF), a variety of combinatorial optimization algorithms have been developed to optimize the underlying cost functions. In many of these algorithms, the MRF parameters of the cost functions have been manually tuned or heuristically determined to achieve good performance. Recently, several algorithms for statistical, and hence automatic, estimation of the parameters have been published. Overall, these algorithms perform well in labeling, but they handle labeling discontinuities along surface borders poorly.
In this dissertation, we develop an algorithm for optimization of the cost function with automatic estimation of the MRF parameters: the data and smoothness parameters. Both parameters are estimated statistically and applied in the cost function with the support of an adaptive neighborhood defined by color similarity. The proposed algorithm handles discontinuities along surface borders with higher consistency than the existing algorithms. The data parameters are pre-estimated from one of the stereo images by applying a hypothesis, called the noise equivalence hypothesis, to eliminate interdependency between the estimations of the data and smoothness parameters. The smoothness parameters are estimated by applying a combination of maximum likelihood and a disparity gradient constraint, to eliminate nested inference in the estimation. The parameters for handling discontinuities in data and smoothness are defined statistically as well. We model cost functions to match the images symmetrically for improved matching performance and also to detect occlusions. Finally, we fill the occlusions in the disparity map by applying several existing and proposed algorithms and show that our best proposed algorithm, a segmentation-based least-squares method, performs better than the existing algorithms.
We conduct experiments with the proposed algorithm on publicly available ground-truth test datasets provided by Middlebury College. Experiments show that the proposed algorithm, with its MRF parameters estimated automatically, delivers better results than the existing algorithms. In addition, when the parameter estimation technique is applied to an existing stereo matching algorithm, we observe a significant improvement in computational time.
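For readers unfamiliar with the MRF view of stereo matching, the cost function mentioned in the abstract above typically sums a data term and a smoothness term over a disparity labeling. The following is a minimal sketch of that generic formulation only; it is not the dissertation's method. The function name `mrf_energy`, the absolute-difference data cost, the truncated-linear smoothness penalty, and the fixed values of `smooth_weight` and `truncation` are all illustrative assumptions (the dissertation estimates its parameters statistically rather than fixing them by hand).

```python
def mrf_energy(left, right, disparity, smooth_weight=1.0, truncation=2):
    """Generic MRF stereo energy: data term + weighted smoothness term.

    left, right: grey-value images as lists of rows (hypothetical inputs).
    disparity:   integer disparity label per pixel of the left image.
    smooth_weight, truncation: hand-picked here purely for illustration.
    """
    height, width = len(left), len(left[0])
    data = 0.0
    smooth = 0.0
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            # Data term: absolute intensity difference between the left pixel
            # and its match in the right image (column clamped at the border).
            xr = max(x - d, 0)
            data += abs(left[y][x] - right[y][xr])
            # Smoothness term: truncated linear penalty on label differences
            # with the right and lower neighbours (4-connected grid).
            if x + 1 < width:
                smooth += min(abs(d - disparity[y][x + 1]), truncation)
            if y + 1 < height:
                smooth += min(abs(d - disparity[y + 1][x]), truncation)
    return data + smooth_weight * smooth


# Toy scanline pair in which the scene is shifted by one pixel.
left = [[0, 10, 20, 30]]
right = [[10, 20, 30, 40]]
print(mrf_energy(left, right, [[1, 1, 1, 1]]))  # correct labeling
print(mrf_energy(left, right, [[0, 0, 0, 0]]))  # wrong labeling
```

On this toy input the uniform disparity of 1 yields a lower energy than a disparity of 0, which is the property an optimizer exploits. The truncation keeps the penalty bounded so that genuine depth discontinuities are not over-penalized.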
Viewfinder: final activity report
The VIEW-FINDER project (2006-2009) is an 'Advanced Robotics' project that seeks to apply a semi-autonomous robotic system to inspect ground safety in the event of a fire. Its primary aim is to gather data (visual and chemical) in order to assist rescue personnel. A base station combines the gathered information with information retrieved from off-site sources.
The project addresses key issues related to map building and reconstruction, interfacing local command information with external sources, human-robot interfaces and semi-autonomous robot navigation.
The VIEW-FINDER system is semi-autonomous: the individual robot-sensors operate autonomously within the limits of the task assigned to them, that is, they autonomously navigate through and inspect an area. Human operators monitor their operations and send high-level task requests as well as low-level commands through the interface to any node in the system. The human interface has to ensure that the human supervisor and human interveners are given a reduced but relevant overview of the ground and of the robots and human rescue workers therein.
Deliverable D3.3 of the PERSEE project: 2D coding tools
Deliverable D3.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D3.3 of the project. Its title: 2D coding tools
A Multicamera System for Gesture Tracking With Three Dimensional Hand Pose Estimation
The goal of any visual tracking system is to detect and then follow an object of interest through a sequence of images. The difficulty of tracking an object depends on the dynamics, the motion and the characteristics of the object as well as on the environment. For example, tracking an articulated, self-occluding object such as a signing hand has proven to be a very difficult problem. The focus of this work is on tracking and pose estimation with applications to hand gesture interpretation. An approach that attempts to integrate the simplicity of a region tracker with single-hand 3D pose estimation methods is presented. Additionally, this work delves into the pose estimation problem. This is accomplished by both analyzing hand templates composed of their morphological skeleton, and addressing the skeleton's inherent instability. Ligature points along the skeleton are flagged in order to determine their effect on skeletal instabilities. Tested on real data, the analysis finds that flagging ligature points proportionally increases the match strength of high-similarity image-template pairs by about 6%. The effectiveness of this approach is further demonstrated in a real-time multicamera hand tracking system that tracks hand gestures through three-dimensional space and estimates the three-dimensional pose of the hand.