
    Generalized Multi-Camera Scene Reconstruction Using Graph Cuts

    Reconstructing a 3-D scene from more than one camera is a classical problem in computer vision. One of the major sources of difficulty is the fact that not all scene elements are visible from all cameras. In the last few years, two promising approaches have been developed [. . .] that formulate the scene reconstruction problem in terms of energy minimization, and minimize the energy using graph cuts. These energy minimization approaches treat the input images symmetrically, handle visibility constraints correctly, and allow spatial smoothness to be enforced. However, these algorithms propose different problem formulations, and each handles only a limited class of smoothness terms. One algorithm [. . .] uses a problem formulation that is restricted to two-camera stereo, and imposes smoothness between a pair of cameras. The other algorithm [. . .] can handle an arbitrary number of cameras, but imposes smoothness only with respect to a single camera. In this paper we give a more general energy minimization formulation for the problem, one that allows a larger class of spatial smoothness constraints. We show that our formulation includes both of the previous approaches as special cases, and also permits new energy functions. Experimental results on real data with ground truth are also included.
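    To make the formulation concrete, below is a minimal Python sketch (not the paper's code) of the generic energy that such graph-cut methods minimize: a per-pixel data term plus a pairwise smoothness term over neighboring pixels, E(f) = sum_p D_p(f_p) + sum_{p,q} V(f_p, f_q). The Potts smoothness model and all names are illustrative assumptions; the minimization itself (e.g. alpha-expansion via max-flow) is omitted.

        import numpy as np

        def labeling_energy(labels, data_cost, smooth_weight=1.0):
            # labels: (H, W) integer label map; data_cost: (H, W, L) per-pixel costs.
            h, w = labels.shape
            rows, cols = np.indices((h, w))
            e = data_cost[rows, cols, labels].sum()  # data term: cost of each pixel's label
            # Potts smoothness over 4-connected neighbor pairs: penalize label changes
            e += smooth_weight * np.count_nonzero(labels[:, 1:] != labels[:, :-1])
            e += smooth_weight * np.count_nonzero(labels[1:, :] != labels[:-1, :])
            return e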

    Modified belief propagation for reconstruction of office environments

    Belief Propagation (BP) is an algorithm that has found broad application in many areas of computer science, including error-correcting codes, Kalman filters, particle filters, and -- most relevantly here -- stereo computer vision. Many of the currently best algorithms on stereo vision benchmarks, e.g. the Middlebury dataset, use Belief Propagation. This dissertation describes improvements to the core algorithm that increase its applicability and usefulness for computer vision applications. A Belief Propagation solution to a computer vision problem is commonly based on the specification of a Markov Random Field that it optimizes. Both Markov Random Fields and Belief Propagation have at their core some definition of nodes and of a 'neighborhood' for each node: each node has a subset of the other nodes defined to be its neighborhood. In common usage for stereo computer vision, a pixel's neighborhood is defined as its four immediate spatial neighbors. For any given node, this neighborhood definition may or may not be correct for the specific scene. In a setting with video cameras, I expand the neighborhood definition to include corresponding nodes in temporal neighborhoods in addition to spatial neighborhoods; this amplifies the problem of erroneous neighborhood assignments, which part of this dissertation addresses. Often, no single algorithm is always the best, and the Markov Random Field formulation appears amenable to the integration of other algorithms: I explore that potential here by integrating priors from independent algorithms. This dissertation makes core improvements to BP such that it is more robust to erroneous neighborhood assignments, is more robust in regions with near-uniform inputs, and can be biased in a sensitive manner towards higher-level priors. These core improvements are demonstrated by the presented results: applications to office environments, real-world datasets, and benchmark datasets.
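    As a concrete reference point, here is a minimal min-sum BP message update for a grid MRF in Python. This is an illustrative sketch of the standard algorithm on a 4-connected neighborhood, not the modified neighborhoods developed in the dissertation; all names are assumptions.

        import numpy as np

        def update_message(data_cost, incoming, smooth):
            # One min-sum message from pixel p to a neighbor q.
            # data_cost: (L,) unary costs at p.
            # incoming:  list of (L,) messages into p from neighbors other than q.
            # smooth:    (L, L) pairwise cost V(l_p, l_q), rows indexed by p's label.
            belief = data_cost + np.sum(incoming, axis=0)  # aggregate evidence at p
            msg = (belief[:, None] + smooth).min(axis=0)   # minimize over p's labels
            return msg - msg.min()                         # normalize for numerical stability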

    Geometry-driven feature detection

    Matching images taken from different viewpoints is a fundamental step for many computer vision applications, including 3D reconstruction, scene recognition, virtual reality, and robot localization. The typical approaches detect feature keypoints based on local properties to achieve robustness to viewpoint changes, and establish correspondences between keypoints to recover the 3D geometry or determine the similarity between images. The complexity of perspective distortion challenges the detection of viewpoint-invariant features, and the lack of 3D geometric information about local features makes their matching inefficient. In this thesis, I explore feature detection based on 3D geometric information for improved projective invariance. The main novel research contributions of this thesis are as follows. First, I give a projective-invariant feature detection method that exploits 3D structures recovered from simple stereo matching. By leveraging the rich geometric information of the detected features, I present an efficient 3D matching algorithm to handle large viewpoint changes. Second, I propose a compact high-level feature detector that robustly extracts repetitive structures in urban scenes, which allows efficient wide-baseline matching. I further introduce a novel single-view reconstruction approach to recover the 3D dense geometry of the repetition-based features.
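    For context, the conventional local-feature baseline that such work builds on looks roughly like the following OpenCV sketch; the detector choice (ORB) and ratio threshold are illustrative assumptions, not the thesis' geometry-driven method.

        import cv2

        def match_keypoints(img1, img2, ratio=0.75):
            # Detect and describe local features (ORB here, purely illustrative).
            orb = cv2.ORB_create(nfeatures=2000)
            k1, d1 = orb.detectAndCompute(img1, None)
            k2, d2 = orb.detectAndCompute(img2, None)
            # Match descriptors and keep unambiguous matches (Lowe's ratio test).
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
            pairs = matcher.knnMatch(d1, d2, k=2)
            good = [p[0] for p in pairs
                    if len(p) == 2 and p[0].distance < ratio * p[1].distance]
            return k1, k2, good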

    3D face scanning using a non-rigid spatio-temporal super-resolution approach

    3D face measurement is increasingly in demand for applications such as biometrics, animation, and facial surgery. Current solutions often employ a structured-light camera/projector device to overcome the relatively uniform appearance of skin; depth information is recovered by decoding the distortion of the patterns projected onto the face. One of the most widely used structured-light codings is sinusoidal phase shifting, which allows dense, pixel-level 3D resolution. Current solutions mostly utilize more than three phase-shifted sinusoidal patterns to recover the depth information, which lengthens acquisition and can generate artifacts when capturing a moving face. They further require projector-camera calibration, whose accuracy is crucial for the phase-to-depth estimation step, and they need a phase unwrapping stage that is sensitive to ambient light, especially when the number of patterns decreases. An alternative to projector-camera systems is to recover depth information by stereovision using a multi-camera system: a stereo matching step finds correspondences between stereo images, and the 3D information is obtained by optical triangulation. However, the model computed in this way is generally quite sparse. To upsample and denoise depth images, researchers have looked into super-resolution techniques, which were especially proposed for time-of-flight cameras, whose 3D scans are very noisy. This thesis proposes a low-cost 3D acquisition solution with a spatio-temporal, non-rigid super-resolution scheme, using a calibrated multi-camera system coupled with an uncalibrated projection source; it is particularly suited to 3D face scanning, i.e. rapid and easily movable. The proposed solution is a hybrid stereovision and phase-shifting approach, using two shifted patterns and a texture image, which not only takes advantage of the assets of stereovision and structured light but also overcomes their weaknesses. The super-resolution scheme involves a 3D non-rigid registration that corrects the 3D information and completes the scanned view of the face in the presence of small non-rigid deformations such as facial expressions.
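    As background for the phase-shifting side of the hybrid approach, the standard N-step wrapped-phase recovery can be sketched in a few lines of Python. This is illustrative only: the thesis' two-pattern scheme, the unwrapping, stereo matching, and super-resolution stages are not shown, and the pattern model is an assumption.

        import numpy as np

        def wrapped_phase(images):
            # images: (N, H, W) captures of N sinusoidal patterns shifted by 2*pi/N.
            # Assumes I_n = A + B*cos(phi + 2*pi*n/N); sign conventions vary.
            n = len(images)
            shifts = 2 * np.pi * np.arange(n) / n
            num = sum(img * np.sin(s) for img, s in zip(images, shifts))
            den = sum(img * np.cos(s) for img, s in zip(images, shifts))
            return np.arctan2(-num, den)  # wrapped phase in (-pi, pi]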

    Template reduction of feature point models for rigid objects and application to tracking in microscope images.

    This thesis addresses the problem of tracking rigid objects in video sequences. A novel approach to reducing the template size of shapes is presented; the reduced shape template can be used to enhance the performance of tracking, detection, and recognition algorithms. The main idea consists of pre-calculating all possible positions and orientations that a shape can undergo for a given state space. From these states, it is possible to extract a set of points that uniquely and robustly characterises the shape for the considered state space. An algorithm based on the Hough transform has been developed to achieve this for discrete shapes, i.e. sets of points, projected into an image when the state space is bounded. An extended discussion of particle filters, which serves as an introduction to the topic, is presented, along with some generic improvements. These improvements allow the data to be better sampled by incorporating additional measurements and knowledge about the velocity of the tracked object. A partial re-initialisation scheme is also presented that enables faster recovery of the system when the object is temporarily occluded. A stencil estimator is introduced to identify the position of an object in an image, and some of its properties are discussed and demonstrated. The estimator can be efficiently evaluated using the bounded Hough transform algorithm, and the performance of the stencilled Hough transform can be further enhanced with a methodology that decimates the stencils while maintaining the robustness of the tracker. Performance evaluations have demonstrated the relevance of the approach. Although the methods presented in this thesis could be adapted to full 3-D object motion, motions that maintain the same view of the object in front of a camera are more specifically studied.
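    For readers unfamiliar with the particle-filter background, a generic sampling-importance-resampling (SIR) step is sketched below in Python. The motion and likelihood models are placeholder assumptions; the thesis' stencil estimator and partial re-initialisation scheme are not reproduced here.

        import numpy as np

        def particle_filter_step(particles, weights, motion, likelihood, rng):
            # particles: (N, D) state hypotheses; motion/likelihood are user-supplied.
            particles = motion(particles, rng)           # predict with process noise
            weights = weights * likelihood(particles)    # weight by measurement fit
            weights /= weights.sum()
            n_eff = 1.0 / np.sum(weights ** 2)           # effective sample size
            if n_eff < 0.5 * len(particles):             # resample when degenerate
                idx = rng.choice(len(particles), size=len(particles), p=weights)
                particles = particles[idx]
                weights = np.full(len(particles), 1.0 / len(particles))
            return particles, weights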