
    Enhanced tracking and recognition of moving objects by reasoning about spatio-temporal continuity.

    A framework for the logical and statistical analysis and annotation of dynamic scenes containing occlusion and other uncertainties is presented. This framework consists of three elements: an object tracker module, an object recognition/classification module, and a reasoning engine for logical consistency, ambiguity, and error. The principle behind the object tracker and object recognition modules is to reduce error by increasing ambiguity (by merging objects in close proximity and presenting multiple hypotheses). The reasoning engine deals with error, ambiguity, and occlusion in a unified framework to produce a hypothesis that satisfies fundamental constraints on the spatio-temporal continuity of objects. Our algorithm finds a globally consistent model of an extended video sequence that is maximally supported by a voting function based on the output of a statistical classifier. The system produces an annotation that is significantly more accurate than what would be obtained by frame-by-frame evaluation of the classifier output. The framework has been implemented and applied successfully to the analysis of team sports with a single camera.
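
    As a rough illustration of the track-level voting idea described above, the sketch below aggregates per-frame classifier probabilities into a single identity per track, enforcing the continuity constraint that one object keeps one label across a sequence. The function name and the log-likelihood vote are illustrative assumptions, not the paper's exact voting function.

```python
import numpy as np

def track_level_label(frame_probs):
    # frame_probs: (T, C) per-frame class probabilities for one tracked
    # object over T frames and C candidate identities (hypothetical input).
    # Summing log-probabilities acts as a voting function: the identity
    # with maximal support over the whole track wins, so single noisy
    # frames cannot flip the label (spatio-temporal consistency).
    votes = np.log(frame_probs + 1e-9).sum(axis=0)
    return int(np.argmax(votes))

# A classifier that wavers frame by frame still yields one stable label:
probs = np.array([[0.6, 0.4], [0.45, 0.55], [0.7, 0.3], [0.65, 0.35]])
print(track_level_label(probs))  # -> 0
```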

    Cooperative multitarget tracking with efficient split and merge handling

    Copyright © 2006 IEEE. For applications such as behavior recognition it is important to maintain the identity of multiple targets while tracking them in the presence of splits and merges, or occlusion of the targets by background obstacles. Here we propose an algorithm to handle multiple splits and merges of objects based on dynamic programming and a new geometric shape matching measure. We then cooperatively combine Kalman filter-based motion and shape tracking with the efficient and novel geometric shape matching algorithm. The system is fully automatic and requires no manual input of any kind to initialize tracking. The target track initialization problem is formulated as the computation of shortest paths in a directed and attributed graph using Dijkstra's shortest path algorithm. This scheme correctly initializes multiple target tracks even in the presence of clutter and segmentation errors which may occur in detecting a target. We present results on a large number of real-world image sequences, where up to 17 objects have been tracked simultaneously in real time, despite clutter, splits, and merges in measurements of objects. The complete tracking system, including segmentation of moving objects, runs at 25 Hz on 352×288-pixel color image sequences on a 2.8-GHz Pentium 4 workstation. Pankaj Kumar, Surendra Ranganath, Kuntal Sengupta, and Huang Weimin.
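
    A minimal sketch of the layered-graph idea behind the track initialization: detections in consecutive frames form a directed graph, edge costs measure dissimilarity, and a cheapest path through all layers seeds one track. The helper names and the simple centroid-distance cost are assumptions for illustration; the paper's attributed graph carries richer shape and motion attributes.

```python
import heapq
import math

def shortest_track(detections, cost):
    # detections[t] lists candidate detections in frame t; cost(a, b) is the
    # edge weight between detections in consecutive frames. A Dijkstra-style
    # search returns the cheapest path through all frames, i.e. one
    # initialized track despite clutter.
    T = len(detections)
    heap = [(0.0, 0, i, [i]) for i in range(len(detections[0]))]
    heapq.heapify(heap)
    settled = {}
    while heap:
        c, t, i, path = heapq.heappop(heap)
        if t == T - 1:
            return c, path
        if settled.get((t, i), math.inf) <= c:
            continue
        settled[(t, i)] = c
        for j in range(len(detections[t + 1])):
            nc = c + cost(detections[t][i], detections[t + 1][j])
            heapq.heappush(heap, (nc, t + 1, j, path + [j]))
    return math.inf, []

# Two targets plus a spurious detection; centroid distance as edge cost:
dets = [[(0.0, 0.0), (5.0, 5.0)], [(0.8, 0.1), (5.2, 4.9)], [(1.6, 0.3)]]
print(shortest_track(dets, lambda a, b: math.dist(a, b)))
```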

    3D Least Squares Based Surface Reconstruction

    This thesis presents a fully three-dimensional (3D) surface reconstruction algorithm for wide-baseline image sequences. Triangle meshes represent the reconstructed surfaces, allowing for an easy integration of image- and geometry-based constraints. We extend the successful 2.5D reconstruction approach of Heipke (1990) to full 3D. To take occlusion and non-Lambertian reflection into account, we apply robust least squares adjustment to estimate the model. The input for our approach consists of images taken from different positions, derived accurate image orientations, and sparse 3D points (Bartelsen and Mayer 2010). The first novelty of our approach is the way we position additional 3D points (unknowns) in the triangle meshes constructed from the given 3D points. Owing to the precise positions of these additional 3D points, we obtain more precise and accurate reconstructed surfaces in terms of shape and fit of texture. The second novelty is the use of individual bias parameters for different images and adapted weights for different image observations, accounting for differences in the intensity values of different images as well as for outliers in the estimation. The third novelty is the way we factorize the design matrix and divide the meshes into layers to reduce the run time. The essential element of our model is the variance of the intensity values of the image observations inside a triangle. Applying the approach, we can reconstruct accurate 3D surfaces for different types of scenes. Results are presented in the form of VRML (Virtual Reality Modeling Language) models, demonstrating the potential of the approach as well as its current shortcomings.
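
    The robust least squares adjustment with per-observation weights could be realized, for example, with iteratively reweighted least squares and Huber weights; the sketch below is a generic IRLS loop under that assumption, not the thesis's exact adjustment (which also estimates per-image bias parameters).

```python
import numpy as np

def irls(A, b, k=1.345, iters=10):
    # Solve A x ~ b while down-weighting outlying observations.
    # k is the standard Huber tuning constant (an assumption here).
    x = np.linalg.lstsq(A, b, rcond=None)[0]            # ordinary LS start
    for _ in range(iters):
        r = b - A @ x                                   # residuals
        s = 1.4826 * np.median(np.abs(r)) + 1e-12       # robust scale (MAD)
        w = np.minimum(1.0, k * s / (np.abs(r) + 1e-12))  # Huber weights
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# Fit a line y = 2t + 1 with one gross outlier in the observations:
t = np.arange(10.0)
A = np.column_stack([t, np.ones_like(t)])
b = 2 * t + 1
b[5] += 50.0                                            # simulated blunder
print(irls(A, b))                                       # close to [2, 1]
```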

    Finding Optimal Views for 3D Face Shape Modeling

    A fundamental problem in multi-view 3D face modeling is the determination of the set of optimal views (poses) required for accurate 3D shape estimation of a generic face. There is no analytical solution to this problem; instead, (partial) solutions require (near) exhaustive combinatorial search, hence the inherent computational difficulty of the task. We build on our previous modeling framework [Silhouette-based 3D face shape recovery; Model-based 3D face capture using shape-from-silhouettes], which uses an efficient contour-based silhouette method, and extend it by aggressively pruning the view-sphere with view clustering and various imaging constraints. A multi-view optimization search is performed using both model-based (eigenheads) and data-driven (visual hull) methods, yielding comparable best views. These constitute the first reported set of optimal views for 3D face shape capture and provide useful empirical guidelines for the design of 3D face recognition systems.
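
    To make the combinatorial search concrete: after pruning, an exhaustive scan over k-view subsets becomes feasible. The sketch below greedily clusters the view sphere by angular separation and then enumerates view triples; the 20° threshold, the scoring callable, and the function names are assumptions for illustration, not the paper's criteria.

```python
import itertools
import numpy as np

def prune_views(views, min_angle_deg=20.0):
    # views: (N, 3) unit vectors on the view sphere. Keep a view only if
    # it is at least min_angle_deg away from every view already kept --
    # a crude stand-in for view clustering.
    cos_thr = np.cos(np.radians(min_angle_deg))
    kept = []
    for v in views:
        if all(float(np.dot(v, u)) < cos_thr for u in kept):
            kept.append(v)
    return np.array(kept)

def best_view_set(views, score, k=3):
    # Exhaustive search over all k-view combinations; tractable only
    # because pruning has shrunk the candidate set. `score` is a
    # user-supplied quality measure (e.g. shape-estimation accuracy).
    return max(itertools.combinations(range(len(views)), k),
               key=lambda idx: score(views[list(idx)]))

# Example: favor view sets with wide mutual angular coverage.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 3))
raw /= np.linalg.norm(raw, axis=1, keepdims=True)
cand = prune_views(raw)
spread = lambda V: -np.abs(V @ V.T).sum()   # hypothetical score
print(best_view_set(cand, spread))
```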

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes as well as a lack of portability, limiting its application to lab experiments. In this thesis, I produce 3D content using a single camera, making the process as simple as shooting pictures. It requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist the user's labeling work. In summary, this thesis develops new algorithms to produce 3D content from a single camera: given depth maps, they can build high-fidelity 3D models of dynamic and deformable objects; otherwise, they can turn video clips into stereoscopic videos.
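
    The core geometry of light fall-off stereo can be sketched in a few lines: with a point light moved a known distance delta along the viewing direction, the inverse-square law makes the per-pixel intensity ratio a function of depth alone, independent of albedo. This is a minimal model under idealized assumptions (no ambient light, Lambertian surface); the actual LFS depth camera involves calibration and noise handling beyond this sketch.

```python
import numpy as np

def lfs_depth(I_near, I_far, delta):
    # I_near, I_far: images lit with the source at distance d and d + delta
    # from the surface. Since I is proportional to albedo / d**2, the ratio
    #   I_near / I_far = ((d + delta) / d)**2
    # cancels the unknown albedo, and solving for d gives depth per pixel.
    ratio = np.sqrt(np.maximum(I_near, 1e-9) / np.maximum(I_far, 1e-9))
    return delta / (ratio - 1.0 + 1e-9)

# A point 2 m away, light displaced by 0.5 m: ratio = (2.5 / 2)**2 = 1.5625,
# and the formula recovers d = 2.
print(lfs_depth(np.array([1.5625]), np.array([1.0]), 0.5))  # ~[2.0]
```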

    3D facial shape estimation from a single image under arbitrary pose and illumination.

    Humans have the uncanny ability to perceive the world in three dimensions (3D), otherwise known as depth perception. The amazing thing about this ability to determine distances is that it depends only on a simple two-dimensional (2D) image on the retina. It is an interesting problem to explain and mimic this phenomenon of obtaining a three-dimensional perception of a scene from a flat 2D retinal image. The main objective of this dissertation is the computational aspect of this human ability to reconstruct the world in 3D using only 2D images from the retina. Specifically, the goal of this work is to recover 3D facial shape information from a single image of unknown pose and illumination. Prior shape and texture models from real data, which are metric in nature, are incorporated into the 3D shape recovery framework. The recovered output shape, likewise, is metric, unlike previous shape-from-shading (SFS) approaches that only provide relative shape. This work starts with the simpler case of general illumination and fixed frontal pose. Three optimization approaches were developed to solve this 3D shape recovery problem, ranging from a brute-force iterative approach to a computationally efficient regression method (Method II-PCR), in which the classical shape-from-shading equation is cast in a regression framework. Results show that the regression-like approach is faster and yields similar error metrics compared to its iterative counterpart. The best of the three algorithms, Method II-PCR, is compared to its two predecessors, namely (a) Castelan et al. [1] and (b) Ahmed et al. [2]. Experimental results show that the proposed method (Method II-PCR) is superior in all aspects to the previous state of the art. Robust statistics were also incorporated into the shape recovery framework to deal with noise and occlusion. Using multiple-view geometry concepts [3], the fixed frontal pose was relaxed to arbitrary pose. The best of the three algorithms, Method II-PCR, is once again used as the primary 3D shape recovery method. Results show that the pose-invariant 3D shape recovery version (for input with pose) has error values similar to those of the frontal-pose version (for input with frontal pose), for input images of the same subject. Sensitivity experiments indicate that the proposed method is indeed invariant to pose, at least for pan angles from -50° to 50°. The next major part of this work is the development of 3D facial shape recovery methods given only the input 2D shape information, instead of both texture and 2D shape. The simpler case of sparse 3D output shapes was dealt with initially. The proposed method, which also uses a regression-based optimization approach, was compared with state-of-the-art algorithms, showing decent performance. Five conclusions were drawn from the sparse experiments, namely that the proposed approach: (a) is competitive due to its linear and non-iterative nature, (b) does not need explicit training, as opposed to [4], (c) has results comparable to [4] at a shorter computational time, (d) is better in all aspects than Zhang and Samaras [5], and (e) shares with [4] and [5] the limitation of requiring manual annotation of the input 2D feature points. The proposed method was then extended to output dense 3D shapes simply by replacing the sparse model with its dense equivalent in the regression framework of the 3D face recovery approach. The numerical values of the mean height and surface orientation errors indicate that even if shading information is unavailable, a decent dense 3D reconstruction is still possible.
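
    As a generic illustration of casting shape recovery as regression, the sketch below implements plain principal component regression: targets (e.g. 3D shape coefficients) are regressed on the leading principal components of the inputs (e.g. 2D shape or appearance features). The variable roles and sizes are assumptions; this is not the dissertation's exact Method II-PCR formulation.

```python
import numpy as np

def pcr_fit(X, Y, n_components):
    # X: (n, d) training inputs, Y: (n, m) training targets.
    # Project centered inputs onto the top principal components, then
    # solve a linear regression in that reduced space (non-iterative,
    # hence the speed advantage noted above).
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_components].T                      # (d, k) PCA basis
    B = np.linalg.lstsq((X - x_mean) @ V, Y - y_mean, rcond=None)[0]
    return x_mean, y_mean, V, B

def pcr_predict(model, X_new):
    x_mean, y_mean, V, B = model
    return (X_new - x_mean) @ V @ B + y_mean

# Toy usage with random stand-ins for 2D features and 3D coefficients:
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(200, 40)), rng.normal(size=(200, 8))
model = pcr_fit(X, Y, n_components=10)
print(pcr_predict(model, X[:2]).shape)           # (2, 8)
```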