    Template-based Monocular 3-D Shape Reconstruction And Tracking Using Laplacian Meshes

    This thesis addresses the problem of recovering the 3-D shape of a deformable object in single images, or in image sequences acquired by a monocular video camera, given that a 3-D template shape and a template image of the object are available. While this is a very challenging problem in computer vision, the ability to reconstruct and track 3-D deformable objects in videos enables many potential applications, ranging from sports and entertainment to engineering and medical imaging. This thesis extends the scope of deformable object modeling to real-world, fully 3-D modeling of deformable objects from video streams, with a number of contributions. We show that by extending the Laplacian formalism, first introduced in the graphics community to regularize 3-D meshes, we can turn monocular 3-D shape reconstruction of a deformable object, given correspondences with a reference image, into a much better-posed problem with far fewer degrees of freedom than the original one. This has proved key to achieving real-time performance while preserving both sufficient flexibility and robustness. Our real-time system for 3-D reconstruction and tracking of deformable objects can very quickly reject outlier correspondences and accurately reconstruct the object shape in 3-D. Frame-to-frame tracking is exploited to track the object under difficult settings such as large deformations, occlusions, illumination changes, and motion blur. We present an approach to dense image registration and 3-D shape reconstruction of deformable objects in the presence of occlusions and minimal texture. A key ingredient is a pixel-wise relevancy score that weighs the influence of each pixel's image information in the image energy cost function. Careful design of the framework is essential for obtaining state-of-the-art results in recovering 3-D deformations of both well- and poorly-textured objects in the presence of occlusions.
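The core idea of Laplacian-regularized reconstruction can be sketched as a linear least-squares problem: sparse observations plus a Laplacian smoothness term make the otherwise under-constrained vertex recovery well posed. The toy mesh, observations, and weight below are purely illustrative, not the thesis's actual formulation.

```python
import numpy as np

# Minimal sketch (hypothetical data): regularize vertex recovery with a
# uniform graph Laplacian, solving
#   min_x ||A x - b||^2 + lam * ||L x||^2
# which turns an under-constrained problem into a well-posed linear system.

# Toy mesh: 4 vertices connected in a cycle
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
L = np.zeros((n, n))
for i, j in edges:
    L[i, j] = L[j, i] = -1.0
for i in range(n):
    L[i, i] = -L[i].sum()  # vertex degree on the diagonal

# Sparse observations: only vertices 0 and 2 are observed
A = np.zeros((2, n))
A[0, 0] = 1.0
A[1, 2] = 1.0
b = np.array([0.0, 2.0])

lam = 0.1  # smoothness weight, chosen arbitrarily here
x = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)
# Unobserved vertices 1 and 3 are interpolated by the smoothness prior.
```

The same structure scales to thousands of mesh vertices, where the regularizer is what makes real-time solving feasible.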
We study the problem of reconstructing 3-D deformable objects interacting with rigid ones. Imposing real physical constraints allows us to model the interactions of objects in the real world more accurately and realistically. In particular, we study a ball colliding with a bat, observed by high-speed cameras. We provide quantitative measurements of the impact, which are compared with simulation-based methods to evaluate which simulation predictions most accurately describe a physical quantity of interest, and to improve the models. Exploiting the diffuse reflectance of the tracked deformable object, we propose a method to estimate the environment irradiance map, represented by a set of low-frequency spherical harmonics. The obtained irradiance map can be used to realistically illuminate 2-D and 3-D virtual content in the context of augmented reality on deformable objects. The results compare favorably with baseline methods. In collaboration with Disney Research, we develop an augmented reality coloring book application that runs in real time on mobile devices. The app lets children see their coloring come to life: animated characters are displayed with texture lifted from the colors on the drawing. Deformations of the book page are explicitly modeled by our 3-D tracking and reconstruction method, so accurate color information is extracted to synthesize the character's texture.
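Estimating an irradiance map as low-frequency spherical harmonics from a diffuse surface reduces, in the simplest case, to linear least squares over the first nine real SH basis functions. The sketch below uses synthetic normals and coefficients; the actual pipeline in the thesis observes shading on the tracked deformable surface.

```python
import numpy as np

# Hedged sketch (synthetic data): recover order-2 spherical-harmonics
# irradiance coefficients from diffuse (Lambertian) shading observations.

def sh_basis(n):
    """First 9 real SH basis functions evaluated at unit normals n (N, 3)."""
    x, y, z = n[:, 0], n[:, 1], n[:, 2]
    return np.stack([
        np.full_like(x, 0.282095),                 # Y_0^0
        0.488603 * y, 0.488603 * z, 0.488603 * x,  # band 1
        1.092548 * x * y, 1.092548 * y * z,        # band 2
        0.315392 * (3 * z**2 - 1),
        1.092548 * x * z,
        0.546274 * (x**2 - y**2),
    ], axis=1)

rng = np.random.default_rng(0)
normals = rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# Ground-truth lighting coefficients (illustrative values)
true_coeffs = np.array([1.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0])
intensity = sh_basis(normals) @ true_coeffs  # synthetic diffuse shading

# Least-squares recovery of the irradiance coefficients
coeffs, *_ = np.linalg.lstsq(sh_basis(normals), intensity, rcond=None)
```

With noiseless observations the nine coefficients are recovered exactly; in practice, robustness to shading noise and occluded pixels is what makes the real estimation problem interesting.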

    Fast human behavior analysis for scene understanding

    Human behavior analysis has become an active topic of great interest and relevance for a number of applications and areas of research. Research in recent years has been driven considerably by the growing level of criminal behavior in large urban areas and the increase in terrorist activity. Accurate behavior studies have also been applied to sports analysis systems and are emerging in healthcare. Compared to conventional action recognition used in security applications, human behavior analysis techniques designed for embedded applications should satisfy the following technical requirements: (1) behavior analysis should provide scalable and robust results; (2) high processing efficiency, to achieve (near) real-time operation with low-cost hardware; (3) extensibility to multiple-camera setups, including 3-D modeling, to facilitate understanding and description of human behavior in various events. The key to our problem statement is that we intend to improve behavior analysis performance while preserving the efficiency of the designed techniques, to allow implementation in embedded environments. More specifically, we look into (1) fast multi-level algorithms incorporating specific domain knowledge, and (2) 3-D configuration techniques for overall enhanced performance. Where possible, we push current behavior-analysis techniques toward better accuracy and scalability. To fulfill the above technical requirements and tackle the research problems, we propose a flexible behavior-analysis framework consisting of three processing layers: (1) pixel-based processing (background modeling with pixel labeling), (2) object-based modeling (human detection, tracking, and posture analysis), and (3) event-based analysis (semantic event understanding). In Chapter 3, we specifically contribute to the analysis of individual human behavior. A novel body representation is proposed for posture classification based on a silhouette feature.
Only pure binary-shape information is used for posture classification, without texture/color or any explicit body models. To this end, we have studied an efficient HV-PCA shape-based descriptor with temporal modeling, which achieves a posture-recognition accuracy of about 86% and outperforms other existing proposals. As our human motion scheme is efficient and runs at 6-8 frames/second, it enables a fast surveillance system and further analysis of human behavior. In addition, a body-part detection approach is presented, in which color and body ratio are combined to provide clues for human body detection and classification; the conventional assumption of an upright body posture is not required. Afterwards, we design and construct a specific framework for fast algorithms and apply it in two applications: tennis sports analysis and surveillance. Chapter 4 deals with tennis sports analysis and presents an automatic real-time system for multi-level analysis of tennis video sequences. First, we employ a 3-D camera model to bridge the pixel, object, and scene levels of tennis sports analysis. Second, a weighted linear model combining visual cues in the real-world domain is proposed to identify various events. The experimentally found event extraction rate of the system is about 90%. Audio signals are also combined to enhance the scene analysis performance. The complete proposed application is efficient enough to achieve real-time or near real-time performance (2-3 frames/second at 720×576 resolution, and 5-7 frames/second at 320×240 resolution, on a Pentium IV PC running at 3 GHz). Chapter 5 addresses surveillance and presents a full real-time behavior-analysis framework, featuring layers at the pixel, object, event, and visualization levels. More specifically, this framework captures human motion, classifies posture, infers semantic events exploiting interaction modeling, and performs 3-D scene reconstruction.
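A silhouette-projection posture descriptor of the kind described above can be sketched as follows. This is an illustrative reading of "HV-PCA" (horizontal/vertical projection profiles compressed by PCA), with random masks standing in for real silhouettes; the thesis's exact formulation and temporal modeling may differ.

```python
import numpy as np

# Illustrative sketch: reduce a binary silhouette to its horizontal and
# vertical projection profiles, then compress the profiles with PCA to get
# a compact posture feature for a downstream classifier.

def hv_descriptor(silhouette):
    """Concatenate row and column sums of a binary mask, L1-normalized."""
    h = silhouette.sum(axis=1).astype(float)  # horizontal profile (per row)
    v = silhouette.sum(axis=0).astype(float)  # vertical profile (per column)
    d = np.concatenate([h, v])
    return d / max(d.sum(), 1.0)

rng = np.random.default_rng(1)
masks = rng.random((100, 32, 24)) > 0.5       # stand-ins for real silhouettes
X = np.array([hv_descriptor(m) for m in masks])

# PCA via SVD of the centered descriptor matrix, keeping 8 components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[:8].T   # 8-D posture features fed to a classifier
```

Because only projection sums are used, the descriptor is cheap enough to sustain the frame rates quoted above.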
We have introduced our system design based on a specific software architecture, employing the well-known "4+1" view model. In addition, the human behavior analysis algorithms are designed directly for real-time operation and embedded in an experimental runtime AV content-analysis architecture. This executable system is designed to be generic for multiple streaming applications with component-based architectures. To evaluate the performance, we have applied this networked system in a single-camera setup. The experimental platform operates with two Pentium Quad-core engines (2.33 GHz) and 4 GB of memory. Performance evaluations have shown that this networked framework is efficient, running at 13-15 frames/second on monocular video sequences. Moreover, a dual-camera setup is tested within the behavior-analysis framework. After automatic camera calibration is conducted, 3-D reconstruction and communication among the different cameras are achieved. The extra view in the multi-camera setup improves human tracking and event detection in cases of occlusion. This extension to multiple-view fusion improves the accuracy of event-based semantic analysis by 8.3-16.7%. Detailed studies of the two experimental intelligent applications, i.e., tennis sports analysis and surveillance, have proven their value in several extensive tests in the framework of the European Candela and Cantata ITEA research programs, where our proposed system has demonstrated competitive performance with respect to accuracy and efficiency.

    3D Face Tracking and Texture Fusion in the Wild

    We present a fully automatic approach to real-time 3D face reconstruction from monocular in-the-wild videos. Using cascaded-regressor-based face tracking and 3D Morphable Face Model shape fitting, we obtain a semi-dense 3D face shape. We further use texture information from multiple frames to build a holistic 3D face representation of the video's subject. Our system captures facial expressions and does not require any person-specific training. We demonstrate the robustness of our approach on the challenging 300 Videos in the Wild (300-VW) dataset. Our real-time fitting framework is available as an open source library at http://4dface.org.
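With the pose fixed, fitting the shape coefficients of a morphable model reduces to regularized linear least squares against the model's PCA basis. In the sketch below, the basis and observations are random stand-ins for a real learned model, so all names and sizes are purely illustrative.

```python
import numpy as np

# Minimal sketch (synthetic model): 3D Morphable Model shape fitting as
# ridge-regularized linear least squares, assuming the camera/pose is known.

rng = np.random.default_rng(2)
n_vertices, n_coeffs = 50, 5
mean_shape = rng.normal(size=3 * n_vertices)          # stacked x,y,z
basis = rng.normal(size=(3 * n_vertices, n_coeffs))   # stand-in PCA basis

true_alpha = np.array([1.0, -0.5, 0.2, 0.0, 0.3])     # illustrative coeffs
observed = mean_shape + basis @ true_alpha            # noiseless "landmarks"

# Ridge fit: min_a ||B a - (observed - mean)||^2 + lam ||a||^2
lam = 1e-6
B, r = basis, observed - mean_shape
alpha = np.linalg.solve(B.T @ B + lam * np.eye(n_coeffs), B.T @ r)
```

The regularizer keeps the coefficients near the model's statistical prior, which matters once the observations are noisy 2-D landmarks rather than clean 3-D points.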