1,194 research outputs found

    Covariate Analysis for View-point Independent Gait Recognition

    No full text
    Many studies have shown that gait can be deployed as a biometric, but few have addressed the effects of view-point and covariate factors on the recognition process. We describe the first analysis to combine covariate analysis with view-point invariant gait recognition, based on a model-based pose estimation approach from a single uncalibrated camera. A set of experiments is carried out to explore how factors including clothing, carrying conditions and view-point affect identification using gait. On a covariate-based probe dataset of over 270 samples, a recognition rate of 73.4% is achieved using a KNN classifier. This confirms that identifying people by dynamic gait features remains feasible, with a good recognition rate, even under differing covariate factors. As such, this is an important step in translating research from the laboratory to a surveillance environment.
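
    As a concrete illustration of the classification step, the sketch below runs nearest-neighbour matching over placeholder gait descriptors. It is a minimal example only: the descriptors, labels, and dimensionality are invented stand-ins, and the paper's actual model-based feature extraction is not reproduced.

```python
# Minimal sketch: KNN classification of gait feature vectors.
# X and y are random placeholders standing in for real gait
# descriptors and subject identities.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(270, 64))       # placeholder gait descriptors
y = rng.integers(0, 20, size=270)    # placeholder subject identities

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(f"recognition rate: {clf.score(X_test, y_test):.1%}")
```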

    An initial matching and mapping for dense 3D object tracking in augmented reality applications

    Get PDF
    Augmented Reality (AR) applications rely on efficient and robust methods of tracking. One type of tracking uses dense 3D point data representations of the object to track. As opposed to sparse approaches, dense tracking is highly accurate and precise because it considers all of the available data from a camera. A major challenge for dense tracking is that it requires a rough initial matching and mapping to begin. A matching means that, given a known object, we can determine that the object exists in the scene; a mapping means that we can identify the position and orientation of the object with respect to the camera. Current methods for providing the initial matching and mapping either require the user to input parameters manually or take an extended amount of time using a brute-force automatic approach. The research presented in this thesis develops an automatic initial matching and mapping for dense tracking in AR, facilitating natural AR systems that track 3D objects. To do this, an existing offline method for registration of ideal 3D object point sets is adopted as a starting point. The method is improved and optimized in four steps to address the requirements and challenges of dense tracking in AR with a noisy consumer sensor. A series of experiments verifies the suitability of the optimizations using increasingly large and more complex scene point clouds, and the results are presented.
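
    To make the "mapping" notion concrete, the sketch below recovers a rigid transform between two already-matched 3D point sets using the standard Kabsch/SVD method. This is an illustrative baseline under the assumption of known correspondences, not the thesis's optimized registration method.

```python
# Minimal sketch: least-squares rigid transform (R, t) between two
# matched 3D point sets via the Kabsch/SVD method.
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Find R, t such that (R @ src.T).T + t approximates dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```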

    Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences

    Get PDF
    We propose a fully automatic method for fitting a 3D morphable model to single face images in arbitrary pose and lighting. Our approach relies on geometric features (edges and landmarks) and, inspired by the iterated closest point algorithm, is based on computing hard correspondences between model vertices and edge pixels. We demonstrate that this is superior to previous work that uses soft correspondences to form an edge-derived cost surface that is minimised by nonlinear optimisation.
    Comment: To appear in the ACCV 2016 Workshop on Facial Informatics.
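
    The sketch below illustrates the hard-correspondence step in isolation: each projected model vertex is snapped to its nearest detected edge pixel, as in ICP. The input arrays are assumed given; the full fitting loop and camera projection are not reproduced.

```python
# Minimal sketch: nearest-neighbour hard correspondences between
# projected model vertices and edge pixels (both (N, 2) arrays).
import numpy as np
from scipy.spatial import cKDTree

def hard_correspondences(projected_vertices, edge_pixels):
    """Return the closest edge pixel (and distance) for each vertex."""
    tree = cKDTree(edge_pixels)
    dists, idx = tree.query(projected_vertices)
    return edge_pixels[idx], dists
```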

    Multi-View Unsupervised Image Generation with Cross Attention Guidance

    Full text link
    The growing interest in novel view synthesis, driven by Neural Radiance Field (NeRF) models, is hindered by scalability issues due to their reliance on precisely annotated multi-view images. Recent models address this by fine-tuning large text-to-image diffusion models on synthetic multi-view data. Despite robust zero-shot generalization, they may need post-processing and can suffer quality issues due to the synthetic-to-real domain gap. This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets. With the help of pretrained self-supervised Vision Transformers (DINOv2), we identify object poses by clustering the dataset according to the visibility and locations of specific object parts. The pose-conditioned diffusion model, trained on these pose labels and equipped with cross-frame attention at inference time, ensures cross-view consistency, which is further aided by our novel hard-attention guidance. Our model, MIRAGE, surpasses prior work in novel view synthesis on real images. Furthermore, MIRAGE is robust to diverse textures and geometries, as demonstrated by our experiments on synthetic images generated with pretrained Stable Diffusion.
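
    The pose-labelling idea can be sketched as clustering per-image self-supervised features so that images seen from similar viewpoints share a label. In the hypothetical snippet below, dino_features.npy is an assumed file of precomputed DINOv2-style descriptors; the paper's actual comparison of part visibility and locations is more involved.

```python
# Minimal sketch: derive discrete pose labels by clustering
# precomputed per-image features (one row per image).
import numpy as np
from sklearn.cluster import KMeans

features = np.load("dino_features.npy")   # hypothetical (N, D) array
pose_labels = KMeans(n_clusters=8, n_init=10).fit_predict(features)
```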

    LiveCap: Real-time Human Performance Capture from Monocular Video

    Full text link
    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per frame are solved with specially tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. It yields accuracy comparable to off-line performance capture techniques, while being orders of magnitude faster.
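
    The solver family mentioned above can be sketched in a few lines. The damped Gauss-Newton loop below handles a generic non-linear least-squares energy with user-supplied residual and Jacobian callables; it is an illustrative serial version, not the paper's data-parallel GPU implementation.

```python
# Minimal sketch: damped Gauss-Newton for min_x ||r(x)||^2.
import numpy as np

def gauss_newton(x, residuals, jacobian, iters=10, damping=1e-6):
    for _ in range(iters):
        r, J = residuals(x), jacobian(x)
        H = J.T @ J + damping * np.eye(x.size)   # damped normal equations
        x = x - np.linalg.solve(H, J.T @ r)
    return x
```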

    Real-time Virtual Object Insertion for Moving 360° Videos

    Get PDF
    We propose an approach for real-time insertion of virtual objects into pre-recorded moving-camera 360° video. First, we reconstruct camera motion and sparse scene content via structure from motion on stitched equirectangular video. Then, to plausibly reproduce real-world lighting conditions for virtual objects, we use inverse tone mapping to recover high dynamic range environment maps which vary spatially along the camera path. We implement our approach in the Unity rendering engine for real-time virtual object insertion via differential rendering, with dynamic lighting, image-based shadowing, and user interaction. This expands the use and flexibility of 360° video for interactive computer graphics and visual effects applications.
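
    As a rough illustration of the lighting step, the sketch below applies a naive gamma-expansion inverse tone mapping to an 8-bit LDR frame. The paper's spatially varying HDR environment map recovery is considerably more sophisticated; this only shows the basic LDR-to-linear expansion.

```python
# Minimal sketch: naive inverse tone mapping of an 8-bit LDR frame
# to a linear pseudo-HDR map via gamma expansion.
import numpy as np

def inverse_tone_map(ldr_frame: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    return (ldr_frame.astype(np.float32) / 255.0) ** gamma
```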

    Robust Estimation of Trifocal Tensors Using Natural Features for Augmented Reality Systems

    Get PDF
    Augmented reality deals with the problem of dynamically augmenting or enhancing the real world with computer-generated virtual scenes. Registration is one of the most pivotal problems currently limiting AR applications. In this paper, a novel registration method using natural features based on online estimation of trifocal tensors is proposed. This method consists of two stages: offline initialization and online registration. Initialization involves specifying four points in each of two reference images to build the world coordinate system in which a virtual object will be placed. In online registration, the natural feature correspondences detected from the reference views are tracked in the current frame to build feature triples. These triples are used to estimate the corresponding trifocal tensors in the image sequence, by which the four specified points are transferred to compute the registration matrix for augmentation. The estimated registration matrix is then used as an initial estimate for a nonlinear optimization that minimizes the actual residual errors using the Levenberg-Marquardt (LM) method, making the results more robust and stable. This paper also proposes a robust method for estimating the trifocal tensors, in which a modified RANSAC algorithm is used to remove outliers. Compared with standard RANSAC, our method significantly reduces computational complexity while remaining robust to mismatches. Experiments are carried out to demonstrate the validity of the proposed approach.
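
    The robust estimation follows the general RANSAC pattern, sketched below with assumed fit and error callables: repeatedly fit a model to a random minimal sample and keep the hypothesis with the most inliers. The paper's modified variant, which reduces computational complexity, is not reproduced.

```python
# Minimal sketch: generic RANSAC loop over the rows of `data`.
import numpy as np

def ransac(data, fit, error, sample_size, threshold, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(iters):
        sample = data[rng.choice(len(data), sample_size, replace=False)]
        model = fit(sample)
        inliers = error(model, data) < threshold   # per-row residual test
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```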

    Automatic Fitting of a Deformable Face Mask Using a Single Image

    Get PDF
    We propose an automatic method for person-independent fitting of a deformable 3D face mask model under varying illumination conditions. Principal Component Analysis is utilised to build a face model, which is then used within a particle-filter-based approach to fit the mask to the image. By subdividing a coarse mask and using a novel texture mapping technique, we further apply the 3D face model to fit lower-resolution images. Illumination invariance is achieved by representing each face as a combination of harmonic images within the weighting function of the particle filter. We demonstrate the performance of our approach on the IMM Face Database and the Extended Yale Face Database and show that it outperforms the Active Shape Models approach.
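
    The model-building step can be sketched as follows, assuming a hypothetical training_masks.npy file that stacks one flattened training mask per row; the particle-filter fitting and harmonic-image lighting model are not reproduced.

```python
# Minimal sketch: build a linear face model with PCA from stacked,
# flattened training masks (one row per training sample).
import numpy as np
from sklearn.decomposition import PCA

shapes = np.load("training_masks.npy")    # hypothetical (N, 3V) array
model = PCA(n_components=30).fit(shapes)
mean_face, basis = model.mean_, model.components_
```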

    SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

    Full text link
    In fisheye images, rich distinct distortion patterns are regularly distributed in the image plane. These distortion patterns are independent of the visual content and provide informative cues for rectification. To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning. Technically, we first split a fisheye image into multiple patches and extract their representations with a Vision Transformer (ViT). To learn fine-grained distortion representations, we then associate different image patches with their specific distortion patterns based on the fisheye model, and further carefully design an innovative unified distortion-aware pretext task for their learning. The transfer performance on the downstream rectification task is remarkably boosted, which verifies the effectiveness of the learned representations. Extensive experiments are conducted, and the quantitative and qualitative results demonstrate the superiority of our method over state-of-the-art algorithms, as well as its strong generalization ability on real-world fisheye images.
    Comment: Accepted to ICCV 2023.
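
    The patch-splitting front end can be sketched as below: a square image is cut into non-overlapping tiles ready for a ViT-style encoder. The distortion-aware pretext task itself is not reproduced.

```python
# Minimal sketch: split an (H, W, C) image into non-overlapping
# (patch, patch, C) tiles for a ViT-style encoder.
import numpy as np

def split_into_patches(image: np.ndarray, patch: int = 32) -> np.ndarray:
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
```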