Covariate Analysis for View-point Independent Gait Recognition
Many studies have shown that gait can be deployed as a biometric, but few have addressed the effects of view-point and covariate factors on the recognition process. We describe the first analysis to combine view-point invariance for gait recognition with a model-based pose estimation approach from a single uncalibrated camera. A set of experiments explores how factors including clothing, carrying conditions and view-point affect identification by gait. On a covariate-based probe dataset of over 270 samples, a recognition rate of 73.4% is achieved using a KNN classifier. This confirms that identifying people by dynamic gait features remains feasible, with good recognition rates, even under differing covariate factors. As such, this is an important step in translating research from the laboratory to a surveillance environment.
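The abstract does not specify the gait feature representation, but the KNN classification step it reports results for can be sketched on synthetic feature vectors (a minimal NumPy sketch; the toy 2-D "gait features" and the value of k are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def knn_classify(gallery_feats, gallery_labels, probe_feats, k=1):
    """Classify each probe feature vector by majority vote among its
    k nearest gallery vectors under Euclidean distance."""
    preds = []
    for probe in probe_feats:
        dists = np.linalg.norm(gallery_feats - probe, axis=1)
        nearest = np.argsort(dists)[:k]          # indices of k closest gallery samples
        votes = gallery_labels[nearest]
        vals, counts = np.unique(votes, return_counts=True)
        preds.append(vals[np.argmax(counts)])    # majority vote
    return np.array(preds)

# toy gallery: two subjects, each with two 2-D "gait feature" samples
gallery = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
labels = np.array([0, 0, 1, 1])
probes = np.array([[0.05, 0.02], [0.95, 1.05]])
print(knn_classify(gallery, labels, probes, k=3))  # one label per probe
```

Recognition rate would then be the fraction of probes whose predicted label matches ground truth.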
An initial matching and mapping for dense 3D object tracking in augmented reality applications
Augmented Reality (AR) applications rely on efficient and robust methods of tracking. One type of tracking uses dense 3D point data representations of the object to track. As opposed to sparse approaches, dense tracking approaches are highly accurate and precise because they consider all of the available data from a camera. A major challenge for dense tracking is that it requires a rough initial matching and mapping to begin. A matching means that, for a known object, we can determine that the object exists in the scene; a mapping means that we can identify the position and orientation of the object with respect to the camera. Current methods to provide the initial matching and mapping require the user to manually input parameters, or to wait an extended amount of time for a brute-force automatic approach.
The research presented in this thesis develops an automatic initial matching and mapping for dense tracking for AR, facilitating natural AR systems that track 3D objects. An existing offline method for registration of ideal 3D object point sets is adopted as a starting point. The method is improved and optimized in four steps to address the requirements and challenges of dense tracking in AR with a noisy consumer sensor. A series of experiments verifies the suitability of the optimizations, using increasingly large and more complex scene point clouds, and the results are presented.
Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences
We propose a fully automatic method for fitting a 3D morphable model to single face images in arbitrary pose and lighting. Our approach relies on geometric features (edges and landmarks) and, inspired by the iterated closest point algorithm, is based on computing hard correspondences between model vertices and edge pixels. We demonstrate that this is superior to previous work that uses soft correspondences to form an edge-derived cost surface that is minimised by nonlinear optimisation. Comment: To appear in ACCV 2016 Workshop on Facial Informatics.
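The hard-correspondence step this abstract describes — assigning each projected model vertex to its closest edge pixel, in the spirit of iterated closest point — can be sketched in NumPy (the 2-D toy points below are illustrative assumptions, not the paper's data):

```python
import numpy as np

def hard_correspondences(projected_vertices, edge_pixels):
    """For each projected model vertex, return the index of the closest
    edge pixel (the 'hard' assignment) and the residual distance."""
    # pairwise squared distances, shape (n_vertices, n_edge_pixels)
    diff = projected_vertices[:, None, :] - edge_pixels[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    nearest = np.argmin(d2, axis=1)
    residuals = np.sqrt(d2[np.arange(len(projected_vertices)), nearest])
    return nearest, residuals

verts = np.array([[10.0, 10.0], [50.0, 50.0]])        # projected model vertices
edges = np.array([[12.0, 9.0], [48.0, 52.0], [100.0, 100.0]])  # edge pixels
idx, res = hard_correspondences(verts, edges)
print(idx)   # index of the closest edge pixel per vertex
```

In an ICP-style loop, these assignments would be recomputed after each update of the model parameters.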
Multi-View Unsupervised Image Generation with Cross Attention Guidance
The growing interest in novel view synthesis, driven by Neural Radiance Field (NeRF) models, is hindered by scalability issues due to their reliance on precisely annotated multi-view images. Recent models address this by fine-tuning large text2image diffusion models on synthetic multi-view data. Despite robust zero-shot generalization, they may need post-processing and can face quality issues due to the synthetic-real domain gap. This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets. With the help of pretrained self-supervised Vision Transformers (DINOv2), we identify object poses by clustering the dataset through comparing visibility and locations of specific object parts. The pose-conditioned diffusion model, trained on pose labels, and equipped with cross-frame attention at inference time, ensures cross-view consistency, which is further aided by our novel hard-attention guidance. Our model, MIRAGE, surpasses prior work in novel view synthesis on real images. Furthermore, MIRAGE is robust to diverse textures and geometries, as demonstrated with our experiments on synthetic images generated with pretrained Stable Diffusion.
LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting per-frame non-linear optimization problems are solved with specially-tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields comparable accuracy with off-line performance capture techniques, while being orders of magnitude faster.
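The per-frame non-linear least-squares problems above are solved with Gauss-Newton solvers. A generic Gauss-Newton iteration can be sketched on a toy exponential-fitting problem (the residual function, Jacobian, and starting point are illustrative assumptions, not the paper's energy, which the paper solves in a data-parallel GPU implementation):

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, iters=30):
    """Generic Gauss-Newton: repeatedly solve the normal equations
    J^T J dx = -J^T r and update the parameters."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
    return x

# toy problem: fit y = a * exp(b * t) to noiseless samples (a=2, b=0.5)
t = np.linspace(0.0, 1.0, 10)
y = 2.0 * np.exp(0.5 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.stack([np.exp(p[1] * t),
                          p[0] * t * np.exp(p[1] * t)], axis=1)
p = gauss_newton(res, jac, [1.8, 0.4])
print(p)   # converges toward [2.0, 0.5]
```

For zero-residual problems like this one, Gauss-Newton converges quadratically near the solution; real-time systems additionally exploit the sparsity and parallelism of the normal equations.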
Real-time Virtual Object Insertion for Moving 360° Videos
We propose an approach for real-time insertion of virtual objects into pre-recorded moving-camera 360° video. First, we reconstruct camera motion and sparse scene content via structure from motion on stitched equirectangular video. Then, to plausibly reproduce real-world lighting conditions for virtual objects, we use inverse tone mapping to recover high dynamic range environment maps which vary spatially along the camera path. We implement our approach in the Unity rendering engine for real-time virtual object insertion via differential rendering, with dynamic lighting, image-based shadowing, and user interaction. This expands the use and flexibility of 360° video for interactive computer graphics and visual effects applications.
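Inverse tone mapping, as used above, expands low-dynamic-range pixel values back toward an estimate of scene radiance. A minimal sketch of one simple global operator (the gamma linearisation and `max_luminance` expansion below are illustrative assumptions, not the paper's operator):

```python
import numpy as np

def inverse_tone_map(ldr, gamma=2.2, max_luminance=6.0):
    """Expand LDR values in [0, 1] to a rough HDR estimate:
    undo the display gamma, then boost bright pixels so that a fully
    saturated pixel reaches max_luminance."""
    linear = np.clip(ldr, 0.0, 1.0) ** gamma        # linearise the gamma curve
    # quadratic expansion: bright pixels are pushed toward max_luminance
    return linear * (1.0 + (max_luminance - 1.0) * linear)

ldr = np.array([0.0, 0.5, 1.0])
hdr = inverse_tone_map(ldr)
print(hdr)   # black stays 0, white is expanded to max_luminance
```

A recovered HDR map like this can then light virtual objects via image-based lighting.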
Robust Estimation of Trifocal Tensors Using Natural Features for Augmented Reality Systems
Augmented reality deals with the problem of dynamically augmenting or enhancing the real world with computer-generated virtual scenes. Registration is one of the most pivotal problems currently limiting AR applications. In this paper, a novel registration method using natural features based on online estimation of trifocal tensors is proposed. This method consists of two stages: offline initialization and online registration. Initialization involves specifying four points in two reference images respectively to build the world coordinate system on which a virtual object will be augmented. In online registration, the natural feature correspondences detected from the reference views are tracked in the current frame to build feature triples. These triples are used to estimate the corresponding trifocal tensors in the image sequence, by which the four specified points are transferred to compute the registration matrix for augmentation. The estimated registration matrix is used as an initial estimate for a nonlinear optimization that minimizes the actual residual errors using the Levenberg-Marquardt (LM) method, making the results more robust and stable. This paper also proposes a robust method for estimating the trifocal tensors, where a modified RANSAC algorithm is used to remove outliers. Compared with standard RANSAC, our method can significantly reduce computational complexity while overcoming the disturbance of mismatches. Experiments demonstrate the validity of the proposed approach.
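The modified RANSAC mentioned above is not detailed in the abstract, but the standard RANSAC loop it builds on can be sketched on a 2-D line-fitting toy (the minimal-set size, inlier threshold, and data are illustrative assumptions; the paper applies the idea to feature triples and trifocal tensors):

```python
import random
import numpy as np

def ransac_line(points, iters=200, threshold=0.1, seed=0):
    """Generic RANSAC, here fitting a 2-D line y = m*x + c:
    sample a minimal set, fit a candidate model, keep the model
    with the largest inlier set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, np.array([], dtype=int)
    pts_list = list(map(tuple, points))
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(pts_list, 2)   # minimal sample
        if x1 == x2:
            continue                                    # degenerate sample
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        err = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = np.where(err < threshold)[0]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers

# 8 points exactly on y = 2x + 1, plus two gross outliers (mismatches)
pts = np.array([[x, 2.0 * x + 1.0] for x in range(8)]
               + [[3.0, 40.0], [5.0, -20.0]])
model, inliers = ransac_line(pts)
print(model, len(inliers))   # recovers (2.0, 1.0) with 8 inliers
```

The final model would typically be refit to all inliers, here followed by the LM refinement the paper describes.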
Automatic Fitting of a Deformable Face Mask Using a Single Image
We propose an automatic method for person-independent fitting of a deformable 3D face mask model under varying illumination conditions. Principal Component Analysis is utilised to build a face model, which is then used within a particle-filter-based approach to fit the mask to the image. By subdividing a coarse mask and using a novel texture-mapping technique, we further apply the 3D face model to lower-resolution images. Illumination invariance is achieved by representing each face as a combination of harmonic images within the weighting function of the particle filter. We demonstrate the performance of our approach on the IMM Face Database and the Extended Yale Face Database and show that it outperforms the Active Shape Models approach.
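The particle-filter machinery above includes a resampling step between weight updates; a minimal sketch of systematic resampling (a standard variant, assumed here for illustration — the paper's weighting function is based on harmonic images):

```python
import numpy as np

def resample(particles, weights, rng):
    """Systematic resampling: draw N particles in proportion to their
    weights using evenly spaced positions, then reset weights to uniform."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n   # one random offset, n strata
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                            # guard against round-off
    indices = np.searchsorted(cumulative, positions)
    return particles[indices], np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
particles = np.array([0.0, 1.0, 2.0, 3.0])          # toy 1-D particle states
weights = np.array([0.0, 0.0, 1.0, 0.0])            # all mass on particle 2
new_particles, new_weights = resample(particles, weights, rng)
print(new_particles)   # all survivors are copies of particle 2
```

In a face-fitting loop, each particle would hold mask pose and shape parameters rather than a scalar state.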
SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning
In fisheye images, rich distinct distortion patterns are regularly distributed in the image plane. These distortion patterns are independent of the visual content and provide informative cues for rectification. To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning. Technically, we first split a fisheye image into multiple patches and extract their representations with a Vision Transformer (ViT). To learn fine-grained distortion representations, we then associate different image patches with their specific distortion patterns based on the fisheye model, and further subtly design an innovative unified distortion-aware pretext task for their learning. The transfer performance on the downstream rectification task is remarkably boosted, which verifies the effectiveness of the learned representations. Extensive experiments are conducted, and the quantitative and qualitative results demonstrate the superiority of our method over the state-of-the-art algorithms as well as its strong generalization ability on real-world fisheye images. Comment: Accepted to ICCV 2023.
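The first step this abstract describes — splitting the fisheye image into patches before the ViT — can be sketched in NumPy (the patch size and toy image are illustrative assumptions):

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches,
    returned as an (N, patch_size, patch_size, C) array.
    H and W must be divisible by patch_size."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    return (image
            .reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
            .transpose(0, 2, 1, 3, 4)   # group (row-block, col-block) together
            .reshape(-1, patch_size, patch_size, c))

img = np.arange(4 * 4 * 3).reshape(4, 4, 3)   # toy 4x4 RGB image
patches = split_into_patches(img, 2)
print(patches.shape)   # (4, 2, 2, 3): four 2x2 patches
```

Each patch would then be flattened and linearly embedded before entering the transformer, with the patch's position on the fisheye distortion pattern providing the self-supervised label.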