4D Temporally Coherent Light-field Video
Light-field video has recently been used in virtual and augmented reality
applications to increase realism and immersion. However, existing light-field
methods are generally limited to static scenes due to the requirement to
acquire a dense scene representation. The large amount of data and the absence
of methods to infer temporal coherence pose major challenges in storage,
compression and editing compared to conventional video. In this paper, we
propose the first method to extract a spatio-temporally coherent light-field
video representation. A novel method to obtain Epipolar Plane Images (EPIs)
from a sparse light-field camera array is proposed. EPIs are used to constrain
scene flow estimation to obtain 4D temporally coherent representations of
dynamic light-fields. Temporal coherence is achieved on a variety of
light-field datasets. Evaluation of the proposed light-field scene flow against
existing multi-view dense correspondence approaches demonstrates a significant
improvement in the accuracy of temporal coherence. Comment: Published in 3D Vision (3DV) 201
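An EPI is a 2D slice of a light field. With cameras on a single baseline, stacking the same image row from every view produces an image in which each scene point traces a line whose slope encodes its disparity. The sketch below builds an EPI from a dense, rectified, horizontally aligned array. It is the standard baseline construction, not the sparse-array method the paper proposes, and the helper name `build_epi` is hypothetical.

```python
from typing import List

# A grayscale image stored as rows of pixel intensities.
Image = List[List[float]]


def build_epi(views: List[Image], row: int) -> Image:
    """Build an epipolar plane image (EPI) for one scanline.

    Assumes `views` are rectified images from cameras on one horizontal
    baseline, ordered by camera position, so a scene point stays on the
    same row in every view. Row i of the EPI is scanline `row` of view i.
    """
    if not views:
        raise ValueError("need at least one view")
    width = len(views[0][row])
    epi: Image = []
    for view in views:
        scanline = view[row]
        if len(scanline) != width:
            raise ValueError("all views must have the same width")
        epi.append(list(scanline))
    return epi


# Synthetic example: one bright point that shifts one pixel per camera,
# i.e. a constant disparity of 1 pixel between neighbouring views.
views = []
for cam in range(4):
    img = [[0.0] * 6 for _ in range(3)]
    img[1][1 + cam] = 1.0
    views.append(img)

epi = build_epi(views, row=1)
# The point traces a diagonal line in the EPI; its slope is the disparity.
print([line.index(1.0) for line in epi])  # [1, 2, 3, 4]
```

With only a few widely spaced cameras, each such line is sampled at just a few rows, which is presumably why a dedicated method is needed for sparse arrays.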
Temporally Coherent General Dynamic Scene Reconstruction
Existing techniques for dynamic scene reconstruction from multiple
wide-baseline cameras primarily focus on reconstruction in controlled
environments, with fixed calibrated cameras and strong prior constraints. This
paper introduces a general approach to obtain a 4D representation of complex
dynamic scenes from multi-view wide-baseline static or moving cameras without
prior knowledge of the scene structure, appearance, or illumination.
Contributions of the work are: an automatic method for initial coarse
reconstruction to initialize joint estimation; sparse-to-dense temporal
correspondence integrated with joint multi-view segmentation and reconstruction
to introduce temporal coherence; and a general robust approach for joint
segmentation refinement and dense reconstruction of dynamic scenes by
introducing a shape constraint. Comparison with state-of-the-art approaches on a
variety of complex indoor and outdoor scenes demonstrates improved accuracy in
both multi-view segmentation and dense reconstruction. This paper demonstrates
unsupervised reconstruction of complete temporally coherent 4D scene models
with improved non-rigid object segmentation and shape reconstruction and its
application to free-viewpoint rendering and virtual reality. Comment: Submitted to IJCV 2019. arXiv admin note: substantial text overlap
with arXiv:1603.0338
View-dependent representation of shape and appearance from multiple view video.
Over the past decade, markerless performance capture using multiple synchronised cameras has emerged as an alternative to traditional motion capture techniques, allowing the simultaneous acquisition of shape, motion and appearance. This technology can capture subtle details of human motion, e.g. clothing, skin and hair dynamics, that current marker-based capture techniques cannot. Markerless performance capture has the potential to revolutionise digital content creation in many creative industries, but must overcome several hurdles before it can be seen as a practical mainstream technology. One limitation is the enormous size of the generated data. This thesis addresses compact appearance representation of virtual characters generated through markerless performance capture, optimisation of the underlying 3D geometry, and delivery of interactive content over the Internet.
Current approaches to multiple camera texture representation reduce storage requirements by discarding large amounts of view-dependent and dynamic appearance information, which is important for reproducing the realism of the captured multiple view video. The first contribution of this thesis is a novel multiple layer texture representation (MLTR) for multiple view video. The MLTR preserves dynamic, view-dependent appearance information by resampling the captured frames into a hierarchical set of texture maps ordered by surface visibility. The MLTR also enables computationally efficient view-dependent rendering: visibility testing is pre-computed, reducing projective texturing to a simple texture lookup. Quantitative evaluation shows that the representation reduces storage cost by more than 90% without a significant effect on visual quality.
The second contribution outlines the ideal properties of an optimal 4D video representation and takes steps towards this goal. Using the MLTR, spatial and temporal consistency is enforced within a Markov random field framework, allowing video compression algorithms to make further storage reductions by exploiting the increased spatial and temporal redundancy. An optical flow-based multiple camera alignment method is also introduced to reduce visual artefacts, such as blurring and ghosting, caused by approximate geometry and camera calibration errors. This results in clearer, sharper textures with a lower storage footprint.
To facilitate high-quality free-viewpoint rendering, two shape optimisation methods are proposed. The first combines the strengths of the visual hull, multiple view stereo and temporally consistent geometry to match visually important features using a non-rigid iterative closest point method. The second is based on a bundle adjustment formulation that jointly refines shape and calibration. While these methods achieve the objective of enhancing the geometry and/or camera calibration parameters, further research is required to improve the resulting shape.
Finally, it is shown how the methods developed in this thesis could be used to deliver interactive 4D video to consumers via a WebGL-enabled web browser, e.g. Firefox or Chrome. Existing methods for parametric motion graphs are adapted and combined with an efficient WebGL renderer to allow interactive 4D character delivery over the Internet. This demonstrates for the first time that 4D video has the potential to provide interactive content via the Internet, opening this technology up to the widest possible audience.
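The layering idea can be illustrated per texel: rank the cameras by how well they see a surface point and store the k-th best camera's colour in layer k, so that rendering later reads a precomputed layer instead of repeating visibility tests. In this minimal sketch the visibility score is assumed to be the cosine between the surface normal and the direction to the camera centre, occlusion is ignored, and all names (`build_mltr`, `visibility_score`) are hypothetical; the thesis's actual resampling and ordering criteria may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(_dot(v, v))  # assumes a non-zero vector
    return (v[0] / n, v[1] / n, v[2] / n)


def visibility_score(point: Vec3, normal: Vec3, cam_centre: Vec3) -> float:
    """Cosine between the surface normal and the direction to the camera.

    Values <= 0 mean the camera sees the back of the surface. Occlusion by
    other geometry is NOT modelled in this sketch.
    """
    to_cam = _normalize([c - p for c, p in zip(cam_centre, point)])
    return _dot(_normalize(normal), to_cam)


def build_mltr(
    texels: Sequence[Tuple[Vec3, Vec3]],
    cameras: Sequence[Vec3],
    colours: Sequence[Sequence[float]],
    num_layers: int,
) -> List[List[Optional[Tuple[int, float]]]]:
    """Resample per-camera texel colours into visibility-ordered layers.

    texels:  (point, normal) pairs, one per texel
    cameras: camera centres
    colours: colours[c][t] is the colour camera c observed for texel t
    Returns layers[k][t] = (camera index, colour) from the k-th best camera
    for texel t, or None if fewer than k + 1 cameras face that texel.
    """
    layers: List[List[Optional[Tuple[int, float]]]] = [
        [None] * len(texels) for _ in range(num_layers)
    ]
    for t, (point, normal) in enumerate(texels):
        scored = [(visibility_score(point, normal, c), i) for i, c in enumerate(cameras)]
        # Keep front-facing cameras only, best score first (ties broken by index).
        ranked = sorted([si for si in scored if si[0] > 0], key=lambda si: (-si[0], si[1]))
        for k, (_, cam) in enumerate(ranked[:num_layers]):
            layers[k][t] = (cam, colours[cam][t])
    return layers


cameras = [(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (0.0, 0.0, -5.0)]
texels = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
colours = [[10.0], [20.0], [30.0]]
layers = build_mltr(texels, cameras, colours, num_layers=2)
print(layers)  # [[(0, 10.0)], [(1, 20.0)]]
```

Because the ranking is computed once offline, a renderer only indexes into `layers`, which mirrors the point that projective texturing reduces to a texture lookup.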
Hybrid Skeleton Driven Surface Registration for Temporally Consistent Volumetric Video
This paper presents a hybrid skeleton-driven surface registration (HSDSR) approach to generate temporally consistent meshes from multiple view video of human subjects. 2D pose detections from multiple view video are used to estimate the 3D skeletal pose on a per-frame basis. The 3D pose is embedded into a 3D surface reconstruction, allowing any frame to be reposed to match the shape of any other frame in the captured sequence. Skeletal motion transfer is performed by selecting a reference frame from the surface reconstruction data and reposing it to match the estimated pose of other frames in the sequence. This allows an initial coarse alignment to be performed prior to refinement by a patch-based non-rigid mesh deformation. The proposed approach overcomes limitations of previous work by reposing a reference mesh to match the pose of a target mesh reconstruction, providing a closer starting point for further non-rigid mesh deformation. It is shown that the proposed approach achieves results comparable to existing model-based and model-free approaches. Finally, it is demonstrated that this framework provides an intuitive way for artists and animators to edit volumetric video.
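The abstract does not specify how the embedded skeleton deforms the surface. Linear blend skinning (LBS) is one common choice and is assumed in the sketch below: each vertex is moved by a weighted sum of bone transforms, where each transform is assumed to map a bone's reference-frame pose to its target-frame pose. The subsequent patch-based non-rigid refinement is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation
Transform = Tuple[Mat3, Vec3]   # (rotation, translation)


def apply_transform(t: Transform, v: Vec3) -> Vec3:
    """Apply a rigid transform: R @ v + translation."""
    rot, trans = t
    x, y, z = (sum(r * c for r, c in zip(row, v)) + trans[i] for i, row in enumerate(rot))
    return (x, y, z)


def skin_vertex(v: Vec3, weights: Sequence[float], bones: Sequence[Transform]) -> Vec3:
    """Linear blend skinning: v' = sum_j w_j * T_j(v), with weights summing to 1."""
    out = [0.0, 0.0, 0.0]
    for w, bone in zip(weights, bones):
        p = apply_transform(bone, v)
        for k in range(3):
            out[k] += w * p[k]
    return (out[0], out[1], out[2])


def repose_mesh(
    vertices: Sequence[Vec3],
    skin_weights: Sequence[Sequence[float]],
    bones: Sequence[Transform],
) -> List[Vec3]:
    """Repose every reference-frame vertex towards the target frame's skeleton."""
    return [skin_vertex(v, w, bones) for v, w in zip(vertices, skin_weights)]


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
ROT_Z_90: Mat3 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
bones = [(IDENTITY, (0.0, 0.0, 0.0)), (ROT_Z_90, (0.0, 0.0, 0.0))]

# Vertex 0 follows bone 0 fully, vertex 1 follows bone 1 fully,
# vertex 2 is split evenly between both bones.
vertices = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
reposed = repose_mesh(vertices, weights, bones)
print(reposed)  # [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]
```

The evenly weighted vertex lands inside the arc between the two bone positions, a well-known limitation of LBS; this kind of residual error is what a non-rigid refinement stage can absorb.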
Online interactive 4D character animation
This paper presents a framework for creating realistic virtual characters
that can be delivered via the Internet and interactively controlled
in a WebGL-enabled web browser. Four-dimensional performance
capture is used to capture realistic human motion and appearance.
The captured data is processed into efficient and compact
representations for geometry and texture. Motions are analysed
against a high-level, user-defined motion graph and suitable
inter- and intra-motion transitions are identified. This processed
data is stored on a web server and downloaded by a client application
when required. A JavaScript-based character animation engine
is used to manage the state of the character which responds to user
input and sends required frames to a WebGL-based renderer for
display. Through the efficient geometry, texture and motion graph
representations, a game character capable of performing a range of
motions can be represented in 40-50 MB of data. This highlights
the potential use of four-dimensional performance capture for creating
web-based content. Datasets are made available for further
research, and an online demo is provided.
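The engine's role can be sketched as a small state machine: motions are nodes, precomputed (source frame, destination frame) pairs are the transitions, and user input only queues a request that is honoured when playback reaches a valid transition frame. The original engine is JavaScript; this Python sketch with hypothetical class names (`MotionGraph`, `CharacterEngine`) is illustrative only and omits data loading and rendering.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Motion:
    name: str
    num_frames: int
    loops: bool = True


@dataclass
class MotionGraph:
    motions: Dict[str, Motion]
    # transitions[(src, dst)] lists (src_frame, dst_frame) pairs, found offline,
    # where playback can jump from motion src to motion dst.
    transitions: Dict[Tuple[str, str], List[Tuple[int, int]]] = field(default_factory=dict)


class CharacterEngine:
    """Tracks the character's current motion and frame and reacts to user input."""

    def __init__(self, graph: MotionGraph, start: str) -> None:
        self.graph = graph
        self.motion = start
        self.frame = 0
        self.requested: Optional[str] = None

    def request(self, motion: str) -> None:
        """Queue a user-requested motion; it starts at the next valid transition."""
        if motion not in self.graph.motions:
            raise KeyError(motion)
        self.requested = None if motion == self.motion else motion

    def step(self) -> Tuple[str, int]:
        """Advance one frame and return the (motion, frame) the renderer should draw."""
        if self.requested is not None:
            pairs = self.graph.transitions.get((self.motion, self.requested), [])
            for src_frame, dst_frame in pairs:
                if src_frame == self.frame:
                    self.motion, self.frame = self.requested, dst_frame
                    self.requested = None
                    return self.motion, self.frame
        current = self.graph.motions[self.motion]
        nxt = self.frame + 1
        if nxt >= current.num_frames:
            # Loop cyclic motions; hold the last frame of one-shot motions.
            nxt = 0 if current.loops else current.num_frames - 1
        self.frame = nxt
        return self.motion, self.frame


graph = MotionGraph(
    motions={"idle": Motion("idle", 4), "walk": Motion("walk", 6)},
    transitions={("idle", "walk"): [(2, 0)]},
)
engine = CharacterEngine(graph, start="idle")
engine.request("walk")
frames = [engine.step() for _ in range(4)]
print(frames)  # [('idle', 1), ('idle', 2), ('walk', 0), ('walk', 1)]
```

Deferring the switch until a precomputed transition frame is what keeps user-driven motion changes free of visible jumps between captured sequences.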
4D Video Textures for Interactive Character Appearance
4D Video Textures (4DVT) introduce a novel representation for rendering video-realistic interactive character animation from a database of 4D actor performances captured in a multiple camera studio. 4D performance capture reconstructs dynamic shape and appearance over time but is limited to free-viewpoint video replay of the same motion. Interactive animation from 4D performance capture has so far been limited to surface shape only. 4DVT is the final piece in the puzzle, enabling video-realistic interactive animation through two contributions: a layered view-dependent texture map representation that supports efficient storage, transmission and rendering from multiple view video capture; and a rendering approach that combines multiple 4DVT sequences in a parametric motion space, maintaining video-quality rendering of dynamic surface appearance while allowing high-level interactive control of character motion and viewpoint. 4DVT is demonstrated for multiple characters and evaluated both quantitatively and through a user study, which confirms that the visual quality of captured video is maintained. The 4DVT representation achieves a >90% reduction in size and halves the rendering cost.
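Combining sequences in a parametric motion space can be illustrated with two time-aligned captures of the same action at different parameter values (for example, two walking speeds). The sketch below assumes simple linear interpolation of per-frame values (vertex coordinates or texel intensities) with a weight derived from the target parameter. The paper's actual blending, which also has to handle view-dependent appearance, is likely more involved, and the function names are hypothetical.

```python
from typing import List, Sequence


def blend_weight(p: float, p0: float, p1: float) -> float:
    """Map a target parameter p onto [0, 1] between the captured values p0 and p1."""
    if p1 == p0:
        raise ValueError("captured parameter values must differ")
    return min(1.0, max(0.0, (p - p0) / (p1 - p0)))


def blend_frame(
    frame_a: Sequence[float],
    frame_b: Sequence[float],
    p: float,
    p0: float,
    p1: float,
) -> List[float]:
    """Linearly blend two time-aligned frames (flattened vertex or texel values)."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same layout")
    w = blend_weight(p, p0, p1)
    return [(1.0 - w) * a + w * b for a, b in zip(frame_a, frame_b)]


# Hypothetical example: a slow walk captured at 1.0 m/s and a fast walk at 2.0 m/s.
slow = [0.0, 10.0]
fast = [2.0, 20.0]
print(blend_frame(slow, fast, p=1.5, p0=1.0, p1=2.0))  # [1.0, 15.0]
```

Clamping keeps the weight inside the captured range, since extrapolating beyond it would synthesise appearance that was never observed.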
A Novel Multiple Camera RGB-D Calibration Approach Using Simulated Annealing
The development of a cost-effective surface scanning system tailored for live animal image capture can play an important role in biomedical research. The primary aim was to introduce a low-cost system that achieves a surface reconstruction error of less than 2 mm and acquires a complete 360-degree surface capture in approximately 1 second. Using a five-camera RGB-D configuration, our approach offers a simple, low-cost alternative to conventional lab-based 3D scanning setups. Key to our methodology is a novel calibration strategy that refines intrinsic and extrinsic camera parameters simultaneously for improved accuracy. We introduce a novel 3D calibration object, extending existing techniques that employ ArUco markers, and implement a depth correction matrix to enhance depth accuracy. By using simulated annealing optimization alongside our custom calibration object, we achieve better results than conventional optimization techniques. Our results show that the proposed depth correction method reduces the reprojection error from 3.12 to 2.89 pixels. Furthermore, despite the simplicity of our reconstruction method, we observe an improvement of around 22% in surface reconstruction compared to factory calibration parameters. These findings underscore the practicality and efficacy of the proposed system, paving the way for enhanced 3D surface reconstruction in real-world surface capture.
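Simulated annealing minimizes a cost by proposing random perturbations and accepting a worse candidate with probability exp(-Δ/T), where the temperature T decreases over time; this lets the search escape local minima early on and behave greedily at the end. In the calibration setting the cost would be a reprojection error over intrinsic and extrinsic camera parameters. The sketch below keeps the optimizer generic and uses a hypothetical two-parameter quadratic residual as a stand-in; the Gaussian proposal, geometric cooling schedule and default values are assumptions, not the paper's settings.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def simulated_annealing(
    cost: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 0.05,
    t0: float = 1.0,
    cooling: float = 0.995,
    iters: int = 5000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `cost` with Gaussian proposals and geometric cooling.

    Returns the best parameters seen and their cost.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = cost(x)
    best, f_best = list(x), fx
    t = t0
    for _ in range(iters):
        candidate = [xi + rng.gauss(0.0, step) for xi in x]
        fc = cost(candidate)
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-(fc - fx) / t).
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < f_best:
                best, f_best = list(x), fx
        t = max(t * cooling, 1e-12)  # cool down, never reaching zero
    return best, f_best


# Hypothetical stand-in for a calibration residual, minimized at (3.0, -1.0).
def residual(params: Sequence[float]) -> float:
    return (params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2


best, f_best = simulated_annealing(residual, x0=[0.0, 0.0])
print([round(v, 2) for v in best], round(f_best, 5))
```

Tracking the best candidate separately from the current state matters here: at high temperature the chain deliberately accepts worse parameters, so the final state alone is not guaranteed to be the best one visited.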