View-dependent representation of shape and appearance from multiple view video.

Abstract

Over the past decade, markerless performance capture using multiple synchronised cameras has emerged as an alternative to traditional motion capture techniques, allowing the simultaneous acquisition of shape, motion and appearance. This technology can capture the subtle details of human motion, e.g. clothing, skin and hair dynamics, which cannot be achieved with current marker-based capture techniques. Markerless performance capture has the potential to revolutionise digital content creation in many creative industries, but must overcome several hurdles before it can be considered a practical mainstream technology. One limitation is the enormous size of the generated data. This thesis addresses compact appearance representation for virtual characters generated through markerless performance capture, optimisation of the underlying 3D geometry, and delivery of interactive content over the internet.

Current approaches to multiple camera texture representation reduce storage requirements by discarding large amounts of view-dependent and dynamic appearance information, which is important for reproducing the realism of the captured multiple view video. The first contribution of this thesis introduces a novel multiple layer texture representation (MLTR) for multiple view video. The MLTR preserves dynamic, view-dependent appearance information by resampling the captured frames into a hierarchical set of texture maps ordered by surface visibility. It also enables computationally efficient view-dependent rendering by pre-computing visibility testing, reducing projective texturing to a simple texture lookup. The representation is quantitatively evaluated and shown to reduce storage cost by more than 90% without a significant effect on visual quality.

The second contribution outlines the ideal properties of an optimal representation for 4D video and takes steps towards achieving it. Using the MLTR, spatial and temporal consistency is enforced with a Markov random field framework, allowing video compression algorithms to achieve further storage reductions through the increased spatial and temporal redundancy. An optical flow-based multiple camera alignment method is also introduced to reduce visual artefacts, such as blurring and ghosting, caused by approximate geometry and camera calibration errors. This results in clearer, sharper textures with a lower storage footprint.

To facilitate high-quality free-viewpoint rendering, two shape optimisation methods are proposed. The first combines the strengths of the visual hull, multiple view stereo and temporally consistent geometry to match visually important features using a non-rigid iterative closest point method. The second is based on a bundle adjustment formulation that jointly refines shape and camera calibration. While these methods achieve the objective of enhancing the geometry and/or camera calibration, further research is required to improve the resulting shape.

Finally, it is shown how the methods developed in this thesis could be used to deliver interactive 4D video to consumers via a WebGL-enabled web browser, e.g. Firefox or Chrome. Existing methods for parametric motion graphs are adapted and combined with an efficient WebGL renderer to allow interactive 4D character delivery over the internet. This demonstrates for the first time that 4D video can provide interactive content via the internet, opening the technology up to the widest possible audience.
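
As a rough illustration of the view-dependent rendering idea summarised above, the TypeScript sketch below blends captured camera views according to the angle between the virtual viewpoint and each capture direction, normalising the weights so they sum to one. This is a minimal sketch under assumed conventions, not the implementation described in the thesis; the names Vec3, CameraView and blendWeights are hypothetical.

    // Hypothetical sketch of view-dependent blending; not the thesis code.
    // Each captured camera contributes according to how closely its capture
    // direction agrees with the virtual viewing direction.
    type Vec3 = [number, number, number];

    interface CameraView {
      id: string;
      direction: Vec3; // unit vector from the surface towards the camera
    }

    function dot(a: Vec3, b: Vec3): number {
      return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    // Weight each view by the clamped cosine of the angle between the virtual
    // view direction and the capture direction; views facing away from the
    // viewpoint receive zero weight. Weights are normalised to sum to one.
    function blendWeights(viewDir: Vec3, cameras: CameraView[]): Map<string, number> {
      const raw = cameras.map(c => Math.max(0, dot(viewDir, c.direction)));
      const total = raw.reduce((s, w) => s + w, 0) || 1;
      const entries: [string, number][] = cameras.map((c, i) => [c.id, raw[i] / total]);
      return new Map(entries);
    }

    // Example: a virtual viewpoint halfway between two capture cameras.
    const cams: CameraView[] = [
      { id: "cam0", direction: [1, 0, 0] },
      { id: "cam1", direction: [0, 0, 1] },
      { id: "cam2", direction: [-1, 0, 0] },
    ];
    console.log(blendWeights([Math.SQRT1_2, 0, Math.SQRT1_2], cams)); // cam0 and cam1 share the weight

In a full renderer these weights would select among the visibility-ordered MLTR layers rather than raw camera images, so that projective texturing reduces to the simple texture lookups described above.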
