
    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making it as simple as shooting pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences and achieves 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled to form a complete 3D model by a novel warping algorithm. Inspired by the success of single view 3D modeling, I extended my exploration into 2D-3D video conversion that does not utilize a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much depth-inferring work as possible from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with user labeling work.
In this thesis, I developed new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithm can build high-fidelity 3D models of dynamic and deformable objects if depth maps are provided. Otherwise, it can turn video clips into stereoscopic video.

    Scene relighting and editing for improved object insertion

    The goal of this thesis is to develop a scene relighting and object insertion pipeline using Neural Radiance Fields (NeRF) to incorporate one or more objects into an outdoor environment scene. The output is a 3D mesh that embodies decomposed bidirectional reflectance distribution function (BRDF) characteristics, which interact with varying light source positions and strengths. To achieve this objective, the thesis is divided into two sub-tasks. The first sub-task involves extracting visual information about the outdoor environment from a sparse set of corresponding images. A neural representation is constructed, providing a comprehensive understanding of the constituent elements, such as materials, geometry, illumination, and shadows. The second sub-task involves generating a neural representation of the inserted object using either real-world images or synthetic data. To accomplish these objectives, the thesis draws on existing literature in computer vision and computer graphics. Different approaches are assessed to identify their advantages and disadvantages, and the chosen techniques are described in detail, explaining how they work together to produce the final result. Overall, this thesis aims to provide a framework for compositing and relighting that is grounded in NeRF and allows for the seamless integration of objects into outdoor environments. The outcome of this work has potential applications in various domains, such as visual effects, gaming, and virtual reality.

    Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering

    Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world. Comment: Accepted to CVPR 2023 ECV Workshop

    Neural Radiance Fields: Past, Present, and Future

    Modeling and interpreting 3D environments and surroundings has driven research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D models has gained traction among researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications. Comment: 413 pages, 9 figures, 277 citations

    3D visualization processes for recreating and studying organismal form

    The study of biological form is a vital goal of evolutionary biology and functional morphology. We review an emerging set of methods that allow scientists to create and study accurate 3D models of living organisms and animate those models for biomechanical and fluid dynamic analyses. The methods for creating such models include 3D photogrammetry, laser and CT-scanning, and 3D software. New multi-camera devices can be used to create accurate 3D models of living animals in the wild and captivity. New websites and virtual reality/augmented reality devices now enable the visualization and sharing of these data. We provide examples of these approaches for animals ranging from large whales to lizards and show applications for several areas: natural history collections, body condition/scaling, bioinspired robotics, computational fluid dynamics (CFD), machine learning, and education. We provide two data sets to demonstrate the efficacy of CFD and machine learning approaches and conclude with a prospectus.

    Automatic 3D facial modelling with deformable models.

    Facial modelling and animation has been an active research subject in computer graphics since the 1970s. Due to the extremely complex biomechanical structures of human faces and people's visual familiarity with human faces, modelling and animating realistic human faces is still one of the greatest challenges in computer graphics. Since we are so familiar with human faces and very sensitive to unnatural subtle changes in them, it usually requires a tremendous amount of artistry and manual work to create a convincing facial model and animation. There is a clear need to develop automatic techniques for facial modelling in order to reduce manual labour. In order to obtain a realistic facial model of an individual, it is now common to make use of 3D scanners to capture range scans from the individual and then fit a template to the range scans. However, most existing template-fitting methods require manually selected landmarks to warp the template to the range scans. It would be tedious to select landmarks by hand over a large set of range scans. Another way to reduce repeated work is synthesis by reusing existing data. One example is expression cloning, which copies facial expressions from one face to another instead of creating them from scratch. The aim of this study is to develop a fully automatic framework for template-based facial modelling, facial expression transfer and facial expression tracking from range scans. In this thesis, the author developed an extension of the iterative closest points (ICP) algorithm, which is able to match a template with range scans in different scales, and a deformable model, which can be used to recover the shapes of range scans and to establish correspondences between facial models. With the registration method and the deformable model, the author proposed a fully automatic approach to reconstructing facial models and textures from range scans without requiring any manual intervention.
In order to reuse existing data for facial modelling, the author formulated and solved the problem of facial expression transfer in the framework of discrete differential geometry. The author also applied his methods to face tracking for 4D range scans. The results demonstrated the robustness of the registration method and the capabilities of the deformable model. A number of possible directions for future work were pointed out.

    From Capture to Display: A Survey on Volumetric Video

    Volumetric video, which offers immersive viewing experiences, is gaining increasing prominence. With its six degrees of freedom, it provides viewers with greater immersion and interactivity compared to traditional videos. Despite their potential, volumetric video services pose significant challenges. This survey conducts a comprehensive review of the existing literature on volumetric video. We first provide a general framework of volumetric video services, followed by a discussion of prerequisites for volumetric video, encompassing representations, open datasets, and quality assessment metrics. We then delve into the current methodologies for each stage of the volumetric video service pipeline, detailing capturing, compression, transmission, rendering, and display techniques. Lastly, we explore various applications enabled by this pioneering technology and present an array of research challenges and opportunities in the domain of volumetric video services. This survey aspires to provide a holistic understanding of this burgeoning field and shed light on potential future research trajectories, aiming to bring the vision of volumetric video to fruition. Comment: Submitted