
    CamP: Camera Preconditioning for Neural Radiance Fields

    Neural Radiance Fields (NeRF) can be optimized to obtain high-fidelity 3D scene reconstructions of objects and large-scale scenes. However, NeRFs require accurate camera parameters as input; inaccurate camera parameters result in blurry renderings. Extrinsic and intrinsic camera parameters are usually estimated using Structure-from-Motion (SfM) methods as a pre-processing step for NeRF, but these techniques rarely yield perfect estimates. Thus, prior works have proposed jointly optimizing camera parameters alongside a NeRF, but these methods are prone to local minima in challenging settings. In this work, we analyze how different camera parameterizations affect this joint optimization problem, and observe that standard parameterizations exhibit large differences in magnitude with respect to small perturbations, which can lead to an ill-conditioned optimization problem. We propose using a proxy problem to compute a whitening transform that eliminates the correlation between camera parameters and normalizes their effects, and we propose to use this transform as a preconditioner for the camera parameters during joint optimization. Our preconditioned camera optimization significantly improves reconstruction quality on scenes from the Mip-NeRF 360 dataset: we reduce error rates (RMSE) by 67% compared to state-of-the-art NeRF approaches that do not optimize for cameras, such as Zip-NeRF, and by 29% relative to state-of-the-art joint optimization approaches using the camera parameterization of SCNeRF. Our approach is easy to implement, does not significantly increase runtime, can be applied to a wide variety of camera parameterizations, and can straightforwardly be incorporated into other NeRF-like models.
    Comment: SIGGRAPH Asia 2023. Project page: https://camp-nerf.github.io
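
    As a rough illustration of the whitening idea in this abstract, the sketch below builds an inverse-square-root preconditioner from the Jacobian of a proxy problem. This is a minimal NumPy sketch, not the paper's implementation; the function name and the random proxy Jacobian are illustrative assumptions.

        import numpy as np

        def whitening_preconditioner(jacobian, eps=1e-6):
            """Whitening transform from the Jacobian of a proxy problem.

            jacobian: (num_residuals, num_params) array of sensitivities of
            proxy residuals (e.g. projected point positions) to the camera
            parameters. Returns P; reparameterizing params = P @ q makes a
            unit step in any direction of q perturb the residuals equally.
            """
            jtj = jacobian.T @ jacobian              # correlation of parameter effects
            eigvals, eigvecs = np.linalg.eigh(jtj)   # jtj is symmetric PSD
            return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

        # Example: hypothetical proxy Jacobian for a camera with 9 parameters.
        J = np.random.randn(100, 9)
        P = whitening_preconditioner(J)
        # (J @ P).T @ (J @ P) is approximately the identity: the parameters'
        # effects on the proxy residuals are decorrelated and normalized.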

    Cardiovascular disease risk assessment using a deep-learning-based retinal biomarker: a comparison with existing risk scores.

    Aims: This study aims to evaluate the ability of a deep-learning-based cardiovascular disease (CVD) retinal biomarker, Reti-CVD, to identify individuals at intermediate and high risk for CVD.
    Methods and results: We defined the intermediate- and high-risk groups according to the Pooled Cohort Equation (PCE), QRISK3, and the modified Framingham Risk Score (FRS). Reti-CVD's predictions were compared to the number of individuals identified as intermediate- and high-risk by these standard CVD risk assessment tools, and sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to assess the results. In the UK Biobank, among 48 260 participants, 20 643 (42.8%) and 7192 (14.9%) were classified into the intermediate- and high-risk groups according to PCE and QRISK3, respectively. In the Singapore Epidemiology of Eye Diseases study, among 6810 participants, 3799 (55.8%) were classified into the intermediate- and high-risk group according to the modified FRS. Reti-CVD identified the PCE-based intermediate- and high-risk groups with a sensitivity, specificity, PPV, and NPV of 82.7%, 87.6%, 86.5%, and 84.0%, respectively; the QRISK3-based groups with 82.6%, 85.5%, 49.9%, and 96.6%, respectively; and the modified-FRS-based groups with 82.1%, 80.6%, 76.4%, and 85.5%, respectively.
    Conclusion: The retinal photograph biomarker (Reti-CVD) was able to identify individuals at intermediate and high risk for CVD, in accordance with existing risk assessment tools.
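
    For reference, the four reported metrics follow the standard 2x2 confusion-matrix definitions. A minimal sketch follows; the function name and count arguments are illustrative, not from the study.

        def screening_metrics(tp, fp, fn, tn):
            """Standard screening metrics from a 2x2 confusion matrix.

            tp/fp/fn/tn: counts of true/false positives and negatives, where
            "positive" means flagged as intermediate- or high-risk.
            """
            return {
                "sensitivity": tp / (tp + fn),  # fraction of truly at-risk who are flagged
                "specificity": tn / (tn + fp),  # fraction of truly low-risk who are cleared
                "ppv": tp / (tp + fp),          # fraction of flagged who are truly at-risk
                "npv": tn / (tn + fn),          # fraction of cleared who are truly low-risk
            }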

    Scene Rerendering

    Thesis (Ph.D.)--University of Washington, 2022
    Taking a good photograph can be a time-consuming process, and it usually takes several attempts to capture a moment correctly. This difficulty stems from the many factors that make up a photo, such as framing, perspective, exposure, focus, or subject pose. Getting even one of these factors wrong can spoil a picture, even if the rest are perfect. To make matters worse, many of these factors are often out of our control; for example, a wind gust may displace the subject's hair, or a bird may fly by and occlude the shot. What if we could go back and fix some of these aspects? In my thesis, I explore techniques for "scene rerendering" that enable rich modification of media after capture.
    First, I propose Nerfies, the first method capable of photo-realistically reconstructing a non-rigidly deforming scene using photos and videos captured casually from mobile phones. Nerfies augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. I show that these NeRF-like deformation fields are prone to local minima and propose a coarse-to-fine optimization method that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, I also propose an elastic regularization of the deformation field that further improves robustness. I demonstrate how Nerfies can turn casually captured selfie photos and videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints.
    Deformation-based approaches such as Nerfies struggle to model changes in topology (e.g., slicing a lemon), as topological changes require a discontinuity in the deformation field, but these deformation fields are necessarily continuous. I address this limitation in HyperNeRF, in which I propose lifting NeRFs into a higher-dimensional space and representing the 5D radiance field corresponding to each input image as a slice through this "hyperspace." This approach is inspired by level set methods, which model the evolution of surfaces as slices through a higher-dimensional surface.
    Next, I present PhotoShape, an approach that creates photorealistic, relightable 3D models automatically. PhotoShape assigns high-quality, realistic appearance models to large-scale 3D shape collections. By generating many synthetic renderings, I train a convolutional neural network to classify materials in real photos and employ 3D-2D alignment techniques to transfer materials to different parts of each shape model. The key idea is to jointly leverage three types of online data (shape collections, material collections, and photo collections), using the photos as references to guide the assignment of materials to shapes.
    Finally, I show how we can exploit methods for scene rerendering to solve inverse problems. I propose LatentFusion, a framework for performing 3D reconstruction and rendering using a neural network. This neural network takes posed images of an object as input and can render it from any novel viewpoint. I show how LatentFusion can be used for 6D object pose estimation by optimizing the input pose as a free parameter using gradient descent. Also, since this method incorporates objects at inference time, it can perform pose estimation on unseen objects without additional training, an immense benefit over existing methods, which require training a separate network for every new object.
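
    The coarse-to-fine scheme mentioned for Nerfies can be sketched as an annealed ("windowed") positional encoding, where high-frequency bands fade in over training. The snippet below is a minimal NumPy illustration under that assumption; the published implementation may differ in its exact constants and layout.

        import numpy as np

        def windowed_positional_encoding(x, num_freqs, alpha):
            """Coarse-to-fine positional encoding in the spirit of Nerfies.

            x: (..., d) input coordinates; num_freqs: number of frequency
            bands; alpha: annealing parameter swept from 0 to num_freqs over
            training. Low frequencies are always visible; high frequencies
            fade in as alpha grows, smoothing the loss landscape early on.
            """
            k = np.arange(num_freqs)
            # Window weight per band: 0 when alpha <= k, 1 when alpha >= k + 1.
            w = 0.5 * (1.0 - np.cos(np.pi * np.clip(alpha - k, 0.0, 1.0)))
            xb = x[..., None, :] * (2.0 ** k)[:, None]  # (..., num_freqs, d)
            features = np.concatenate(
                [w[:, None] * np.sin(np.pi * xb), w[:, None] * np.cos(np.pi * xb)],
                axis=-1,
            )
            return features.reshape(*x.shape[:-1], -1)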
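
    Likewise, the LatentFusion-style pose estimation amounts to analysis-by-synthesis: optimize the pose by gradient descent through a differentiable renderer. Below is a hedged PyTorch sketch, not the thesis code; render_fn here is a trivial stand-in, where a real system would plug in the trained reconstruction network.

        import torch

        def estimate_pose(render_fn, target_image, init_pose, steps=200, lr=1e-2):
            """Recover an object pose by gradient descent through a renderer.

            render_fn: differentiable map from a pose vector to a rendered image.
            target_image: the observed image the rendering should match.
            init_pose: initial guess, e.g. 3 rotation + 3 translation parameters.
            """
            pose = init_pose.clone().requires_grad_(True)
            optimizer = torch.optim.Adam([pose], lr=lr)
            for _ in range(steps):
                optimizer.zero_grad()
                # Photometric loss between the rendering and the observation.
                loss = torch.mean((render_fn(pose) - target_image) ** 2)
                loss.backward()
                optimizer.step()
            return pose.detach()

        # Toy usage with a stand-in differentiable "renderer".
        render_fn = lambda pose: pose.sum() * torch.ones(8, 8)
        pose = estimate_pose(render_fn, torch.zeros(8, 8), init_pose=torch.randn(6))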