94 research outputs found

    ATT3D: Amortized Text-to-3D Object Synthesis

    Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework, Amortized text-to-3D (ATT3D), enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.
    Comment: 22 pages, 20 figures

    Variational Barycentric Coordinates

    We propose a variational technique to optimize for generalized barycentric coordinates that offers additional control compared to existing models. Prior work represents barycentric coordinates using meshes or closed-form formulae, in practice limiting the choice of objective function. In contrast, we directly parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates using a neural field. This formulation is enabled by our theoretical characterization of barycentric coordinates, which allows us to construct neural fields that parameterize the entire function class of valid coordinates. We demonstrate the flexibility of our model using a variety of objective functions, including multiple smoothness and deformation-aware energies; as a side contribution, we also present mathematically justified means of measuring and minimizing objectives like total variation on discontinuous neural fields. We offer a practical acceleration strategy, present a thorough validation of our algorithm, and demonstrate several applications.
    Comment: https://anadodik.github.io

    Learning 3D Shape Completion under Weak Supervision

    We address the problem of 3D shape completion from sparse and noisy point clouds, a fundamental problem in computer vision and robotics. Recent approaches are either data-driven or learning-based: data-driven approaches rely on a shape model whose parameters are optimized to fit the observations; learning-based approaches, in contrast, avoid the expensive optimization step by learning to directly predict complete shapes from incomplete observations in a fully supervised setting. However, full supervision is often not available in practice. In this work, we propose a weakly supervised learning-based approach to 3D shape completion which requires neither slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy. On synthetic benchmarks based on ShapeNet and ModelNet as well as on real robotics data from KITTI and Kinect, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with recent fully supervised baselines and outperforms data-driven approaches, while requiring less supervision and being significantly faster.

    SIMD column-parallel polygon rendering

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 171-173). By Matthew Willard Eldridge.

    Frequency Based Radiance Cache for Rendering Animations

    We propose a method to render animation sequences with direct distant lighting that only shades a fraction of the total pixels. We leverage frequency-based analyses of light transport to determine shading and image sampling rates across an animation using a samples cache. To do so, we derive frequency bandwidths that account for the complexity of distant lights, visibility, BRDF, and temporal coherence during animation. We finally apply a cross-bilateral filter when rendering our final images from sparse sets of shading points placed according to our frequency-based oracles (generally < 25% of the pixels, per frame).

    Memory sharing for interactive ray tracing on clusters

    We present recent results in the application of distributed shared memory to image parallel ray tracing on clusters. Image parallel rendering is traditionally limited to scenes that are small enough to be replicated in the memory of each node, because any processor may require access to any piece of the scene. We solve this problem by making all of a cluster's memory available through software distributed shared memory layers. With gigabit Ethernet connections, this mechanism is sufficiently fast for interactive rendering of multi-gigabyte datasets. Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.

    Architectures for online simulation-based inference applied to robot motion planning

    Robotic systems have enjoyed significant adoption in industrial and field applications in structured environments, where clear specifications of the task and observations are available. Deploying robots in unstructured and dynamic environments remains a challenge, being addressed through emerging advances in machine learning. The key open issues in this area include the difficulty of achieving coverage of all factors of variation in the domain of interest, satisfying safety constraints, etc. One tool that has played a crucial role in addressing these issues is simulation, which is used to generate data, and sometimes as a world representation within the decision-making loop. When physical simulation modules are used in this way, a number of computational problems arise. Firstly, a suitable simulation representation and fidelity is required for the specific task of interest. Secondly, we need to perform parameter inference of physical variables being used in the simulation models. Thirdly, there is the need for data assimilation, which must be achieved in real-time if the resulting model is to be used within the online decision-making loop. These are the motivating problems for this thesis.
    In the first section of the thesis, we tackle the inference problem with respect to a fluid simulation model, where a sensorised UAV performs path planning with the objective of acquiring data including gas concentration/identity and IMU-based wind estimation readings. The task for the UAV is to localise the source of a gas leak, while accommodating the subsequent dispersion of the gas in windy conditions. We present a formulation of this problem that allows us to perform online and real-time active inference efficiently through problem-specific simplifications.
    In the second section of the thesis, we explore the problem of robot motion planning when the true state is not fully observable, and actions influence how much of the state is subsequently observed. This is motivated by the practical problem of a robot performing suction in the surgical automation setting. The objective is the efficient removal of liquid while respecting a safety constraint: to not touch the underlying tissue if possible. If the problem were represented in full generality, as one of planning under uncertainty and hidden state, it could be hard to find computationally efficient solutions. Once again, we make problem-specific simplifications. Crucially, instead of reasoning in general about fluid flows and arbitrary surfaces, we exploit the observations that the decision can be informed by the contour tree skeleton of the volume, and the configurations in which the fluid would come to rest if unperturbed. This allows us to address the problem as one of iterative shortest path computation, whose costs are informed by a model estimating the shape of the underlying surface.
    In the third and final section of the thesis, we propose a model for real-time parameter estimation directly from raw pixel observations. Through the use of a Variational Recurrent Neural Network model, where the latent space is further structured by penalising for fit to data from a physical simulation, we devise an efficient online inference scheme. This is first shown in the context of a representative dynamic manipulation task for a robot. This task involves reasoning about a bouncing ball that it must catch, using as input the raw video from an environment-mounted camera and accommodating noise and variations in the object and environmental conditions. We then show that the same architecture lends itself to solving inference problems involving more complex dynamics, by applying this to measurement inversion of ultrafast X-Ray scattering data to infer molecular geometry.

    Transformer-Based Learned Optimization

    We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective to perform minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates but use a Transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization-based approaches, our formulation allows for conditioning across the dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms, as well as on the real-world task of physics-based visual reconstruction of articulated 3D human motion.
    Comment: Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR) in Vancouver, Canada