ATT3D: Amortized Text-to-3D Object Synthesis
Text-to-3D modelling has seen exciting progress by combining generative
text-to-image models with image-to-3D methods like Neural Radiance Fields.
DreamFusion recently achieved high-quality results but requires a lengthy,
per-prompt optimization to create 3D objects. To address this, we amortize
optimization over text prompts by training on many prompts simultaneously with
a unified model, instead of separately. With this, we share computation across
a prompt set, training in less time than per-prompt optimization. Our framework
- Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to
generalize to unseen setups and smooth interpolations between text for novel
assets and simple animations.
Comment: 22 pages, 20 figures
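The amortization idea in the abstract can be illustrated with a toy sketch: a single shared model, conditioned on the prompt, is optimized over many prompts at once, and then handles unseen prompts with no per-prompt optimization. The model (an elementwise scale), the objective, and all names below are hypothetical stand-ins for ATT3D's text-conditioned NeRF and training loss.

```python
# Toy sketch of amortizing optimization across a prompt set: one shared
# parameter vector `theta` is trained on ALL prompts simultaneously, so
# computation is shared and the model generalizes to unseen prompts.
# Model, targets, and hyperparameters are illustrative stand-ins.

def model(theta, prompt):
    # Shared parameters, conditioned on the prompt embedding.
    return [t * p for t, p in zip(theta, prompt)]

def target(prompt):
    # Ground-truth mapping the model should learn: scale by (2, 1).
    return [2.0 * prompt[0], 1.0 * prompt[1]]

def amortized_train(prompts, steps=500, lr=0.05):
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for p in prompts:
            out, tgt = model(theta, p), target(p)
            for i in range(2):
                grad[i] += 2.0 * (out[i] - tgt[i]) * p[i]
        theta = [t - lr * g / len(prompts) for t, g in zip(theta, grad)]
    return theta

train_prompts = [[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]]
theta = amortized_train(train_prompts)
unseen = [2.0, 3.0]              # a prompt never optimized for directly
prediction = model(theta, unseen)
```

The point of the sketch is the contrast with per-prompt optimization: the trained `theta` answers the unseen prompt in a single forward pass.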
Variational Barycentric Coordinates
We propose a variational technique to optimize for generalized barycentric
coordinates that offers additional control compared to existing models. Prior
work represents barycentric coordinates using meshes or closed-form formulae,
in practice limiting the choice of objective function. In contrast, we directly
parameterize the continuous function that maps any coordinate in a polytope's
interior to its barycentric coordinates using a neural field. This formulation
is enabled by our theoretical characterization of barycentric coordinates,
which allows us to construct neural fields that parameterize the entire
function class of valid coordinates. We demonstrate the flexibility of our
model using a variety of objective functions, including multiple smoothness and
deformation-aware energies; as a side contribution, we also present
mathematically-justified means of measuring and minimizing objectives like
total variation on discontinuous neural fields. We offer a practical
acceleration strategy, present a thorough validation of our algorithm, and
demonstrate several applications.
Comment: https://anadodik.github.io
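The "valid coordinates" the abstract characterizes are pinned down by two constraints: the coordinates sum to one (partition of unity) and reproduce the query point as a combination of the polytope's vertices. The sketch below checks both for the classic closed-form triangle case, not the paper's neural-field parameterization.

```python
# Sketch of the two defining constraints on barycentric coordinates
# w_i(x): sum_i w_i = 1, and sum_i w_i * v_i = x for vertices v_i.
# Shown for the standard triangle case via a 2x2 linear solve.

def triangle_barycentric(x, v0, v1, v2):
    d1 = (v1[0] - v0[0], v1[1] - v0[1])
    d2 = (v2[0] - v0[0], v2[1] - v0[1])
    r = (x[0] - v0[0], x[1] - v0[1])
    det = d1[0] * d2[1] - d2[0] * d1[1]
    w1 = (r[0] * d2[1] - d2[0] * r[1]) / det
    w2 = (d1[0] * r[1] - r[0] * d1[1]) / det
    return (1.0 - w1 - w2, w1, w2)

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
x = (0.25, 0.25)
w = triangle_barycentric(x, *verts)

# Partition of unity and reproduction of the query point:
unity = sum(w)
reproduced = (sum(wi * v[0] for wi, v in zip(w, verts)),
              sum(wi * v[1] for wi, v in zip(w, verts)))
```

The paper's contribution is a neural field spanning the entire class of functions satisfying these constraints; the closed-form triangle formula above is one member of that class.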
Learning 3D Shape Completion under Weak Supervision
We address the problem of 3D shape completion from sparse and noisy point
clouds, a fundamental problem in computer vision and robotics. Recent
approaches are either data-driven or learning-based: Data-driven approaches
rely on a shape model whose parameters are optimized to fit the observations;
Learning-based approaches, in contrast, avoid the expensive optimization step
by learning to directly predict complete shapes from incomplete observations in
a fully-supervised setting. However, full supervision is often not available in
practice. In this work, we propose a weakly-supervised learning-based approach
to 3D shape completion which neither requires slow optimization nor direct
supervision. While we also learn a shape prior on synthetic data, we amortize,
i.e., learn, maximum likelihood fitting using deep neural networks resulting in
efficient shape completion without sacrificing accuracy. On synthetic
benchmarks based on ShapeNet and ModelNet as well as on real robotics data from
KITTI and Kinect, we demonstrate that the proposed amortized maximum likelihood
approach is able to compete with recent fully supervised baselines and
outperforms data-driven approaches, while requiring less supervision and being
significantly faster.
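The contrast the abstract draws between per-instance optimization and amortized inference can be sketched in a toy setting: instead of running an optimizer per observation to maximize the likelihood, a learned map outputs the estimate in one pass. The Gaussian-mean model and the hand-set "learned" weights below are illustrative stand-ins for the paper's deep network over shapes.

```python
# Sketch of amortized maximum likelihood. Classic route: iterative
# optimization per observation set. Amortized route: a learned map
# that emits the ML estimate directly. Toy model: Gaussian with
# unknown mean, where the ML estimate is the sample mean.

def ml_fit(observations, steps=200, lr=0.1):
    # Slow route: gradient ascent on the log-likelihood,
    # converging to the sample mean.
    mu = 0.0
    for _ in range(steps):
        grad = sum(o - mu for o in observations)
        mu += lr * grad / len(observations)
    return mu

def amortized_fit(observations, weights):
    # Fast route: one forward pass of a "learned" linear map.
    # With weights 1/n it reproduces the ML answer exactly.
    return sum(w * o for w, o in zip(weights, observations))

obs = [1.0, 2.0, 3.0, 6.0]
slow = ml_fit(obs)
fast = amortized_fit(obs, weights=[0.25] * 4)
```

In the paper the amortized predictor is a deep network trained on synthetic shapes, so at test time the expensive fitting loop is replaced by a single network evaluation.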
SIMD column-parallel polygon rendering
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 171-173). By Matthew Willard Eldridge. M.S.
Frequency Based Radiance Cache for Rendering Animations
We propose a method to render animation sequences with direct distant lighting that only shades a fraction of the total pixels. We leverage frequency-based analyses of light transport to determine shading and image sampling rates across an animation using a samples cache. To do so, we derive frequency bandwidths that account for the complexity of distant lights, visibility, BRDF, and temporal coherence during animation. We finally apply a cross-bilateral filter when rendering our final images from sparse sets of shading points placed according to our frequency-based oracles (generally < 25% of the pixels per frame).
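The final reconstruction step described above, filtering sparse shading points into a full image, can be sketched with a 1D cross-bilateral filter: samples are weighted by spatial distance and by similarity in a cheap per-pixel guide signal, so shading does not bleed across guide discontinuities. The sigmas and the guide below are illustrative choices, not the paper's.

```python
import math

# Sketch of a 1D cross-bilateral filter: each pixel gathers sparse
# shading samples, weighted by spatial distance AND by similarity in
# a cheap per-pixel guide (e.g. depth or normals), preserving edges.

def cross_bilateral(pixels_guide, samples, sigma_s=2.0, sigma_g=0.2):
    out = []
    for i, g in enumerate(pixels_guide):
        num = den = 0.0
        for j, gj, shade in samples:     # sparse (position, guide, value)
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((g - gj) ** 2) / (2 * sigma_g ** 2))
            num += w * shade
            den += w
        out.append(num / den)
    return out

# Guide has a sharp edge at pixel 4; one shading sample on each side.
guide = [0.0] * 4 + [1.0] * 4
samples = [(1, 0.0, 10.0), (6, 1.0, 20.0)]
image = cross_bilateral(guide, samples)
```

Even though pixels 3 and 4 are spatially adjacent, the guide term keeps each side of the edge dominated by its own sample, which is what lets the method shade well under 25% of the pixels without visible smearing.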
Memory sharing for interactive ray tracing on clusters
We present recent results in the application of distributed shared memory to image parallel ray tracing on clusters. Image parallel rendering is traditionally limited to scenes that are small enough to be replicated in the memory of each node, because any processor may require access to any piece of the scene. We solve this problem by making all of a cluster's memory available through software distributed shared memory layers. With gigabit Ethernet connections, this mechanism is sufficiently fast for interactive rendering of multi-gigabyte datasets. Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.
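The page-based variant the abstract compares can be sketched as a local page cache in front of remote scene memory: on a miss, a whole page is fetched "over the network" and kept under an LRU policy, so only the working set of a large scene needs local memory. Page size, capacity, and the fetch mechanism below are illustrative, not the paper's system.

```python
from collections import OrderedDict

# Sketch of page-based software distributed shared memory for
# image-parallel ray tracing: scene data lives on remote owner
# nodes; each node caches recently fetched pages (LRU), so local
# memory holds only the working set, not the whole scene.

class PageCache:
    def __init__(self, backing, page_size=4, capacity=2):
        self.backing = backing        # stands in for remote owner nodes
        self.page_size = page_size
        self.capacity = capacity
        self.cache = OrderedDict()    # page id -> page data, LRU order
        self.fetches = 0              # counts simulated network trips

    def read(self, addr):
        page = addr // self.page_size
        if page not in self.cache:
            self.fetches += 1         # miss: fetch the whole page "remotely"
            start = page * self.page_size
            self.cache[page] = self.backing[start:start + self.page_size]
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
        self.cache.move_to_end(page)
        return self.cache[page][addr % self.page_size]

scene = list(range(16))               # stand-in for a multi-gigabyte scene
node = PageCache(scene)
values = [node.read(a) for a in (0, 1, 2, 5, 1)]
```

Coherent rays tend to touch nearby scene data, so hits far outnumber fetches; that locality is what makes the gigabit-Ethernet fetch latency tolerable at interactive rates.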
Architectures for online simulation-based inference applied to robot motion planning
Robotic systems have enjoyed significant adoption in industrial and field applications
in structured environments, where clear specifications of the task and observations are
available. Deploying robots in unstructured and dynamic environments remains a
challenge, being addressed through emerging advances in machine learning. The key
open issues in this area include the difficulty of achieving coverage of all factors of
variation in the domain of interest and of satisfying safety constraints. One tool that has
played a crucial role in addressing these issues is simulation - which is used to generate
data, and sometimes as a world representation within the decision-making loop.
When physical simulation modules are used in this way, a number of computational
problems arise. Firstly, a suitable simulation representation and fidelity is required
for the specific task of interest. Secondly, we need to perform parameter inference of
physical variables being used in the simulation models. Thirdly, there is the need for
data assimilation, which must be achieved in real-time if the resulting model is to be
used within the online decision-making loop. These are the motivating problems for
this thesis.
In the first section of the thesis, we tackle the inference problem with respect to
a fluid simulation model, where a sensorised UAV performs path planning with the
objective of acquiring data including gas concentration/identity and IMU-based wind
estimation readings. The task for the UAV is to localise the source of a gas leak, while
accommodating the subsequent dispersion of the gas in windy conditions. We present
a formulation of this problem that allows us to perform online and real-time active
inference efficiently through problem-specific simplifications.
In the second section of the thesis, we explore the problem of robot motion planning
when the true state is not fully observable, and actions influence how much of the
state is subsequently observed. This is motivated by the practical problem of a robot
performing suction in the surgical automation setting. The objective is the efficient
removal of liquid while respecting a safety constraint - to not touch the underlying
tissue if possible. If the problem were represented in full generality, as one of planning
under uncertainty and hidden state, it could be hard to find computationally efficient
solutions. Once again, we make problem-specific simplifications. Crucially, instead of
reasoning in general about fluid flows and arbitrary surfaces, we exploit the observations
that the decision can be informed by the contour tree skeleton of the volume, and the
configurations in which the fluid would come to rest if unperturbed. This allows us
to address the problem as one of iterative shortest path computation, whose costs are
informed by a model estimating the shape of the underlying surface.
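The iterative shortest-path reduction described above can be sketched on a small grid: traversal costs come from the estimated surface, with cells over raised tissue made expensive so the planned path stays safely away from it. The grid, the cost model, and the start/goal below are illustrative stand-ins for the thesis's formulation.

```python
import heapq

# Sketch of shortest-path planning over a grid whose cell costs are
# informed by an estimated surface height: cells over raised tissue
# are expensive, so Dijkstra's algorithm routes around them.

def dijkstra(cost, start, goal):
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    heap = [(dist[start], start)]
    prev = {}
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue                      # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1], dist[goal]

# High cost marks shallow liquid over raised tissue in the centre.
cost = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
path, total = dijkstra(cost, (0, 0), (2, 2))
```

In the thesis the costs are re-estimated as liquid is removed, so the shortest-path computation is repeated iteratively rather than solved once.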
In the third and final section of the thesis, we propose a model for real-time parameter
estimation directly from raw pixel observations. Through the use of a Variational
Recurrent Neural Network model, where the latent space is further structured by
penalising for fit to data from a physical simulation, we devise an efficient online
inference scheme. This is first shown in the context of a representative dynamic
manipulation task for a robot. This task involves reasoning about a bouncing ball that it
must catch – using as input the raw video from an environment-mounted camera and
accommodating noise and variations in the object and environmental conditions. We
then show that the same architecture lends itself to solving inference problems involving
more complex dynamics, by applying this to measurement inversion of ultrafast X-ray
scattering data to infer molecular geometry.
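The latent-space structuring described in this final section can be sketched as an extra loss term: on top of the usual reconstruction and KL terms, a penalty ties part of the latent code to parameters produced by a physical simulator. The weights and terms below are illustrative, not the thesis's exact VRNN objective.

```python
# Sketch of structuring a latent space with a physics penalty: the
# training loss adds, to the reconstruction/KL terms, a penalty tying
# part of the latent code to parameters from a physical simulation.

def structured_loss(recon_err, kl, latent, sim_params, lam=10.0):
    physics_penalty = sum((z - p) ** 2 for z, p in zip(latent, sim_params))
    return recon_err + kl + lam * physics_penalty

# A latent that matches the simulator's parameters is preferred even
# when its reconstruction error is slightly worse.
loss_matched = structured_loss(recon_err=1.2, kl=0.5,
                               latent=[0.9, 0.1], sim_params=[0.9, 0.1])
loss_free = structured_loss(recon_err=1.0, kl=0.5,
                            latent=[0.5, 0.8], sim_params=[0.9, 0.1])
```

Because the penalized latent dimensions are forced to track physical variables, reading them off at test time amounts to real-time parameter estimation from raw pixels.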
Transformer-Based Learned Optimization
We propose a new approach to learned optimization where we represent the
computation of an optimizer's update step using a neural network. The
parameters of the optimizer are then learned by training on a set of
optimization tasks with the objective to perform minimization efficiently. Our
innovation is a new neural network architecture, Optimus, for the learned
optimizer inspired by the classic BFGS algorithm. As in BFGS, we estimate a
preconditioning matrix as a sum of rank-one updates but use a Transformer-based
neural network to predict these updates jointly with the step length and
direction. In contrast to several recent learned optimization-based approaches,
our formulation allows for conditioning across the dimensions of the parameter
space of the target problem while remaining applicable to optimization tasks of
variable dimensionality without retraining. We demonstrate the advantages of
our approach on a benchmark composed of objective functions traditionally used
for the evaluation of optimization algorithms, as well as on the real-world task of
physics-based visual reconstruction of articulated 3D human motion.
Comment: Accepted to the IEEE/CVF Conference on Computer Vision and Pattern
Recognition 2023 (CVPR) in Vancouver, Canada
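The preconditioning structure the abstract describes can be sketched directly: the update direction is P @ grad with P = I + Σₖ vₖvₖᵀ built from rank-one terms, applied without ever forming P. In Optimus a Transformer predicts the vₖ (together with step length and direction); here they are hand-fixed purely to illustrate the mechanics.

```python
# Sketch of a BFGS-style preconditioner as a sum of rank-one updates:
# P g = (I + sum_k v_k v_k^T) g = g + sum_k v_k * (v_k . g),
# computed matrix-free. The rank-one vectors v_k are hand-fixed here;
# in the paper a Transformer predicts them jointly with the step.

def precondition(grad, rank_one_vectors):
    out = list(grad)
    for v in rank_one_vectors:
        dot = sum(vi * gi for vi, gi in zip(v, grad))
        out = [o + vi * dot for o, vi in zip(out, v)]
    return out

grad = [1.0, 0.0]
vs = [[1.0, 1.0]]            # one rank-one term (Transformer-predicted in Optimus)
direction = precondition(grad, vs)
```

Note that the computation only involves dot products along the parameter dimension, which is one way to see the abstract's claim of applicability to problems of variable dimensionality without retraining.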