Seeing a Rose in Five Thousand Ways
What is a rose, visually? A rose comprises its intrinsics, including the
distribution of geometry, texture, and material specific to its object
category. With knowledge of these intrinsic properties, we may render roses of
different sizes and shapes, in different poses, and under different lighting
conditions. In this work, we build a generative model that learns to capture
such object intrinsics from a single image, such as a photo of a bouquet. Such
an image includes multiple instances of an object type. These instances all
share the same intrinsics, but appear different due to a combination of
variance within these intrinsics and differences in extrinsic factors, such as
pose and illumination. Experiments show that our model successfully learns
object intrinsics (distribution of geometry, texture, and material) for a wide
range of objects, each from a single Internet image. Our method achieves
superior results on multiple downstream tasks, including intrinsic image
decomposition, shape and image generation, view synthesis, and relighting.
Comment: Project page: https://cs.stanford.edu/~yzzhang/projects/rose
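The abstract describes a factorisation rather than an algorithm, so the following is only a toy sketch of the idea: every instance in the bouquet shares one intrinsic distribution (geometry, texture, material) and differs through sampled intrinsics plus independently sampled extrinsics (pose, illumination). All function names and the Lambertian toy renderer are illustrative assumptions, not the paper's model.

```python
# Toy sketch of the intrinsic/extrinsic factorisation (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def sample_intrinsics():
    """Sample one instance from the shared intrinsic distribution:
    here, a toy 'geometry' scale and an albedo colour."""
    scale = rng.normal(1.0, 0.1)                                # shape variation
    albedo = np.clip(rng.normal([0.8, 0.1, 0.2], 0.05), 0, 1)   # rose-ish RGB
    return scale, albedo

def sample_extrinsics():
    """Extrinsic factors are sampled independently of object identity."""
    pose = rng.uniform(0, 2 * np.pi)        # azimuth angle
    light_dir = rng.normal(size=3)
    light_dir /= np.linalg.norm(light_dir)
    return pose, light_dir

def render(scale, albedo, pose, light_dir):
    """Toy Lambertian shading of a single surface normal; a stand-in
    for the paper's neural renderer."""
    normal = np.array([np.cos(pose), np.sin(pose), 1.0])
    normal /= np.linalg.norm(normal)
    shading = max(np.dot(normal, light_dir), 0.0)
    return scale * shading * albedo         # rendered RGB for this instance

# Instances share the intrinsic distribution but differ in sampled
# intrinsics and extrinsics, as in a photo of a bouquet.
bouquet = [render(*sample_intrinsics(), *sample_extrinsics()) for _ in range(5)]
print(bouquet[0])
```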
Self-Supervised Localisation between Range Sensors and Overhead Imagery
Publicly available satellite imagery can be a ubiquitous, cheap, and
powerful tool for vehicle localisation when a prior sensor map is unavailable.
However, satellite images are not directly comparable to data from ground range
sensors because of their starkly different modalities. We present a learned
metric localisation method that not only handles the modality difference, but
is cheap to train, learning in a self-supervised fashion without metrically
accurate ground truth. By evaluating across multiple real-world datasets, we
demonstrate the robustness and versatility of our method for various sensor
configurations. We pay particular attention to the use of millimetre wave
radar, which, owing to its complex interaction with the scene and its immunity
to weather and lighting, makes for a compelling and valuable use case.
Comment: Robotics: Science and Systems (RSS) 202
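As a hedged sketch of the cross-modality idea: assume both the range-sensor scan and the satellite image are mapped by encoders (learned in the paper, hand-crafted here) into one embedding space, after which metric localisation reduces to a dense correlation search over offsets. The gradient-magnitude "encoder" below merely stands in for the trained networks; nothing here is the paper's architecture.

```python
# Cross-modality localisation sketch: shared embedding + correlation search.
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Stand-in for a learned, modality-specific encoder: a normalised
    gradient magnitude, so both modalities land in one feature space."""
    gy, gx = np.gradient(image.astype(float))
    feat = np.hypot(gx, gy)
    return (feat - feat.mean()) / (feat.std() + 1e-8)

def localise(scan: np.ndarray, sat: np.ndarray) -> tuple[int, int]:
    """Exhaustive translation search: slide the scan embedding over the
    satellite embedding and return the best (dy, dx) offset."""
    f_scan, f_sat = embed(scan), embed(sat)
    H, W = f_scan.shape
    best, best_off = -np.inf, (0, 0)
    for dy in range(f_sat.shape[0] - H + 1):
        for dx in range(f_sat.shape[1] - W + 1):
            score = (f_scan * f_sat[dy:dy + H, dx:dx + W]).sum()
            if score > best:
                best, best_off = score, (dy, dx)
    return best_off

# Toy check: a patch cut from the "satellite" image localises to its origin.
sat = np.random.default_rng(1).random((64, 64))
scan = sat[10:42, 20:52]
print(localise(scan, sat))  # expected near (10, 20)
```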
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
We introduce Stanford-ORB, a new real-world 3D object inverse rendering
benchmark. Recent advances in inverse rendering have enabled a wide range of
real-world applications in 3D content generation, moving rapidly from research
and commercial use cases to consumer devices. While the results continue to
improve, there is no real-world benchmark that can quantitatively assess and
compare the performance of various inverse rendering methods. Existing
real-world datasets typically only consist of the shape and multi-view images
of objects, which are not sufficient for evaluating the quality of material
recovery and object relighting. Methods capable of recovering material and
lighting often resort to synthetic data for quantitative evaluation, which on
the other hand does not guarantee generalization to complex real-world
environments. We introduce a new dataset of real-world objects captured under a
variety of natural scenes with ground-truth 3D scans, multi-view images, and
environment lighting. Using this dataset, we establish the first comprehensive
real-world evaluation benchmark for object inverse rendering tasks from
in-the-wild scenes, and compare the performance of various existing methods.
Comment: NeurIPS 2023 Datasets and Benchmarks Track. The first two authors
contributed equally to this work. Project page: https://stanfordorb.github.io
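To make the benchmarking idea concrete, here is a minimal sketch of the kind of quantitative comparison such a dataset enables: scoring a method's relit renderings against ground-truth captures with PSNR. The metric choice and evaluation loop are illustrative assumptions, not the benchmark's official protocol.

```python
# Illustrative relighting evaluation: average PSNR against ground truth.
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((pred.astype(float) - gt.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def evaluate(renders: list[np.ndarray], captures: list[np.ndarray]) -> float:
    """Average PSNR over all (render, ground-truth capture) pairs for one method."""
    scores = [psnr(p, g) for p, g in zip(renders, captures)]
    return sum(scores) / len(scores)

# Toy usage: a noisy copy of the ground truth scores finitely; a perfect one, inf.
rng = np.random.default_rng(2)
gt = [rng.random((8, 8, 3)) for _ in range(3)]
noisy = [np.clip(g + rng.normal(0, 0.05, g.shape), 0, 1) for g in gt]
print(evaluate(noisy, gt), evaluate(gt, gt))
```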
SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning
In contrastive learning, the choice of "view" controls the information that
the representation captures and influences the performance of the model.
However, leading graph contrastive learning methods generally produce views via
random corruption or learning, which can discard essential information or
alter semantic information. An anchor view that preserves the essential
information of input graphs for contrastive learning has hardly been
investigated. In this paper, based on the theory of the graph information
bottleneck, we derive a definition of this anchor view; put differently,
\textit{the anchor view with the essential information of the input graph
should have minimal structural uncertainty}. Furthermore, guided by
structural entropy, we implement the anchor view, termed \textbf{SEGA}, for
graph contrastive learning. We extensively validate the proposed anchor view
on various graph classification benchmarks under unsupervised,
semi-supervised, and transfer learning, achieving significant performance
gains over state-of-the-art methods.
Comment: ICML'2
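The claim that the anchor view should have minimal structural uncertainty can be made concrete with the simplest member of the structural-entropy family: the one-dimensional structural entropy H(G) = -Σ_i (d_i / 2m) log2(d_i / 2m), where d_i are node degrees and m is the edge count. SEGA itself optimises entropy over encoding trees; the sketch below only computes this basic quantity.

```python
# One-dimensional structural entropy of an undirected graph.
import math
from collections import Counter

def structural_entropy(edges: list[tuple[int, int]]) -> float:
    """H(G) = -sum_i (d_i / 2m) * log2(d_i / 2m) from an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    two_m = 2 * len(edges)
    return -sum((d / two_m) * math.log2(d / two_m) for d in degree.values())

# A regular triangle vs. a hub-dominated star on the same edge count.
print(structural_entropy([(0, 1), (1, 2), (2, 0)]))  # log2(3) ~ 1.585
print(structural_entropy([(0, 1), (0, 2), (0, 3)]))  # ~ 1.793
```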
MagicPony: Learning Articulated 3D Animals in the Wild
We consider the problem of learning a function that can estimate the 3D
shape, articulation, viewpoint, texture, and lighting of an articulated animal
like a horse, given a single test image. We present a new method, dubbed
MagicPony, that learns this function purely from in-the-wild single-view images
of the object category, with minimal assumptions about the topology of
deformation. At its core is an implicit-explicit representation of articulated
shape and appearance, combining the strengths of neural fields and meshes. In
order to help the model understand an object's shape and pose, we distil the
knowledge captured by an off-the-shelf self-supervised vision transformer and
fuse it into the 3D model. To overcome common local optima in viewpoint
estimation, we further introduce a new viewpoint sampling scheme that comes at
no added training cost. Compared to prior works, we show significant
quantitative and qualitative improvements on this challenging task. The model
also demonstrates excellent generalisation in reconstructing abstract drawings
and artefacts, despite being trained only on real images.
Comment: Project Page: https://3dmagicpony.github.io
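The implicit-explicit representation the abstract mentions can be illustrated with a minimal sketch: an implicit coordinate network (here an untrained toy MLP) predicts per-vertex offsets that deform an explicit mesh template, so the field contributes smooth, learnable deformation while the mesh contributes fixed topology. Layer sizes and names are assumptions, not MagicPony's actual architecture.

```python
# Hybrid implicit-explicit sketch: a coordinate MLP deforming a mesh template.
import numpy as np

rng = np.random.default_rng(3)

# Explicit side: a template mesh, stored as vertex positions of shape (V, 3).
template_vertices = rng.random((100, 3))

# Implicit side: a tiny MLP f(x) -> offset, queried at every vertex position.
W1, b1 = rng.normal(0, 0.1, (3, 32)), np.zeros(32)
W2, b2 = rng.normal(0, 0.1, (32, 3)), np.zeros(3)

def neural_field(x: np.ndarray) -> np.ndarray:
    """Coordinate MLP: 3D positions in, 3D deformation offsets out."""
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

# Query the field at the explicit vertices and deform the mesh; the mesh's
# connectivity (topology) is untouched, only vertex positions move.
deformed_vertices = template_vertices + neural_field(template_vertices)
print(deformed_vertices.shape)  # (100, 3), same topology as the template
```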