GRASS: Generative Recursive Autoencoders for Shape Structures
We introduce a novel neural network architecture for encoding and synthesis
of 3D shapes, particularly their structures. Our key insight is that 3D shapes
are effectively characterized by their hierarchical organization of parts,
which reflects fundamental intra-shape relationships such as adjacency and
symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a
flat, unlabeled, arbitrary part layout to a compact code. The code effectively
captures hierarchical structures of man-made 3D objects of varying structural
complexities despite being fixed-dimensional: an associated decoder maps a code
back to a full hierarchy. The learned bidirectional mapping is further tuned
using an adversarial setup to yield a generative model of plausible structures,
from which novel structures can be sampled. Finally, our structure synthesis
framework is augmented by a second trained module that produces fine-grained
part geometry, conditioned on global and local structural context, leading to a
full generative pipeline for 3D shapes. We demonstrate that without
supervision, our network learns meaningful structural hierarchies adhering to
perceptual grouping principles, produces compact codes which enable
applications such as shape classification and partial matching, and supports
shape synthesis and interpolation with significant variations in topology and
geometry.
Comment: Corresponding author: Kai Xu
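
As a rough illustration of how a recursive encoder can fold a part hierarchy of any size into a fixed-dimensional code, here is a minimal sketch. It is not the GRASS architecture: the abstract describes an encoder that starts from a flat, unlabeled part layout and accounts for adjacency and symmetry, whereas this example assumes the binary hierarchy is already given, uses one untrained merge function with random weights, and invents the names `CODE_DIM`, `merge`, and `encode` purely for illustration.

```python
import math
import random

CODE_DIM = 4  # assumed code size, chosen only for this illustration


def make_matrix(rows, cols, rng):
    """Build a small random weight matrix (untrained, illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W_MERGE = make_matrix(CODE_DIM, 2 * CODE_DIM, rng)


def merge(left, right):
    """Map two child codes to one parent code of the same dimension."""
    joined = left + right  # concatenate the two child codes
    return [math.tanh(sum(w * x for w, x in zip(row, joined))) for row in W_MERGE]


def encode(node):
    """Recursively encode a hierarchy given as nested 2-tuples of leaf codes."""
    if isinstance(node, list):  # leaf: already a CODE_DIM-sized feature vector
        return node
    left, right = node
    return merge(encode(left), encode(right))


# Hypothetical part features for a chair-like shape.
seat = [0.1, 0.2, 0.3, 0.4]
back = [0.5, 0.1, 0.0, 0.2]
leg_a = [0.0, 0.9, 0.1, 0.1]
leg_b = [0.0, 0.9, 0.1, -0.1]

small = encode((seat, back))
large = encode(((seat, back), (leg_a, leg_b)))
```

Both `small` and `large` have length `CODE_DIM`, which is the property the abstract highlights: structures of different complexity map to codes of the same size.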
Training Complex Models with Multi-Task Weak Supervision
As machine learning models continue to increase in complexity, collecting
large hand-labeled training sets has become one of the biggest roadblocks in
practice. Instead, weaker forms of supervision that provide noisier but cheaper
labels are often used. However, these weak supervision sources have diverse and
unknown accuracies, may output correlated labels, and may label different tasks
or apply at different levels of granularity. We propose a framework for
integrating and modeling such weak supervision sources by viewing them as
labeling different related sub-tasks of a problem, which we refer to as the
multi-task weak supervision setting. We show that by solving a matrix
completion-style problem, we can recover the accuracies of these multi-task
sources given their dependency structure, but without any labeled data, leading
to higher-quality supervision for training an end model. Theoretically, we show
that the generalization error of models trained with this approach improves
with the number of unlabeled data points, and characterize the scaling with
respect to the task and dependency structures. On three fine-grained
classification problems, we show that our approach leads to average gains of
20.2 points in accuracy over a traditional supervised approach, 6.8 points over
a majority vote baseline, and 4.1 points over a previously proposed weak
supervision method that models tasks separately.
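
To make the majority vote baseline concrete, and to show why knowing source accuracies can change the aggregated label, here is a small sketch. It is an assumption-laden illustration, not the paper's method: the accuracies are supplied by hand rather than recovered from unlabeled data, the log-odds weighting is the standard choice under conditionally independent sources, and labels are restricted to a single binary task with `0` meaning a source abstains.

```python
import math


def majority_vote(votes):
    """Unweighted vote over labels in {-1, +1}; 0 means abstain or tie."""
    total = sum(votes)
    if total == 0:
        return 0
    return 1 if total > 0 else -1


def weighted_vote(votes, accuracies):
    """Weight each source by the log-odds of its (assumed known) accuracy."""
    score = 0.0
    for vote, acc in zip(votes, accuracies):
        score += vote * math.log(acc / (1.0 - acc))
    if score == 0:
        return 0
    return 1 if score > 0 else -1


# Hypothetical example: one accurate source disagrees with two weak ones.
votes = [1, -1, -1]
accuracies = [0.95, 0.6, 0.6]

mv_label = majority_vote(votes)
wv_label = weighted_vote(votes, accuracies)
```

Here the two weak sources outvote the accurate one under majority vote, while the accuracy-weighted vote sides with the accurate source.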
Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
In this paper, we present a novel generative task: joint scene graph - image
generation. While previous works have explored image generation conditioned on
scene graphs or layouts, our task is distinctive and important as it involves
generating scene graphs themselves unconditionally from noise, enabling
efficient and interpretable control for image generation. Our task is
challenging, requiring the generation of plausible scene graphs with
heterogeneous attributes for nodes (objects) and edges (relations among
objects), including continuous object bounding boxes and discrete object and
relation categories. We introduce a novel diffusion model, DiffuseSG, that
jointly models the adjacency matrix along with heterogeneous node and edge
attributes. We explore various types of encodings for the categorical data,
relaxing it into a continuous space. With a graph transformer being the
denoiser, DiffuseSG successively denoises the scene graph representation in a
continuous space and discretizes the final representation to generate the clean
scene graph. Additionally, we introduce an IoU regularization to enhance the
empirical performance. Our model significantly outperforms existing methods in
scene graph generation on the Visual Genome and COCO-Stuff datasets, both on
standard and newly introduced metrics that better capture the problem
complexity. Moreover, we demonstrate the additional benefits of our model in
two downstream applications: 1) excelling in a series of scene graph completion
tasks, and 2) improving scene graph detection models by using extra training
samples generated from DiffuseSG.
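
Two ingredients mentioned above can be sketched in a few lines: relaxing a categorical label into a continuous vector and discretizing it again, and the intersection-over-union (IoU) of two bounding boxes. This is only an illustration under stated assumptions. One-hot encoding with argmax decoding is just one possible encoding (the abstract says several were explored), the `(x1, y1, x2, y2)` box format is assumed, and the sketch computes IoU only, not the regularization term used in DiffuseSG.

```python
def one_hot(index, num_classes):
    """Relax a category index into a continuous vector (one possible encoding)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def discretize(vector):
    """Map a continuous vector back to a category index via argmax."""
    return max(range(len(vector)), key=lambda i: vector[i])


def iou(box_a, box_b):
    """Intersection over union of two boxes in assumed (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A noisy vector standing in for a denoised category representation.
noisy = [0.1, 0.7, 0.2]
category = discretize(noisy)

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

In this example `category` is index 1, and the two boxes share a unit square out of a combined area of 7, so `overlap` is 1/7.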