8,340 research outputs found
Programmable Agents
We build deep RL agents that execute declarative programs expressed in formal
language. The agents learn to ground the terms in this language in their
environment, and can generalize their behavior at test time to execute new
programs that refer to objects that were not referenced during training. The
agents develop disentangled interpretable representations that allow them to
generalize to a wide variety of zero-shot semantic tasks
CSGNet: Neural Shape Parser for Constructive Solid Geometry
We present a neural architecture that takes as input a 2D or 3D shape and
outputs a program that generates the shape. The instructions in our program are
based on constructive solid geometry principles, i.e., a set of boolean
operations on shape primitives defined recursively. Bottom-up techniques for
this shape parsing task rely on primitive detection and are inherently slow
since the search space over possible primitive combinations is large. In
contrast, our model uses a recurrent neural network that parses the input shape
in a top-down manner, which is significantly faster and yields a compact and
easy-to-interpret sequence of modeling instructions. Our model is also more
effective as a shape detector compared to existing state-of-the-art detection
techniques. We finally demonstrate that our network can be trained on novel
datasets without ground-truth program annotations through policy gradient
techniques.Comment: Accepted at CVPR-201
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogical to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
Improving Unsupervised Visual Program Inference with Code Rewriting Families
Programs offer compactness and structure that makes them an attractive
representation for visual data. We explore how code rewriting can be used to
improve systems for inferring programs from visual data. We first propose
Sparse Intermittent Rewrite Injection (SIRI), a framework for unsupervised
bootstrapped learning. SIRI sparsely applies code rewrite operations over a
dataset of training programs, injecting the improved programs back into the
training set. We design a family of rewriters for visual programming domains:
parameter optimization, code pruning, and code grafting. For three shape
programming languages in 2D and 3D, we show that using SIRI with our family of
rewriters improves performance: better reconstructions and faster convergence
rates, compared with bootstrapped learning methods that do not use rewriters or
use them naively. Finally, we demonstrate that our family of rewriters can be
effectively used at test time to improve the output of SIRI predictions. For 2D
and 3D CSG, we outperform or match the reconstruction performance of recent
domain-specific neural architectures, while producing more parsimonious
programs that use significantly fewer primitives.Comment: Accepted at ICCV 23 (oral). Website:
https://bardofcodes.github.io/coref
Learning to Infer Graphics Programs from Hand-Drawn Images
We introduce a model that learns to convert simple hand drawings into
graphics programs written in a subset of \LaTeX. The model combines techniques
from deep learning and program synthesis. We learn a convolutional neural
network that proposes plausible drawing primitives that explain an image. These
drawing primitives are like a trace of the set of primitive commands issued by
a graphics program. We learn a model that uses program synthesis techniques to
recover a graphics program from that trace. These programs have constructs like
variable bindings, iterative loops, or simple kinds of conditionals. With a
graphics program in hand, we can correct errors made by the deep network,
measure similarity between drawings by use of similar high-level geometric
structures, and extrapolate drawings. Taken together these results are a step
towards agents that induce useful, human-readable programs from perceptual
input
- …