Learning to Infer Graphics Programs from Hand-Drawn Images
We introduce a model that learns to convert simple hand drawings into
graphics programs written in a subset of \LaTeX. The model combines techniques
from deep learning and program synthesis. We learn a convolutional neural
network that proposes plausible drawing primitives that explain an image. These
drawing primitives are like a trace of the set of primitive commands issued by
a graphics program. We learn a model that uses program synthesis techniques to
recover a graphics program from that trace. These programs have constructs like
variable bindings, iterative loops, or simple kinds of conditionals. With a
graphics program in hand, we can correct errors made by the deep network,
measure similarity between drawings by use of similar high-level geometric
structures, and extrapolate drawings. Taken together, these results are a step
towards agents that induce useful, human-readable programs from perceptual
input.
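The abstract describes a two-stage pipeline: a network proposes a trace of drawing primitives, and a synthesizer compresses that trace into a program. Below is a minimal Python sketch of that idea; the Circle primitive, the hard-coded trace, and the brute-force loop search are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (illustrative names only): a stand-in for the CNN proposes
# a trace of drawing primitives, and a brute-force synthesizer tries to
# compress that trace into a short looping program.
from dataclasses import dataclass

@dataclass(frozen=True)
class Circle:
    x: int
    y: int

def infer_trace(image):
    """Stand-in for the convolutional network: primitives explaining the image."""
    # A real system would run a CNN here; we hard-code three circles
    # spaced 4 units apart for illustration.
    return [Circle(1, 1), Circle(5, 1), Circle(9, 1)]

def synthesize(trace):
    """Search for a 'for' loop whose unrolled commands reproduce the trace."""
    for n in range(1, len(trace) + 1):
        for dx in range(10):
            unrolled = [Circle(trace[0].x + i * dx, trace[0].y) for i in range(n)]
            if unrolled == trace:
                return f"for i in range({n}): circle({trace[0].x} + i*{dx}, {trace[0].y})"
    # Fall back to the literal trace when no loop explains it.
    return "; ".join(f"circle({c.x}, {c.y})" for c in trace)

print(synthesize(infer_trace(image=None)))
# -> for i in range(3): circle(1 + i*4, 1)
```

Recovering the loop rather than the literal trace is what enables the downstream uses the abstract lists: correcting spurious primitives, comparing drawings by structure, and extrapolating by running the loop further.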
Breaking the habit: measuring and predicting departures from routine in individual human mobility
Researchers studying daily life mobility patterns have recently shown that humans are typically highly predictable in their movements. However, no existing work has examined the boundaries of this predictability, where human behaviour transitions temporarily from routine patterns to highly unpredictable states. To address this shortcoming, we tackle two interrelated challenges. First, we develop a novel information-theoretic metric, called instantaneous entropy, to analyse an individual’s mobility patterns and identify temporary departures from routine. Second, to predict such departures in the future, we propose the first Bayesian framework that explicitly models breaks from routine, showing that it outperforms current state-of-the-art predictors.
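As a rough illustration of the first idea, the sketch below computes a sliding-window Shannon entropy over a location sequence, so that spikes flag departures from routine. The paper's instantaneous entropy metric is defined differently; the window size, the hourly trace, and the use of plain Shannon entropy here are assumptions for illustration only.

```python
# A hedged sketch: per-step entropy of location visits in a trailing window.
# Low values indicate routine; spikes indicate departures from routine.
import math
from collections import Counter

def windowed_entropy(locations, window=24):
    """Shannon entropy of visits in a trailing window, one value per step."""
    entropies = []
    for t in range(window, len(locations) + 1):
        counts = Counter(locations[t - window:t])
        total = sum(counts.values())
        entropies.append(-sum((c / total) * math.log2(c / total)
                              for c in counts.values()))
    return entropies

# Hourly locations: a routine home/work pattern, then an unusual trip.
trace = ["home"] * 12 + ["work"] * 12 + ["home"] * 12 + ["airport", "hotel"] * 6
print(windowed_entropy(trace)[-1])  # higher than during the routine stretch
```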
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention, and several methods have been proposed
to learn such physical rules from video sequences. Yet most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over state-of-the-art in the
intuitive physics benchmark of IntPhys. We apply our method to a second dataset
with increasing levels of occlusions, showing it realistically predicts
segmentation masks up to 30 frames into the future. Finally, we also show
results on predicting the motion of objects in real videos.
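To make the "recurrent interaction network, modeling the physics in object space" concrete, here is a minimal numpy sketch of an interaction-network-style update over latent object states. The dimensions, single-layer weights, and rollout length are illustrative assumptions, not the paper's architecture.

```python
# A sketch of an interaction-network-style step: each object's latent state
# is updated from aggregated pairwise "effects" contributed by other objects.
import numpy as np

rng = np.random.default_rng(0)
STATE, EFFECT = 4, 8                     # state = (x, y, vx, vy); effect size arbitrary

W_rel = 0.1 * rng.normal(size=(2 * STATE, EFFECT))      # pairwise "relation" layer
W_obj = 0.1 * rng.normal(size=(STATE + EFFECT, STATE))  # per-object update layer

def step(states):
    """One step: aggregate pairwise effects, then update each object's state."""
    new_states = []
    for i in range(len(states)):
        effects = np.zeros(EFFECT)
        for j in range(len(states)):
            if i != j:
                effects += np.tanh(np.concatenate([states[i], states[j]]) @ W_rel)
        new_states.append(states[i] + np.concatenate([states[i], effects]) @ W_obj)
    return np.stack(new_states)

# Latent positions stay defined even for objects occluded in pixel space,
# so the rollout can continue through occlusions.
states = rng.normal(size=(3, STATE))     # three objects
for _ in range(30):                      # roll out 30 frames, as in the evaluation
    states = step(states)
print(states.shape)                      # (3, 4)
```

In the paper's formulation a compositional renderer then projects these latent object states onto pixel space, which is what allows the model to be trained even when objects disappear behind occluders.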
Probabilistic Adaptive Computation Time
We present a probabilistic model with discrete latent variables that control
the computation time in deep learning models such as ResNets and LSTMs. A prior
on the latent variables expresses the preference for faster computation. The
amount of computation for an input is determined via amortized maximum a
posteriori (MAP) inference. MAP inference is performed using a novel stochastic
variational optimization method. The recently proposed Adaptive Computation
Time mechanism can be seen as an ad-hoc relaxation of this model. We
demonstrate training using the general-purpose Concrete relaxation of discrete
variables. Evaluation on ResNet shows that our method matches the
speed-accuracy trade-off of Adaptive Computation Time, while allowing for
evaluation with a simple deterministic procedure that has a lower memory
footprint.
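The Concrete relaxation mentioned above can be illustrated for a single binary "halt" variable, as in the sketch below. The logit, temperature, and thresholding rule are assumptions for illustration; the paper's model applies such variables inside ResNets and LSTMs to gate computation.

```python
# A sketch of the binary Concrete (Gumbel-Softmax) relaxation: a soft,
# differentiable sample is used during training, while evaluation can use
# a cheap deterministic threshold.
import numpy as np

rng = np.random.default_rng(0)

def concrete_bernoulli(logit, temperature=0.5):
    """Relaxed binary sample in (0, 1); hardens toward {0, 1} as temperature -> 0."""
    u = rng.uniform(1e-6, 1 - 1e-6)
    logistic_noise = np.log(u) - np.log(1 - u)   # Logistic(0, 1) sample
    return 1.0 / (1.0 + np.exp(-(logit + logistic_noise) / temperature))

halt_logit = 0.8                                 # produced by the network
soft_halt = concrete_bernoulli(halt_logit)       # differentiable, for training
hard_halt = float(halt_logit > 0.0)              # deterministic choice at test time
print(round(float(soft_halt), 3), hard_halt)
```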
Variational Saccading: Efficient Inference for Large Resolution Images
Image classification with deep neural networks is typically restricted to
images of small dimensionality, such as 224 x 224 in ResNet models [24]. This
limitation excludes the 4000 x 3000 dimensional images that are taken by modern
smartphone cameras and smart devices. In this work, we aim to mitigate the
prohibitive inferential and memory costs of operating in such large dimensional
spaces. To sample from the high-resolution original input distribution, we
propose using a smaller proxy distribution to learn the co-ordinates that
correspond to regions of interest in the high-dimensional space. We introduce a
new principled variational lower bound that captures the relationship of the
proxy distribution's posterior and the original image's co-ordinate space in a
way that maximizes the conditional classification likelihood. We empirically
demonstrate on one synthetic benchmark and one real world large resolution DSLR
camera image dataset that our method produces comparable results with ~10x
faster inference and lower memory consumption than a model that utilizes the
entire original input distribution. Finally, we experiment with a more complex
setting, using mini-maps from StarCraft II [56] to infer the number of
characters in a complex 3D-rendered scene. Even in such complicated scenes, our
model provides strong localization: a feature missing from traditional
classification models.
Comment: Published at BMVC 2019 & NIPS 2018 Bayesian Deep Learning Workshop.
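As a rough sketch of the glimpse idea, the code below downsamples an image to a small proxy, picks coordinates from it, and crops a high-resolution region at those coordinates. The brightest-cell heuristic stands in for the learned coordinate posterior; the paper instead learns it by optimizing a variational lower bound on the classification likelihood.

```python
# A sketch of saccading to a region of interest: a cheap proxy image proposes
# (y, x) coordinates, and a small full-resolution crop is classified instead
# of the entire large image.
import numpy as np

rng = np.random.default_rng(0)

def proxy_coordinates(full_image, proxy_size=32):
    """Downsample to a proxy, pick the brightest cell as a crude 'posterior mean'."""
    h, w = full_image.shape
    fy, fx = h // proxy_size, w // proxy_size           # block size per proxy cell
    block = full_image[: proxy_size * fy, : proxy_size * fx]
    proxy = block.reshape(proxy_size, fy, proxy_size, fx).mean(axis=(1, 3))
    iy, ix = np.unravel_index(np.argmax(proxy), proxy.shape)
    return iy * fy, ix * fx                             # back to full-res coordinates

def glimpse(full_image, y, x, size=224):
    """Crop a small high-resolution region to feed the classifier."""
    h, w = full_image.shape
    y, x = min(y, h - size), min(x, w - size)
    return full_image[y:y + size, x:x + size]

image = rng.random((3000, 4000))      # a smartphone-scale image
y, x = proxy_coordinates(image)
print(glimpse(image, y, x).shape)     # (224, 224): cheap to classify
```

Because only the proxy and the crop are ever processed, inference cost scales with the glimpse size rather than the full image size, which is the source of the reported ~10x speedup and memory savings.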