4,879 research outputs found
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing it is implicitly learning a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.
Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
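The abstract's core architectural idea is that the image branch produces feature maps while the motion branch produces convolutional kernels, and the two are combined by convolution. The following is a minimal PyTorch sketch of that idea, not the authors' code; all layer sizes, tensor shapes, and the residual decoding step are illustrative assumptions.

```python
# Minimal sketch of a "cross convolution": image features convolved with kernels
# predicted from a sampled motion latent z. Sampling different z values yields
# different plausible future frames.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossConv(nn.Module):
    def __init__(self, channels=32, kernel_size=5, z_dim=16):
        super().__init__()
        self.channels, self.k = channels, kernel_size
        # Image branch: encode the input frame into feature maps.
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Motion branch: map a sampled latent to one kernel per feature channel.
        self.kernel_dec = nn.Linear(z_dim, channels * kernel_size * kernel_size)
        # Decoder: turn the cross-convolved features into an image update.
        self.dec = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, image, z):
        b = image.size(0)
        feats = self.image_enc(image)                          # (B, C, H, W)
        kernels = self.kernel_dec(z).view(b * self.channels, 1, self.k, self.k)
        # Grouped convolution applies a separate kernel to every (sample, channel).
        feats = feats.view(1, b * self.channels, *feats.shape[2:])
        out = F.conv2d(feats, kernels, padding=self.k // 2, groups=b * self.channels)
        out = out.view(b, self.channels, *image.shape[2:])
        return image + self.dec(out)                           # predicted future frame

model = CrossConv()
frame = torch.rand(2, 3, 64, 64)
futures = [model(frame, torch.randn(2, 16)) for _ in range(3)]
```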
Correctness of belief propagation in Gaussian graphical models of arbitrary topology
Local "belief propagation" rules of the sort proposed byPearl [12] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation" -- using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannonlimit performance of "Turbo codes", whose decoding algorithm is equivalentto loopy belief propagation. Except for th
A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding
Humans demonstrate remarkable abilities to predict physical events in complex
scenes. Two classes of models for physical scene understanding have recently
been proposed: "Intuitive Physics Engines", or IPEs, which posit that people
make predictions by running approximate probabilistic simulations in causal
mental models similar in nature to video-game physics engines, and memory-based
models, which make judgments based on analogies to stored experiences of
previously encountered scenes and physical outcomes. Versions of the latter
have recently been instantiated in convolutional neural network (CNN)
architectures. Here we report four experiments that, to our knowledge, are the
first rigorous comparisons of simulation-based and CNN-based models, where both
approaches are concretely instantiated in algorithms that can run on raw image
inputs and produce as outputs physical judgments such as whether a stack of
blocks will fall. Both approaches can achieve super-human accuracy levels and
can quantitatively predict human judgments to a similar degree, but only the
simulation-based models generalize to novel situations in ways that people do,
and are qualitatively consistent with systematic perceptual illusions and
judgment asymmetries that people show.
Comment: Accepted to CogSci 2016 as an oral presentation
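As a toy illustration of the "approximate probabilistic simulation" idea (not the paper's models), the sketch below perturbs perceived block positions with perceptual noise, runs a cheap deterministic stability check on each noisy sample, and reports the fraction of samples predicted to fall. Block geometry, the stability heuristic, and the noise level are all made-up assumptions.

```python
import numpy as np

def tower_falls(centers, half_width=0.5):
    """centers[i] is the horizontal centre of block i, stacked bottom to top."""
    for k in range(1, len(centers)):
        # Centre of mass of everything resting on block k-1 (equal-mass blocks)
        # must lie over the supporting block's top face, otherwise it topples.
        com = np.mean(centers[k:])
        if abs(com - centers[k - 1]) > half_width:
            return True
    return False

def prob_falls(perceived_centers, noise_std=0.15, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    falls = 0
    for _ in range(n_samples):
        noisy = perceived_centers + rng.normal(0, noise_std, len(perceived_centers))
        falls += tower_falls(noisy)
    return falls / n_samples

# A nearly aligned tower is judged mostly stable; a staggered one mostly unstable.
print(prob_falls(np.array([0.0, 0.05, -0.05, 0.1])))
print(prob_falls(np.array([0.0, 0.35, 0.7, 1.0])))
```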
4D Frequency Analysis of Computational Cameras for Depth of Field Extension
Depth of field (DOF), the range of scene depths that appear sharp in a photograph, poses a fundamental tradeoff in photography---wide apertures are important to reduce imaging noise, but they also increase defocus blur. Recent advances in computational imaging modify the acquisition process to extend the DOF through deconvolution. Because deconvolution quality is a tight function of the frequency power spectrum of the defocus kernel, designs with high spectra are desirable. In this paper we study how to design effective extended-DOF systems, and show an upper bound on the maximal power spectrum that can be achieved. We analyze defocus kernels in the 4D light field space and show that in the frequency domain, only a low-dimensional 3D manifold contributes to focus. Thus, to maximize the defocus spectrum, imaging systems should concentrate their limited energy on this manifold. We review several computational imaging systems and show either that they spend energy outside the focal manifold or do not achieve a high spectrum over the DOF. Guided by this analysis we introduce the lattice-focal lens, which concentrates energy at the low-dimensional focal manifold and achieves a higher power spectrum than previous designs. We have built a prototype lattice-focal lens and present extended depth of field results
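The abstract's central quantity is the frequency power spectrum of the defocus kernel. Below is a small numerical sketch, not taken from the paper, that computes this spectrum for a conventional aperture modeled as a normalized disk whose radius grows with defocus; grid size and radii are arbitrary assumptions. The collapsing minimum power at large defocus is the regime that extended-DOF designs such as the lattice-focal lens aim to avoid.

```python
import numpy as np

def disk_kernel(radius, half=32):
    # Simple disk-shaped defocus kernel, normalized to unit sum.
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    return k / k.sum()

def power_spectrum(kernel):
    return np.abs(np.fft.fftshift(np.fft.fft2(kernel))) ** 2

for r in [1, 4, 8]:
    spec = power_spectrum(disk_kernel(r))
    # The worst-case (minimum) power over frequencies governs how ill-posed
    # deconvolution becomes as the blur radius grows.
    print(f"radius {r}: min power {spec.min():.2e}, mean power {spec.mean():.2e}")
```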
Ambient Sound Provides Supervision for Visual Learning
The sound of crashing waves, the roar of fast-moving cars -- sound conveys
important information about the objects in our surroundings. In this work, we
show that ambient sounds can be used as a supervisory signal for learning
visual models. To demonstrate this, we train a convolutional neural network to
predict a statistical summary of the sound associated with a video frame. We
show that, through this process, the network learns a representation that
conveys information about objects and scenes. We evaluate this representation
on several recognition tasks, finding that its performance is comparable to
that of other state-of-the-art unsupervised learning methods. Finally, we show
through visualizations that the network learns units that are selective to
objects that are often associated with characteristic sounds.
Comment: ECCV 201
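A minimal sketch of the training setup the abstract describes follows: an image CNN regresses a fixed-length statistical summary of the audio accompanying each frame. The network architecture, the dimensionality of the summary, and the placeholder targets are assumptions; the paper's actual sound-texture statistics are not reproduced here.

```python
import torch
import torch.nn as nn

class SoundStatPredictor(nn.Module):
    def __init__(self, stat_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, stat_dim)

    def forward(self, frames):
        return self.head(self.features(frames).flatten(1))

model = SoundStatPredictor()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch: video frames and their precomputed audio statistics.
frames = torch.rand(8, 3, 128, 128)
audio_stats = torch.rand(8, 32)

for step in range(5):                       # training-loop sketch
    loss = loss_fn(model(frames), audio_stats)
    optim.zero_grad()
    loss.backward()
    optim.step()
# After this self-supervised training, `model.features` can be reused as a
# visual representation for recognition tasks.
```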
Synthesizing Normalized Faces from Facial Identity Features
We present a method for synthesizing a frontal, neutral-expression image of a
person's face given an input face photograph. This is achieved by learning to
generate facial landmarks and textures from features extracted from a
facial-recognition network. Unlike previous approaches, our encoding feature
vector is largely invariant to lighting, pose, and facial expression.
Exploiting this invariance, we train our decoder network using only frontal,
neutral-expression photographs. Since these photographs are well aligned, we
can decompose them into a sparse set of landmark points and aligned texture
maps. The decoder then predicts landmarks and textures independently and
combines them using a differentiable image warping operation. The resulting
images can be used for a number of applications, such as analyzing facial
attributes, exposure and white balance adjustment, or creating a 3-D avatar.
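A rough structural sketch of the decoder the abstract describes follows: landmarks and an aligned texture are predicted separately from identity features and combined by a differentiable warp. Everything here is an assumption; in particular, the paper's landmark-driven warping operation is stood in for by a dense sampling grid predicted from the landmarks and applied with grid_sample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedFaceDecoder(nn.Module):
    def __init__(self, feat_dim=128, n_landmarks=68, tex_size=64):
        super().__init__()
        self.tex_size = tex_size
        self.landmarks = nn.Linear(feat_dim, n_landmarks * 2)
        self.texture = nn.Linear(feat_dim, 3 * tex_size * tex_size)
        # Stand-in: turn landmarks into a dense sampling grid for the warp.
        self.flow = nn.Linear(n_landmarks * 2, 2 * tex_size * tex_size)

    def forward(self, identity_features):
        b, s = identity_features.size(0), self.tex_size
        lm = self.landmarks(identity_features)                  # (B, 2L)
        tex = self.texture(identity_features).view(b, 3, s, s)  # aligned texture
        grid = self.flow(lm).view(b, s, s, 2).tanh()            # grid in [-1, 1]
        face = F.grid_sample(tex, grid, align_corners=False)    # warped output
        return face, lm.view(b, -1, 2)

decoder = NormalizedFaceDecoder()
face, landmarks = decoder(torch.randn(4, 128))
```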
Unsupervised Training for 3D Morphable Model Regression
We present a method for training a regression network from image pixels to 3D
morphable model coordinates using only unlabeled photographs. The training loss
is based on features from a facial recognition network, computed on-the-fly by
rendering the predicted faces with a differentiable renderer. To make training
from features feasible and avoid network fooling effects, we introduce three
objectives: a batch distribution loss that encourages the output distribution
to match the distribution of the morphable model, a loopback loss that ensures
the network can correctly reinterpret its own output, and a multi-view identity
loss that compares the features of the predicted 3D face and the input
photograph from multiple viewing angles. We train a regression network using
these objectives, a set of unlabeled photographs, and the morphable model
itself, and demonstrate state-of-the-art results.
Comment: CVPR 2018 version with supplemental material
(http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)
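The three objectives named in the abstract can be sketched as plain tensor functions, shown below. This is a hedged reading, not the paper's implementation: the differentiable renderer and the face-recognition feature extractor are placeholder callables, and the exact loss forms, weightings, and view handling are assumptions.

```python
import torch
import torch.nn.functional as F

def batch_distribution_loss(pred_coeffs):
    # Encourage the batch of predicted morphable-model coefficients to look like
    # samples from the model's standard-normal prior (match mean and variance).
    mean, var = pred_coeffs.mean(0), pred_coeffs.var(0)
    return mean.pow(2).mean() + (var - 1.0).pow(2).mean()

def loopback_loss(regressor, renderer, pred_coeffs):
    # Render the predicted face, run the regressor on the rendering, and ask it
    # to reproduce its own coefficients.
    rendered = renderer(pred_coeffs, view=0.0)      # view=0.0: frontal (assumed API)
    return F.mse_loss(regressor(rendered), pred_coeffs)

def multiview_identity_loss(face_net, renderer, pred_coeffs, photo_features, views):
    # Recognition features of the predicted 3D face, rendered from several
    # viewing angles, should match the features of the input photograph.
    losses = [1.0 - F.cosine_similarity(face_net(renderer(pred_coeffs, view=v)),
                                        photo_features).mean()
              for v in views]
    return sum(losses) / len(views)
```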
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods, which have tackled this
problem in a deterministic or non-parametric way, we propose a novel approach
that models future frames in a probabilistic manner. Our probabilistic model
makes it possible for us to sample and synthesize many possible future frames
from a single input image. Future frame synthesis is challenging, as it
involves low- and high-level image and motion understanding. We propose a novel
network structure, namely a Cross Convolutional Network to aid in synthesizing
future frames; this network structure encodes image and motion information as
feature maps and convolutional kernels, respectively. In experiments, our model
performs well on synthetic data, such as 2D shapes and animated game sprites,
as well as on real-world videos. We also show that our model can be applied to
tasks such as visual analogy-making, and present an analysis of the learned
network representations.
Comment: The first two authors contributed equally to this work.
Physical Primitive Decomposition
Objects are made of parts, each with distinct geometry, physics,
functionality, and affordances. Developing such a distributed, physical,
interpretable representation of objects will facilitate intelligent agents to
better explore and interact with the world. In this paper, we study physical
primitive decomposition---understanding an object through its components, each
with physical and geometric attributes. As annotated data for object parts and
physics are rare, we propose a novel formulation that learns physical
primitives by explaining both an object's appearance and its behaviors in
physical events. Our model performs well on block towers and tools in both
synthetic and real scenarios; we also demonstrate that visual and physical
observations often provide complementary signals. We further present ablation
and behavioral studies to better understand our model and contrast it with
human performance.
Comment: ECCV 2018. Project page: http://ppd.csail.mit.edu
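The representation the abstract argues for, an object as a set of primitives carrying both geometric and physical attributes trained to explain appearance and behavior, can be sketched as below. This is purely illustrative and not the paper's formulation; the renderer and the physics simulator are placeholder callables, and the attribute set is an assumption.

```python
from dataclasses import dataclass
from typing import List
import torch
import torch.nn.functional as F

@dataclass
class Primitive:
    size: torch.Tensor      # (3,) box dimensions
    position: torch.Tensor  # (3,) centre in the object frame
    density: torch.Tensor   # scalar physical attribute

def appearance_loss(primitives: List[Primitive], render, target_image):
    # The rendered union of primitives should match the observed appearance.
    return F.mse_loss(render(primitives), target_image)

def physics_loss(primitives: List[Primitive], simulate, observed_trajectory):
    # Simulating the primitives in a physical event (e.g. a fall or a push)
    # should reproduce the observed outcome; mass comes from size and density.
    masses = [p.size.prod() * p.density for p in primitives]
    return F.mse_loss(simulate(primitives, masses), observed_trajectory)

def total_loss(primitives, render, simulate, image, trajectory, w_phys=1.0):
    # Visual and physical observations provide complementary supervision.
    return appearance_loss(primitives, render, image) + \
           w_phys * physics_loss(primitives, simulate, trajectory)
```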
