Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood
Training energy-based models (EBMs) with maximum likelihood estimation on
high-dimensional data can be both challenging and time-consuming. As a result,
there is a noticeable gap in sample quality between EBMs and other generative
frameworks such as GANs and diffusion models. To close this gap, and inspired by
recent efforts to learn EBMs by maximizing diffusion recovery likelihood
(DRL), we propose cooperative diffusion recovery likelihood (CDRL), an
effective approach to tractably learn and sample from a series of EBMs defined
on increasingly noisy versions of a dataset, paired with an initializer model
for each EBM. At each noise level, the initializer model learns to amortize the
sampling process of the EBM, and the two models are jointly estimated within a
cooperative training framework. Samples from the initializer serve as starting
points that are refined by a few sampling steps from the EBM. With the refined
samples, the EBM is optimized by maximizing recovery likelihood, while the
initializer is optimized by learning from the difference between the refined
samples and the initial samples. We develop a new noise schedule and a variance
reduction technique to further improve the sample quality. Combining these
advances, we significantly improve FID scores compared to existing EBM
methods on CIFAR-10 and ImageNet 32x32, with a 2x speedup over DRL. In
addition, we extend our method to compositional generation and image inpainting
tasks, and showcase the compatibility of CDRL with classifier-free guidance for
conditional generation, achieving similar trade-offs between sample quality and
sample diversity as in diffusion models.
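As a rough illustration of the cooperative scheme described above, the sketch below shows one simplified update at a single noise level: an initializer proposes samples conditioned on a noisy observation, a few Langevin steps under the conditional (recovery) energy refine them, the EBM is updated by contrasting the energies of data and refined samples, and the initializer regresses toward the refined samples. The module names (`ebm`, `init_net`), the single-level setup, and all hyperparameters are assumptions for illustration, not the paper's exact objective or architecture.

```python
import torch

def cdrl_step(ebm, init_net, x_clean, sigma, opt_ebm, opt_init,
              n_langevin=5, step_size=0.01):
    """One simplified cooperative update at a single noise level (illustrative)."""
    # Noisy observation y = x + sigma * eps defines the recovery-likelihood target.
    y = x_clean + sigma * torch.randn_like(x_clean)

    # 1) The initializer proposes samples conditioned on the noisy input.
    x0 = init_net(y).detach()

    # 2) A few Langevin steps under the conditional (recovery) energy
    #    E(x) + ||y - x||^2 / (2 sigma^2) refine the proposal.
    x = x0.clone().requires_grad_(True)
    for _ in range(n_langevin):
        energy = ebm(x).sum() + ((y - x) ** 2).sum() / (2 * sigma ** 2)
        grad, = torch.autograd.grad(energy, x)
        x = (x - 0.5 * step_size ** 2 * grad
             + step_size * torch.randn_like(x)).detach().requires_grad_(True)
    x_refined = x.detach()

    # 3) EBM update: maximize recovery likelihood, i.e. lower the energy of the
    #    observed data relative to the refined samples.
    loss_ebm = ebm(x_clean).mean() - ebm(x_refined).mean()
    opt_ebm.zero_grad(); loss_ebm.backward(); opt_ebm.step()

    # 4) Initializer update: learn from the gap between refined and initial samples.
    loss_init = ((init_net(y) - x_refined) ** 2).mean()
    opt_init.zero_grad(); loss_init.backward(); opt_init.step()
    return loss_ebm.item(), loss_init.item()
```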
Learning Dynamic Generator Model by Alternating Back-Propagation Through Time
This paper studies the dynamic generator model for spatial-temporal processes
such as dynamic textures and action sequences in video data. In this model,
each time frame of the video sequence is generated by a generator model, which
is a non-linear transformation of a latent state vector, where the non-linear
transformation is parametrized by a top-down neural network. The sequence of
latent state vectors follows a non-linear auto-regressive model, where the
state vector of the next frame is a non-linear transformation of the state
vector of the current frame as well as an independent noise vector that
provides randomness in the transition. The non-linear transformation of this
transition model can be parametrized by a feedforward neural network. We show
that this model can be learned by an alternating back-propagation through time
algorithm that iteratively samples the noise vectors and updates the parameters
in the transition model and the generator model. We show that our training
method can learn realistic models for dynamic textures and action patterns.
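For intuition, here is a minimal PyTorch-style sketch of one alternating back-propagation-through-time step as described above: Langevin inference of the per-frame noise vectors through the unrolled transition and generator networks, followed by a gradient update of the model parameters given the inferred vectors. The networks `f_net` and `g_net`, their signatures, and all hyperparameters are placeholders rather than the paper's exact implementation.

```python
import torch

def abptt_step(f_net, g_net, video, eps, s0, opt,
               sigma=0.1, n_langevin=10, delta=0.05):
    """One alternating back-propagation-through-time step (illustrative sketch).

    video: (T, C, H, W) observed frames; eps: (T, d) latent noise vectors kept
    across iterations; s0: initial latent state; opt: optimizer over f_net and
    g_net parameters.
    """
    def unroll(noise):
        # Roll the latent state forward with the transition model f_net and
        # decode each state into a frame with the generator model g_net.
        frames, s = [], s0
        for t in range(noise.shape[0]):
            s = f_net(s, noise[t])      # non-linear auto-regressive transition
            frames.append(g_net(s))     # top-down emission of frame t
        return torch.stack(frames)

    # Inference step: Langevin sampling of the noise vectors.
    eps = eps.detach().requires_grad_(True)
    for _ in range(n_langevin):
        recon = unroll(eps)
        # Negative log posterior (up to a constant): reconstruction + Gaussian prior.
        loss = ((video - recon) ** 2).sum() / (2 * sigma ** 2) + 0.5 * (eps ** 2).sum()
        grad, = torch.autograd.grad(loss, eps)
        eps = (eps - 0.5 * delta ** 2 * grad
               + delta * torch.randn_like(eps)).detach().requires_grad_(True)
    eps = eps.detach()

    # Learning step: update transition and generator parameters by back-propagation
    # through time, given the inferred noise vectors.
    loss = ((video - unroll(eps)) ** 2).sum() / (2 * sigma ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
    return eps, loss.item()
```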
Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns
Dynamic patterns are characterized by complex spatial and motion patterns.
Understanding dynamic patterns requires a disentangled representational model
that separates the factorial components. A commonly used model for dynamic
patterns is the state space model, where the state evolves over time according
to a transition model and the state generates the observed image frames
according to an emission model. To model the motions explicitly, it is natural
for the model to be based on the motions or the displacement fields of the
pixels. Thus in the emission model, we let the hidden state generate the
displacement field, which warps the trackable component in the previous image
frame to generate the next frame while adding a simultaneously emitted residual
image to account for the change that cannot be explained by the deformation.
The warping of the previous image is about the trackable part of the change of
image frame, while the residual image is about the intrackable part of the
image. We learn the model by a maximum likelihood algorithm that iterates
between inferring the latent noise vectors that drive the transition model and
updating the parameters given the inferred latent vectors. Meanwhile, we adopt
a regularization term that penalizes the norms of the residual images,
encouraging the model to explain the change between image frames by trackable
motion. Unlike existing methods for dynamic patterns, we learn our model in an
unsupervised setting without ground-truth displacement fields. In addition, our model
defines a notion of intrackability by the separation of warped component and
residual component in each image frame. We show that our method can synthesize
realistic dynamic patterns and disentangle appearance, trackable motions, and
intrackable motions. The learned models are useful for motion transfer, and they
provide a natural way to define and measure the intrackability of a dynamic
pattern.
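The warp-plus-residual emission described above can be sketched roughly as follows, assuming a PyTorch setting in which the hidden state has already been decoded into a per-pixel displacement field and a residual image; the helper names, tensor shapes, and the regularization weight are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def emit_frame(prev_frame, disp, residual):
    """Warp-plus-residual emission (illustrative sketch).

    prev_frame: (B, C, H, W); disp: (B, 2, H, W) displacement field in pixels;
    residual: (B, C, H, W) simultaneously emitted residual image.
    """
    B, _, H, W = prev_frame.shape
    # Identity sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=prev_frame.device),
                            torch.linspace(-1, 1, W, device=prev_frame.device),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, H, W, 2)
    # Convert pixel displacements to normalized offsets and add them to the grid.
    offset = torch.stack((disp[:, 0] * 2 / (W - 1),
                          disp[:, 1] * 2 / (H - 1)), dim=-1)
    # Trackable part: previous frame warped by the displacement field.
    warped = F.grid_sample(prev_frame, base + offset, align_corners=True)
    # Intrackable part: residual image for changes the warp cannot explain.
    return warped + residual

def emission_loss(frame, prev_frame, disp, residual, lam=0.1):
    # Reconstruction error plus a penalty on the residual norm, encouraging the
    # model to explain frame-to-frame change by trackable motion.
    pred = emit_frame(prev_frame, disp, residual)
    return ((frame - pred) ** 2).mean() + lam * (residual ** 2).mean()
```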
Living in a Simulation? An Empirical Investigation of a Smart Driving-Simulation Testing System
The Internet of Things (IoT) generally refers to the embedding of computing and communication devices in various types of physical objects (e.g., automobiles) used in people’s daily lives. This paper draws on feedback intervention theory to investigate the impact of IoT-enabled immediate feedback interventions on individual task performance. Our research context is a smart test-simulation service based on internet-of-vehicles (IoV) technology that was implemented by a large driver-training service provider in China. This system captures and analyzes data streams from onboard sensors and cameras installed in vehicles in real time and immediately provides individual students with information about errors made during simulation tests. We postulate that the focal smart service functions as a feedback intervention (FI) that can improve task performance. We also hypothesize that student training schedules moderate this effect and propose an interaction effect on student performance based on feedback timing and the number of FI cues. We collected data about students’ demographics, their training session records, and information about their simulation test(s) and/or their official driving skills field tests and used a quasi-experimental method along with propensity score matching to empirically validate our research model. Difference-in-difference analysis and multiple regression results support the significant impact of the simulation test as an FI on student performance on the official driving skills field test. Our results also support the interaction effect between feedback timing and the number of corrective FI cues on official test performance. This paper concludes with a discussion of the theoretical contributions and practical significance of our research.
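As a rough sketch of the quasi-experimental analysis described above (not the authors' data or code), a difference-in-difference regression on a propensity-score-matched sample could look like the following; the column names and the toy values are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy, made-up rows with hypothetical columns: 'score' (official field-test
# score), 'treated' (1 if the student used the IoV simulation test), and
# 'post' (1 for the post-intervention period). In practice the frame would be
# a propensity-score-matched sample built from demographics and training records.
df = pd.DataFrame({
    "score":   [72, 75, 70, 81, 78, 88, 71, 76],
    "treated": [0,  0,  1,  1,  0,  1,  0,  1],
    "post":    [0,  1,  0,  1,  1,  1,  0,  0],
})

# Difference-in-difference regression: the coefficient on treated:post estimates
# the effect of the simulation-test feedback intervention on test performance.
did = smf.ols("score ~ treated * post", data=df).fit()
print(did.summary())
```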
Generating Transferable Adversarial Simulation Scenarios for Self-Driving via Neural Rendering
Self-driving software pipelines include components that are learned from a
significant number of training examples, yet it remains challenging to evaluate
the overall system's safety and generalization performance. As the real-world
deployment of autonomous vehicles scales up, it is critically important to
automatically find simulation scenarios in which the driving policies will
fail. We propose a method that efficiently generates adversarial
simulation scenarios for autonomous driving by solving an optimal control
problem that aims to maximally perturb the policy from its nominal trajectory.
Given an image-based driving policy, we show that we can inject new objects
in a neural rendering representation of the deployment scene, and optimize
their texture in order to generate adversarial sensor inputs to the policy. We
demonstrate that adversarial scenarios discovered purely in the neural renderer
(surrogate scene) can often be successfully transferred to the deployment
scene, without further optimization. We demonstrate that this transfer occurs
in both simulated and real environments, provided the learned surrogate scene
is sufficiently close to the deployment scene.
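Conceptually, the texture optimization can be sketched as gradient ascent on the policy's deviation from its nominal trajectory through a differentiable neural-rendering surrogate; in the sketch below, `render`, `policy`, and all hyperparameters are placeholder assumptions, not the paper's actual interfaces.

```python
import torch

def optimize_adversarial_texture(render, policy, texture, nominal_actions,
                                 n_iters=200, lr=0.01):
    """Gradient-based search for an adversarial object texture (illustrative).

    render(texture) -> sensor images along a fixed trajectory, produced by a
    differentiable neural-rendering surrogate of the deployment scene;
    policy(images) -> predicted control actions of the image-based driving
    policy; nominal_actions: the policy's actions in the unperturbed scene.
    """
    texture = texture.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([texture], lr=lr)
    for _ in range(n_iters):
        images = render(texture)      # surrogate sensor inputs
        actions = policy(images)      # policy response to the perturbed scene
        # Maximize deviation from the nominal trajectory (minimize its negative).
        loss = -((actions - nominal_actions) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        # Keep the texture in a valid color range.
        with torch.no_grad():
            texture.clamp_(0.0, 1.0)
    return texture.detach()
```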