Channel-Recurrent Autoencoding for Image Modeling
Despite recent successes in synthesizing faces and bedrooms, existing
generative models struggle to capture more complex image types, potentially due
to the oversimplification of their latent space constructions. To tackle this
issue, building on Variational Autoencoders (VAEs), we integrate recurrent
connections across channels to both inference and generation steps, allowing
the high-level features to be captured in global-to-local, coarse-to-fine
manners. Combined with adversarial loss, our channel-recurrent VAE-GAN
(crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of high
resolution images while maintaining the same level of computational efficiency.
Our model produces interpretable and expressive latent representations to
benefit downstream tasks such as image completion. Moreover, we propose two
novel regularizations, namely the KL objective weighting scheme over time steps
and mutual information maximization between transformed latent variables and
the outputs, to enhance the training.
Comment: Code: https://github.com/WendyShang/crVAE. Supplementary Materials: http://www-personal.umich.edu/~shangw/wacv18_supplementary_material.pd
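The channel-recurrent idea — running a recurrent unit across groups of latent channels so later groups condition on earlier, coarser ones — can be sketched as follows. This is a minimal NumPy illustration with a plain tanh RNN standing in for the paper's LSTM; the group count, dimensions, and weights are hypothetical, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_recurrent(z, W_h, W_x, b):
    """Run a simple tanh RNN across channel groups of a latent code.

    z: (T, d) latent split into T channel groups of size d, ordered
    coarse-to-fine; each group's transform conditions on the groups
    before it, mimicking channel-recurrent connections.
    (Sketch only: the paper uses LSTM units, not a vanilla RNN.)
    """
    h = np.zeros(W_h.shape[0])
    outs = []
    for t in range(z.shape[0]):
        h = np.tanh(W_h @ h + W_x @ z[t] + b)  # condition on earlier groups
        outs.append(h)
    return np.stack(outs)  # (T, hidden)

# Hypothetical sizes: 4 channel groups of 8 latent dims each.
T, d, hidden = 4, 8, 8
z = rng.standard_normal((T, d))
W_h = rng.standard_normal((hidden, hidden)) * 0.1
W_x = rng.standard_normal((hidden, d)) * 0.1
b = np.zeros(hidden)
out = channel_recurrent(z, W_h, W_x, b)
```

Because the recurrence is ordered, the first groups can carry global structure while later groups refine local detail, which is the global-to-local, coarse-to-fine behavior the abstract describes.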
CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction
Pedestrian path prediction is an essential topic in computer vision and video
understanding. Having insight into the movement of pedestrians is crucial for
ensuring safe operation in a variety of applications including autonomous
vehicles, social robots, and environmental monitoring. Current works in this
area utilize complex generative or recurrent methods to capture many possible
futures. However, despite the inherent real-time nature of predicting future
paths, little work has been done to explore accurate and computationally
efficient approaches for this task. To this end, we propose a convolutional
approach for real-time pedestrian path prediction, CARPe. It utilizes a
variation of Graph Isomorphism Networks in combination with an agile
convolutional neural network design to form a fast and accurate path prediction
approach. Notable results in both inference speed and prediction accuracy are
achieved, improving FPS considerably in comparison to current state-of-the-art
methods while delivering competitive accuracy on well-known path prediction
datasets.
Comment: AAAI-21 Camera Ready
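The Graph Isomorphism Network update that CARPe builds on has a compact form: h'_v = MLP((1 + eps) * h_v + sum of neighbor features). A minimal NumPy sketch follows; the two-layer MLP, the weight values, and the fully connected three-pedestrian graph are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def gin_layer(H, A, W1, W2, eps=0.0):
    """One Graph Isomorphism Network update:
    h'_v = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u).

    H: (N, F) node features, A: (N, N) adjacency without self-loops.
    """
    agg = (1.0 + eps) * H + A @ H          # aggregate self + neighbors
    hidden = np.maximum(agg @ W1, 0.0)     # ReLU layer of a small MLP
    return hidden @ W2

# Toy example: 3 pedestrians, fully connected interaction graph.
N, F = 3, 2
H = np.ones((N, F))
A = np.ones((N, N)) - np.eye(N)
W1 = np.full((F, 4), 0.5)
W2 = np.full((4, F), 0.5)
out = gin_layer(H, A, W1, W2)
```

In a path-prediction setting, each node would carry a pedestrian's observed trajectory features, and the summed neighbor term injects social context before the convolutional decoder predicts future positions.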
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks,
and their combinations often result in blurry predictions. We identify an
important contributing factor for imprecise predictions that has not been
studied adequately in the literature: blind spots, i.e., lack of access to all
relevant past information for accurately predicting the future. To address this
issue, we introduce a fully context-aware architecture that captures the entire
available past context for each pixel using Parallel Multi-Dimensional LSTM
units and aggregates it using blending units. Our model outperforms a strong
baseline network of 20 recurrent convolutional layers and yields
state-of-the-art performance for next step prediction on three challenging
real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101.
Moreover, it does so with fewer parameters than several recently proposed
models, and does not rely on deep convolutional networks, multi-scale
architectures, separation of background and foreground modeling, motion flow
learning, or adversarial training. These results highlight that full awareness
of past context is of crucial importance for video prediction.
Comment: 19 pages. ECCV 2018 oral presentation. Project webpage is at https://wonmin-byeon.github.io/publication/2018-ecc
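The blending step — aggregating the per-direction context features produced by the Parallel Multi-Dimensional LSTM units — can be illustrated with a simple weighted combination. This is a hedged sketch: the actual blending units learn their weights end to end, and the direction count and feature size here are made up:

```python
import numpy as np

def blend(contexts, logits):
    """Blend per-direction context features with softmax weights,
    a minimal stand-in for a learned blending unit.

    contexts: (D, F) — one F-dim context vector per scan direction.
    logits:   (D,)   — unnormalized blending scores.
    """
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return np.tensordot(w, contexts, axes=(0, 0))  # (F,)

# Toy example: 4 directional contexts of 5 features each.
contexts = np.arange(20.0).reshape(4, 5)
logits = np.zeros(4)           # equal scores -> simple average
blended = blend(contexts, logits)
```

With equal logits the blend reduces to the mean over directions; a trained unit would instead emphasize whichever directions hold the relevant past context for a given pixel.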
Spatial Evolutionary Generative Adversarial Networks
Generative adversarial networks (GANs) suffer from training pathologies such as
instability and mode collapse. These pathologies mainly arise from a lack of
diversity in their adversarial interactions. Evolutionary generative
adversarial networks apply the principles of evolutionary computation to
mitigate these problems. We hybridize two of these approaches that promote
training diversity. One, E-GAN, at each batch, injects mutation diversity by
training the (replicated) generator with three independent objective functions
then selecting the resulting best performing generator for the next batch. The
other, Lipizzaner, injects population diversity by training a two-dimensional
grid of GANs with a distributed evolutionary algorithm that includes neighbor
exchanges of additional training adversaries, performance-based selection, and
population-based hyper-parameter tuning. We propose to combine mutation and
population approaches to diversity improvement. We contribute a superior
evolutionary GANs training method, Mustangs, that eliminates the single loss
function used across Lipizzaner's grid. Instead, each training round, a loss
function is selected with equal probability from among the three that E-GAN uses.
Experimental analyses on standard benchmarks, MNIST and CelebA, demonstrate
that Mustangs trains statistically significantly faster and yields more
accurate networks.