36,720 research outputs found
Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
Generative models for 3D geometric data arise in many important applications
in 3D computer vision and graphics. In this paper, we focus on 3D deformable
shapes that share a common topological structure, such as human faces and
bodies. Morphable Models and their variants, despite their linear formulation,
have been widely used for shape representation, while most of the recently
proposed nonlinear approaches resort to intermediate representations, such as
3D voxel grids or 2D views. In this work, we introduce a novel graph
convolutional operator, acting directly on the 3D mesh, that explicitly models
the inductive bias of the fixed underlying graph. This is achieved by enforcing
consistent local orderings of the vertices of the graph, through the spiral
operator, thus breaking the permutation invariance property that is adopted by
all the prior work on Graph Neural Networks. Our operator comes by construction
with desirable properties (anisotropic, topology-aware, lightweight,
easy-to-optimise), and by using it as a building block for traditional deep
generative architectures, we demonstrate state-of-the-art results on a variety
of 3D shape datasets compared to the linear Morphable Model and other graph
convolutional operators.Comment: to appear at ICCV 201
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) is a novel class of deep generative
models which has recently gained significant attention. GANs learns complex and
high-dimensional distributions implicitly over images, audio, and data.
However, there exists major challenges in training of GANs, i.e., mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field.Comment: 42 pages, Figure 13, Table
Learning Face Age Progression: A Pyramid Architecture of GANs
The two underlying requirements of face age progression, i.e. aging accuracy
and identity permanence, are not well studied in the literature. In this paper,
we present a novel generative adversarial network based approach. It separately
models the constraints for the intrinsic subject-specific characteristics and
the age-specific facial changes with respect to the elapsed time, ensuring that
the generated faces present desired aging effects while simultaneously keeping
personalized properties stable. Further, to generate more lifelike facial
details, high-level age-specific features conveyed by the synthesized face are
estimated by a pyramidal adversarial discriminator at multiple scales, which
simulates the aging effects in a finer manner. The proposed method is
applicable to diverse face samples in the presence of variations in pose,
expression, makeup, etc., and remarkably vivid aging effects are achieved. Both
visual fidelity and quantitative evaluations show that the approach advances
the state-of-the-art.Comment: CVPR 2018. V4 and V2 are the same, i.e. the conference version; V3 is
a related but different work, which is mistakenly submitted and will be
submitted as a new arXiv pape
CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction
Pedestrian path prediction is an essential topic in computer vision and video
understanding. Having insight into the movement of pedestrians is crucial for
ensuring safe operation in a variety of applications including autonomous
vehicles, social robots, and environmental monitoring. Current works in this
area utilize complex generative or recurrent methods to capture many possible
futures. However, despite the inherent real-time nature of predicting future
paths, little work has been done to explore accurate and computationally
efficient approaches for this task. To this end, we propose a convolutional
approach for real-time pedestrian path prediction, CARPe. It utilizes a
variation of Graph Isomorphism Networks in combination with an agile
convolutional neural network design to form a fast and accurate path prediction
approach. Notable results in both inference speed and prediction accuracy are
achieved, improving FPS considerably in comparison to current state-of-the-art
methods while delivering competitive accuracy on well-known path prediction
datasets.Comment: AAAI-21 Camera Read
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods, which have tackled this
problem in a deterministic or non-parametric way, we propose a novel approach
that models future frames in a probabilistic manner. Our probabilistic model
makes it possible for us to sample and synthesize many possible future frames
from a single input image. Future frame synthesis is challenging, as it
involves low- and high-level image and motion understanding. We propose a novel
network structure, namely a Cross Convolutional Network to aid in synthesizing
future frames; this network structure encodes image and motion information as
feature maps and convolutional kernels, respectively. In experiments, our model
performs well on synthetic data, such as 2D shapes and animated game sprites,
as well as on real-wold videos. We also show that our model can be applied to
tasks such as visual analogy-making, and present an analysis of the learned
network representations.Comment: The first two authors contributed equally to this wor
- …