Image Denoising with Graph-Convolutional Neural Networks
Recovering an image from a noisy observation is a key problem in signal
processing. Recently, it has been shown that data-driven approaches employing
convolutional neural networks can outperform classical model-based techniques,
because they can capture more powerful and discriminative features. However,
since these methods are based on convolutional operations, they are only
capable of exploiting local similarities without taking into account non-local
self-similarities. In this paper we propose a convolutional neural network that
employs graph-convolutional layers in order to exploit both local and non-local
similarities. The graph-convolutional layers dynamically construct
neighborhoods in the feature space to detect latent correlations in the feature
maps produced by the hidden layers. The experimental results show that the
proposed architecture outperforms classical convolutional neural networks for
the denoising task. Comment: IEEE International Conference on Image Processing (ICIP) 201
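
As a rough illustration of the idea in the abstract above (neighborhoods built in feature space rather than pixel space), the following is a minimal sketch and not the authors' architecture. The function names `knn_graph` and `aggregate`, the Euclidean distance, and the unweighted averaging are all assumptions made for the example; a real graph-convolutional layer would apply learned weights.

```python
import math

def knn_graph(features, k):
    """For each feature vector, list the indices of its k nearest
    neighbours, measured by Euclidean distance in feature space."""
    neighbours = []
    for i, fi in enumerate(features):
        dists = [(math.dist(fi, fj), j)
                 for j, fj in enumerate(features) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def aggregate(features, neighbours):
    """Average each node's feature with those of its graph neighbours.
    (Hypothetical stand-in for a learned graph-convolution step.)"""
    out = []
    for i, nbrs in enumerate(neighbours):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        out.append([sum(v[d] for v in group) / len(group)
                    for d in range(dim)])
    return out

# Four "pixels" with 2-D features. Pixels 0 and 3 are not adjacent in
# the list, yet they are close in feature space, so the graph links them.
features = [[0.0, 0.0], [5.0, 5.0], [5.1, 4.9], [0.1, 0.0]]
neighbours = knn_graph(features, k=1)
smoothed = aggregate(features, neighbours)
```

Because the graph is rebuilt from the current features, applying it to the output of each hidden layer would yield a different neighborhood per layer, which is one way to read "dynamically construct" in the abstract.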
Semantic Object Parsing with Graph LSTM
By taking the semantic object parsing task as an exemplar application
scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network,
which is the generalization of LSTM from sequential data or multi-dimensional
data to general graph-structured data. In particular, instead of dividing an
image evenly into fixed pixels or patches, as in existing multi-dimensional
LSTM structures (e.g., Row, Grid and Diagonal LSTMs), we take each
arbitrary-shaped superpixel as a semantically consistent node, and adaptively
construct an undirected graph for each image, where the spatial relations of
the superpixels are naturally used as edges. Constructed on such an adaptive
graph topology, the Graph LSTM is more naturally aligned with the visual
patterns in the image (e.g., object boundaries or appearance similarities) and
provides a more economical information propagation route. Furthermore, for each
optimization step over Graph LSTM, we propose to use a confidence-driven scheme
to update the hidden and memory states of nodes progressively until all nodes
are updated. In addition, for each node, the forget gates are adaptively
learned to capture different degrees of semantic correlation with neighboring
nodes. Comprehensive evaluations on four diverse semantic object parsing
datasets demonstrate that our Graph LSTM significantly outperforms
other state-of-the-art solutions. Comment: 18 page
Variational Autoencoders for Deforming 3D Mesh Models
3D geometric contents are becoming increasingly popular. In this paper, we
study the problem of analyzing deforming 3D meshes using deep neural networks.
Deforming 3D meshes can flexibly represent 3D animation sequences as well as
collections of objects of the same category, allowing diverse shapes with
large-scale non-linear deformations. We propose a novel framework, which we call
mesh variational autoencoders (mesh VAE), to explore the probabilistic latent
space of 3D surfaces. The framework is easy to train, and requires very few
training examples. We also propose an extended model which allows flexibly
adjusting the significance of different latent variables by altering the prior
distribution. Extensive experiments demonstrate that our general framework is
able to learn a reasonable representation for a collection of deformable
shapes, and produce competitive results for a variety of applications,
including shape generation, shape interpolation, shape space embedding and
shape exploration, outperforming state-of-the-art methods. Comment: CVPR 201
Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model
A fundamental challenge in machine learning today is to build a model that
can learn from few examples. Here, we describe a reservoir-based spiking neural
model for learning to recognize actions with a limited number of labeled
videos. First, we propose a novel encoding, inspired by how microsaccades
influence visual perception, to extract spike information from raw video data
while preserving the temporal correlation across different frames. Using this
encoding, we show that the reservoir generalizes its rich dynamical activity
toward signature actions/movements, enabling it to learn from few training
examples. We evaluate our approach on the UCF-101 dataset. Our experiments
demonstrate that our proposed reservoir achieves 81.3%/87% Top-1/Top-5
accuracy, respectively, on the 101-class data while requiring just 8 video
examples per class for training. Our results establish a new benchmark for
action recognition from limited video examples for spiking neural models while
yielding competitive accuracy with respect to state-of-the-art non-spiking
neural models. Comment: 13 figures (includes supplementary information
Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
Generative models for 3D geometric data arise in many important applications
in 3D computer vision and graphics. In this paper, we focus on 3D deformable
shapes that share a common topological structure, such as human faces and
bodies. Morphable Models and their variants, despite their linear formulation,
have been widely used for shape representation, while most of the recently
proposed nonlinear approaches resort to intermediate representations, such as
3D voxel grids or 2D views. In this work, we introduce a novel graph
convolutional operator, acting directly on the 3D mesh, that explicitly models
the inductive bias of the fixed underlying graph. This is achieved by enforcing
consistent local orderings of the vertices of the graph, through the spiral
operator, thus breaking the permutation invariance property that is adopted by
all the prior work on Graph Neural Networks. Our operator comes by construction
with desirable properties (anisotropic, topology-aware, lightweight,
easy-to-optimise), and by using it as a building block for traditional deep
generative architectures, we demonstrate state-of-the-art results on a variety
of 3D shape datasets compared to the linear Morphable Model and other graph
convolutional operators. Comment: to appear at ICCV 201
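
The abstract above hinges on giving each vertex a consistently ordered neighborhood so that weights can be tied to positions in that order. The sketch below is a simplified stand-in, not the paper's spiral operator: it walks outward ring by ring and takes the order within a ring from the stored neighbor lists. The names `ordered_neighbourhood` and `ordered_conv`, the `-1` padding value, and the scalar per-vertex features are assumptions for illustration only.

```python
def ordered_neighbourhood(adjacency, start, length):
    """Return a fixed-length vertex sequence: the start vertex, then its
    neighbours ring by ring. Order inside a ring follows the stored
    neighbour order. Short sequences are padded with -1."""
    sequence = [start]
    seen = {start}
    ring = [start]
    while ring and len(sequence) < length:
        next_ring = []
        for v in ring:
            for u in adjacency[v]:
                if u not in seen:
                    seen.add(u)
                    next_ring.append(u)
        sequence.extend(next_ring)
        ring = next_ring
    sequence = sequence[:length]
    return sequence + [-1] * (length - len(sequence))

def ordered_conv(features, sequence, weights):
    """Weighted sum in which weight i is tied to position i of the
    sequence, so the result depends on the vertex ordering."""
    return sum(w * features[v]
               for w, v in zip(weights, sequence) if v >= 0)

# Toy graph with one scalar feature per vertex (hypothetical data).
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
features = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0, 10.0, 100.0, 1000.0]

seq = ordered_neighbourhood(adjacency, start=0, length=4)
value = ordered_conv(features, seq, weights)

# Reversing vertex 0's neighbour order changes the output, which shows
# that this operation is not permutation invariant.
permuted = dict(adjacency)
permuted[0] = [3, 2, 1]
seq_permuted = ordered_neighbourhood(permuted, start=0, length=4)
value_permuted = ordered_conv(features, seq_permuted, weights)
```

On a mesh with fixed topology the sequences can be computed once and reused for every shape, which is consistent with the "fixed underlying graph" assumption stated in the abstract.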