24,787 research outputs found
3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds
We present 3D Pick & Mix, a new 3D shape retrieval system that gives users
a new level of freedom to explore 3D shape and Internet image collections by
introducing the ability to reason about objects at the level of their
constituent parts. While classic retrieval systems can only formulate simple
searches such as "find the 3D model that is most similar to the input image",
our new approach can formulate advanced, semantically meaningful search
queries such as: "find me the 3D model that best combines the design of the
legs of the chair in image 1 but with no armrests, like the chair in image 2".
Many applications could benefit from such rich queries: users could browse
through catalogues of furniture and pick and mix parts, combining, for example,
the legs of a chair from one shop with the armrests from another.
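To make such a part-level query concrete, here is a minimal sketch of how a "pick and mix" query could be scored against a catalogue of part-segmented models. The per-part embeddings, the cosine scoring, and the penalty for a forbidden part are illustrative assumptions, not the paper's actual retrieval pipeline.

```python
# Minimal sketch, assuming per-part embeddings were already computed by some
# part-aware encoder; `part_mix_score` and its scoring rules are hypothetical.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def part_mix_score(model_parts, query):
    """Score one 3D model against a part-level query.

    model_parts: dict part_name -> embedding for a catalogue model
    query: dict part_name -> target embedding, or None to require absence
    """
    score = 0.0
    for part, target in query.items():
        if target is None:                    # e.g. "no armrests"
            score += 1.0 if part not in model_parts else -1.0
        elif part in model_parts:
            score += cosine(model_parts[part], target)
        else:
            score -= 1.0                      # requested part missing entirely
    return score

# "legs like the chair in image 1, but no armrests":
legs_from_image1 = np.random.rand(16)         # stand-in for an image embedding
model = {"legs": np.random.rand(16), "seat": np.random.rand(16)}
print(part_mix_score(model, {"legs": legs_from_image1, "armrests": None}))
```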
Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation
Due to large variations in shape, appearance, and viewing conditions, object
recognition is a key precursory challenge in the fields of object manipulation
and robotic/AI visual reasoning in general. Recognizing object categories,
particular instances of objects, and viewpoints/poses of objects are three
critical subproblems robots must solve in order to accurately grasp/manipulate
objects and reason about their environments. Multi-view images of the same
object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g.
visual/depth descriptor spaces). These object manifolds share the same topology
despite being geometrically different, so each object manifold can be
represented as a deformed version of a unified manifold and parameterized by
its homeomorphic mapping/reconstruction from that unified manifold. In this
work, we develop a novel framework that jointly solves the three challenging
recognition sub-problems by explicitly modeling the deformations of object
manifolds and factorizing them in a view-invariant space for recognition. We
perform extensive experiments on several challenging datasets and achieve
state-of-the-art results.
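As a toy illustration of the central idea, the sketch below treats the unified manifold as the unit circle of camera azimuths and parameterizes one object's view manifold as a regression from that circle into descriptor space; the resulting coefficient matrix serves as a view-invariant object signature. The RBF regression is an illustrative stand-in for the paper's learned homeomorphic mapping, not its actual model.

```python
# Minimal sketch: each object's view manifold as a deformed copy of a shared
# unified manifold (here, the unit circle of azimuth angles).
import numpy as np

def rbf_features(theta, centers, width=0.5):
    """Embed view angles on the unified manifold via wrapped RBF kernels."""
    d = np.angle(np.exp(1j * (theta[:, None] - centers[None, :])))
    return np.exp(-(d ** 2) / (2 * width ** 2))

def manifold_mapping(thetas, descriptors, centers):
    """Least-squares map from the unified manifold to one object's descriptors.

    The coefficient matrix C parameterizes how the unified manifold deforms
    into this object's view manifold: pose is carried by theta, while C itself
    no longer depends on viewpoint.
    """
    Phi = rbf_features(thetas, centers)                  # (n_views, n_centers)
    C, *_ = np.linalg.lstsq(Phi, descriptors, rcond=None)
    return C                                             # (n_centers, descr_dim)

centers = np.linspace(0, 2 * np.pi, 8, endpoint=False)
thetas = np.linspace(0, 2 * np.pi, 36, endpoint=False)    # camera azimuths
descriptors = np.stack([np.cos(thetas), np.sin(2 * thetas)], axis=1)  # toy data
C = manifold_mapping(thetas, descriptors, centers)
print(C.shape)   # one view-invariant signature per object
```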
Space-Time Representation of People Based on 3D Skeletal Data: A Review
Spatiotemporal human representation based on 3D visual perception data is a
rapidly growing research area. These representations can be broadly categorized
into two groups according to their information sources: those based on RGB-D
information and those based on 3D skeleton data. Recently, skeleton-based human
representations have been intensively studied and continue to attract
increasing attention, owing to their robustness to variations in viewpoint,
human body scale, and motion speed, as well as their real-time, online
performance. This paper presents a comprehensive survey of existing space-time
representations of people based on 3D skeletal data, and provides an
informative categorization and analysis of these methods from four
perspectives: information modality, representation encoding, structure and
transition, and feature engineering. We also provide a brief overview of
skeleton acquisition devices and construction methods, list a number of public
benchmark datasets with skeleton data, and discuss potential future research
directions. Comment: Our paper has been accepted by the journal Computer Vision
and Image Understanding, see
http://www.sciencedirect.com/science/article/pii/S1077314217300279, Computer
Vision and Image Understanding, 2017
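As a generic example of the family of representations this survey categorizes (not a specific method from the paper), the sketch below builds a simple space-time feature from 3D skeletal data: per-frame pairwise joint distances stacked over time, which are invariant to the camera viewpoint.

```python
# Minimal sketch of one common skeleton-based space-time representation.
import numpy as np

def pairwise_distance_feature(skeleton_seq):
    """skeleton_seq: (T, J, 3) array of 3D joint positions over T frames.

    Returns a (T, J*(J-1)//2) space-time feature that is invariant to camera
    viewpoint, since it only uses inter-joint distances.
    """
    T, J, _ = skeleton_seq.shape
    iu = np.triu_indices(J, k=1)
    diffs = skeleton_seq[:, :, None, :] - skeleton_seq[:, None, :, :]  # (T,J,J,3)
    dists = np.linalg.norm(diffs, axis=-1)                             # (T,J,J)
    return dists[:, iu[0], iu[1]]

seq = np.random.rand(30, 15, 3)        # 30 frames, 15 joints
feat = pairwise_distance_feature(seq)
print(feat.shape)                      # (30, 105)
```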
Unsupervised Feature Learning of Human Actions as Trajectories in Pose Embedding Manifold
An unsupervised human action modeling framework can provide useful
pose-sequence representations, which can be utilized in a variety of pose
analysis applications. In this work we propose a novel temporal pose-sequence
modeling framework, which can embed the dynamics of 3D human-skeleton joints
into a continuous latent space in an efficient manner. In contrast to the
end-to-end frameworks explored by previous works, we disentangle the task of
individual pose representation learning from the task of learning actions as
trajectories in the pose embedding space. In order to realize a continuous pose
embedding manifold with improved reconstructions, we propose an unsupervised
manifold learning procedure named Encoder GAN (EnGAN). Further, we use the pose
embeddings generated by EnGAN to model human actions using a bidirectional RNN
auto-encoder architecture, PoseRNN. We introduce a first-order gradient loss to
explicitly enforce temporal regularity in the predicted motion sequence. A
hierarchical feature fusion technique is also investigated for simultaneous
modeling of local skeleton joints along with global pose variations. We
demonstrate state-of-the-art transferability of the learned representation
against other supervised and unsupervised motion embeddings for the task of
fine-grained action recognition on the SBU interaction dataset. Further, we
show the qualitative strengths of the proposed framework by visualizing
skeleton pose reconstructions and interpolations in the pose-embedding space,
and low-dimensional principal component projections of the reconstructed pose
trajectories. Comment: Accepted at WACV 2019
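The first-order gradient loss can be stated compactly: it penalizes the mismatch between frame-to-frame differences of the predicted and ground-truth sequences rather than raw positions. The numpy sketch below is a minimal illustration; the EnGAN and PoseRNN architectures and the exact loss weighting are not reproduced.

```python
# Minimal sketch of a first-order gradient loss for temporal regularity.
import numpy as np

def first_order_gradient_loss(pred, target):
    """pred, target: (T, D) pose-embedding sequences.

    Compares temporal first differences (frame-to-frame velocities), which
    enforces temporal regularity explicitly.
    """
    d_pred = np.diff(pred, axis=0)      # (T-1, D) predicted velocities
    d_true = np.diff(target, axis=0)    # (T-1, D) ground-truth velocities
    return float(np.mean((d_pred - d_true) ** 2))

pred, target = np.random.rand(20, 32), np.random.rand(20, 32)
# Typically combined with a reconstruction term, e.g.:
# loss = mse(pred, target) + lambda_grad * first_order_gradient_loss(...)
print(first_order_gradient_loss(pred, target))
```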
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
An important problem for both graphics and vision is to synthesize novel
views of a 3D object from a single image. This is particularly challenging due
to the partial observability inherent in projecting a 3D object onto the image
space, and the ill-posedness of inferring object shape and pose. However, we
can train a neural network to address the problem if we restrict our attention
to specific object categories (in our case faces and chairs) for which we can
gather ample training data. In this paper, we propose a novel recurrent
convolutional encoder-decoder network that is trained end-to-end on the task of
rendering rotated objects starting from a single image. The recurrent structure
allows our model to capture long-term dependencies along a sequence of
transformations. We demonstrate the quality of its predictions for human faces
on the Multi-PIE dataset and for a dataset of 3D chair models, and also show
its ability to disentangle latent factors of variation (e.g., identity and
pose) without using full supervision. Comment: Published in the NIPS 2015 conference
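A minimal sketch of the recurrent scheme, assuming a toy 2-D pose code and stand-in encoder/decoder functions: the pose code is transformed once per recurrence step and a frame is decoded after each transformation. In the actual model the encoder, decoder, and transformation are all learned deep convolutional networks.

```python
# Minimal sketch of recurrent view synthesis via repeated pose transformation.
import numpy as np

def rotate_code(pose_code, angle=np.pi / 9):
    """Toy 'rotation action' on a 2-D pose code; learned in the real model."""
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    return R @ pose_code

def synthesize_rotation_sequence(encode, decode, image, n_steps):
    """Recurrently rotate the pose code and decode one frame per step."""
    identity_code, pose_code = encode(image)   # disentangled codes
    frames = []
    for _ in range(n_steps):
        pose_code = rotate_code(pose_code)     # recurrent transformation
        frames.append(decode(identity_code, pose_code))
    return frames

# Stand-in encoder/decoder so the sketch runs end to end:
encode = lambda img: (img.mean(axis=0), np.array([1.0, 0.0]))
decode = lambda ident, pose: np.outer(ident, pose)
frames = synthesize_rotation_sequence(encode, decode, np.random.rand(4, 4), 6)
print(len(frames), frames[0].shape)   # 6 decoded frames
```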
Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering
Low-Rank Representation (LRR) is arguably one of the most powerful paradigms
for multi-view spectral clustering: it elegantly encodes the multi-view local
graph/manifold structures into an intrinsic low-rank self-expressive data
similarity embedded in a high-dimensional space, yielding a better graph
partition than single-view counterparts. In this paper we revisit LRR from a
fundamentally different perspective, identifying it as essentially a latent
clustered orthogonal projection based representation coupled with an optimized
local graph structure for spectral clustering; each column of the
representation is fundamentally a cluster basis orthogonal to the others,
indicating its members, which intuitively projects the view-specific feature
representation onto the space spanned by all orthogonal bases to characterize
the cluster structures. Building on this finding, our technique proceeds as
follows: (1) we decompose LRR into a latent clustered orthogonal representation
via low-rank matrix factorization, to encode more flexible cluster structures
than LRR over primal data objects; (2) we convert the problem of LRR into that
of simultaneously learning the orthogonal clustered representation and an
optimized local graph structure for each view; (3) the learned orthogonal
clustered representations and local graph structures have the same magnitude
across views, so that an ideal multi-view consensus can be readily achieved.
Experiments on multi-view datasets validate the method's
superiority. Comment: Accepted to appear in Neural Networks, Elsevier, on 9 March 2018
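Step (1) can be illustrated with a plain low-rank factorization: the sketch below extracts an orthogonal cluster basis from a self-expressive similarity matrix via truncated SVD. The alternating optimization of the graph structure, and everything multi-view, is omitted; this only shows the orthogonal-basis idea.

```python
# Minimal sketch: orthogonal cluster basis from a low-rank factorization.
import numpy as np

def orthogonal_cluster_basis(Z, n_clusters):
    """Z: (n, n) self-expressive similarity; returns orthogonal basis U_k."""
    # Truncated SVD gives the best rank-k factorization; U_k has orthonormal
    # columns, each spanning one latent cluster direction.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    U_k = U[:, :n_clusters]
    coeff = s[:n_clusters, None] * Vt[:n_clusters]   # Z ~ U_k @ coeff
    assert np.allclose(U_k.T @ U_k, np.eye(n_clusters), atol=1e-8)
    return U_k, coeff

Z = np.random.rand(50, 50)
Z = Z @ Z.T                               # toy symmetric similarity
U, coeff = orthogonal_cluster_basis(Z, n_clusters=3)
print(U.shape)                            # (50, 3), columns mutually orthogonal
```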
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Photorealistic frontal view synthesis from a single face image has a wide
range of applications in the field of face recognition. Although data-driven
deep learning methods have been proposed to address this problem by seeking
solutions from ample face data, it remains challenging because it is
intrinsically ill-posed. This paper proposes a Two-Pathway Generative
Adversarial Network (TP-GAN) for photorealistic frontal view synthesis that
simultaneously perceives global structures and local details. Four
landmark-located patch networks are proposed to attend to local textures in
addition to the commonly used global encoder-decoder network. Beyond the novel
architecture, we make this ill-posed problem well constrained by introducing a
combination of adversarial loss, symmetry loss, and identity preserving loss.
The combined loss function leverages both the frontal face distribution and
pre-trained discriminative deep face models to guide an identity preserving
inference of frontal views from profiles. Unlike previous deep learning
methods that mainly rely on intermediate features for recognition, our method
directly leverages the synthesized identity preserving image for downstream
tasks such as face recognition and attribute estimation. Experimental results
demonstrate that our method not only presents compelling perceptual results but
also outperforms state-of-the-art methods on large-pose face recognition. Comment: Accepted at ICCV 2017; main paper & supplementary material, 11 pages
Multi-View Kernels for Low-Dimensional Modeling of Seismic Events
The problem of learning from seismic recordings has been studied for years.
There is a growing interest in developing automatic mechanisms for identifying
the properties of a seismic event; one main motivation is the ability to
reliably identify man-made explosions. The availability of multiple
high-dimensional observations has increased the use of machine learning
techniques in a variety of fields. In this work, we propose a kernel-fusion
based dimensionality reduction framework for generating meaningful seismic
representations from raw data. The proposed method is tested on 2023 events
that were recorded in Israel and in Jordan, and achieves promising results in
classifying the event type as well as in estimating the location of the event.
The proposed fusion and dimensionality reduction tools may be applied to other
types of geophysical data.
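A minimal sketch of kernel-fusion dimensionality reduction in the diffusion-maps style: build one Gaussian affinity kernel per view, fuse the kernels, and embed the events with the leading non-trivial eigenvectors of the normalized kernel. The product fusion rule and kernel scales are illustrative choices, not necessarily those of the paper.

```python
# Minimal sketch of multi-view kernel fusion for dimensionality reduction.
import numpy as np

def gaussian_kernel(X, eps):
    """Gaussian affinity kernel over one view's feature vectors."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / eps)

def fused_embedding(views, eps_list, n_components=2):
    """views: list of (n, d_v) arrays, one per observation modality."""
    K = np.ones((views[0].shape[0],) * 2)
    for X, eps in zip(views, eps_list):
        K *= gaussian_kernel(X, eps)           # product fusion of view kernels
    P = K / K.sum(axis=1, keepdims=True)       # row-normalize to a Markov kernel
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    return vecs[:, order[1:n_components + 1]].real  # skip trivial eigenvector

v1, v2 = np.random.rand(100, 5), np.random.rand(100, 8)
emb = fused_embedding([v1, v2], eps_list=[1.0, 1.0])
print(emb.shape)                               # (100, 2) low-dimensional coords
```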
Transfer of View-manifold Learning to Similarity Perception of Novel Objects
We develop a model of perceptual similarity judgment based on re-training a
deep convolutional neural network (DCNN) that learns to associate different
views of each 3D object, capturing the notion of object persistence and
continuity in our visual experience. The re-training process effectively
performs distance metric learning under object persistence constraints,
modifying the view-manifold of object representations. It reduces the
effective distance between the representations of different views of the same
object without compromising the distance between views of different objects,
resulting in the untangling of the view-manifolds between individual objects
within the same category and across categories. This untangling enables the
model to discriminate and recognize objects within the same category,
independent of viewpoint. We found that this ability is not limited to the
trained objects, but transfers to novel objects in both trained and untrained
categories, as well as to a variety of completely novel artificial synthetic
objects. This transfer in learning suggests that the modification of distance
metrics in view-manifolds is more general and abstract, likely operating at
the level of parts, and independent of the specific objects or categories
experienced during training. Interestingly, the resulting transformation of
feature representations in the deep network is found to match human perceptual
similarity judgment significantly better than AlexNet, suggesting that object
persistence could be an important constraint in the development of perceptual
similarity judgment in biological neural networks. Comment: Accepted to ICLR 2017
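Distance metric learning under object persistence constraints is naturally expressed as a triplet objective: views of the same object are pulled together, views of different objects pushed apart. The sketch below is a generic triplet loss, a plausible stand-in rather than the paper's exact re-training objective; the margin value is illustrative.

```python
# Minimal sketch of metric learning under object persistence constraints.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """anchor/positive: features of two views of one object;
    negative: a feature of a different object.

    Minimizing this reshapes the view-manifold so that same-object views
    cluster while different objects stay separated (object persistence).
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))

f_view1, f_view2 = np.random.rand(128), np.random.rand(128)  # same object
f_other = np.random.rand(128)                                # different object
print(triplet_loss(f_view1, f_view2, f_other))
```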
Multiple Manifolds Metric Learning with Application to Image Set Classification
In image set classification, considerable advances have been made by modeling
the original image sets with second-order statistics or linear subspaces,
which typically lie on Riemannian manifolds: the Symmetric Positive Definite
(SPD) manifold and the Grassmann manifold, respectively. Several algorithms
have been developed on these manifolds for classification tasks. Motivated by
the inability of existing methods to extract discriminative features for data
on Riemannian manifolds, we propose a novel algorithm which combines multiple
manifolds as the features of the original image sets. In order to fuse these
manifolds, the well-studied Riemannian kernels are utilized to map the
original Riemannian spaces into high-dimensional Hilbert spaces. A metric
learning method is then devised to embed these kernel spaces into a
lower-dimensional common subspace for classification. The state-of-the-art
results achieved on three datasets corresponding to two different
classification tasks, namely face recognition and object categorization,
demonstrate the effectiveness of the proposed method. Comment: 6 pages, 4 figures, ICPR 2018 (accepted)
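As one concrete instance of the Riemannian kernels involved, the sketch below represents each image set by a regularized covariance (an SPD matrix) and compares two sets with a log-Euclidean Gaussian kernel; the subsequent metric learning stage is omitted, and the kernel parameters are illustrative.

```python
# Minimal sketch: SPD representation of an image set + log-Euclidean kernel.
import numpy as np

def spd_representation(image_set, reg=1e-4):
    """image_set: (n, d) feature vectors; returns a d x d SPD covariance."""
    C = np.cov(image_set, rowvar=False)
    return C + reg * np.eye(C.shape[0])       # regularize to stay strictly SPD

def log_euclidean_kernel(A, B, gamma=0.1):
    """Gaussian kernel on the SPD manifold via the matrix logarithm map."""
    def logm(S):
        vals, vecs = np.linalg.eigh(S)
        return (vecs * np.log(vals)) @ vecs.T  # V diag(log lambda) V^T
    d2 = np.sum((logm(A) - logm(B)) ** 2)      # squared log-Euclidean distance
    return float(np.exp(-gamma * d2))

set1, set2 = np.random.rand(40, 10), np.random.rand(40, 10)
K12 = log_euclidean_kernel(spd_representation(set1), spd_representation(set2))
print(K12)                                     # one entry of the kernel matrix
```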