Deep Multimodal Subspace Clustering Networks
We present convolutional neural network (CNN) based approaches for
unsupervised multimodal subspace clustering. The proposed framework consists of
three main stages: a multimodal encoder, a self-expressive layer, and a multimodal
decoder. The encoder takes multimodal data as input and fuses them into a latent
space representation. The self-expressive layer is responsible for enforcing
the self-expressiveness property and acquiring an affinity matrix corresponding
to the data points. The decoder reconstructs the original input data. The
network uses the distance between the decoder's reconstruction and the original
input in its training. We investigate early, late and intermediate fusion
techniques and propose three different encoders corresponding to them for
spatial fusion. The self-expressive layers and multimodal decoders are
essentially the same for different spatial fusion-based approaches. In addition
to various spatial fusion-based methods, an affinity fusion-based network is
also proposed in which the self-expressive layers corresponding to different
modalities are enforced to be the same. Extensive experiments on three datasets
show that the proposed methods significantly outperform the state-of-the-art
multimodal subspace clustering methods.
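As a rough illustration of the self-expressive layer described above, the following
PyTorch-style sketch shows how such a layer and its loss are commonly realised in
deep subspace clustering; the layer size, initialisation, and loss weight are
illustrative assumptions, and the multimodal encoder/decoder stages are omitted.

```python
# Minimal sketch of a self-expressive layer and its loss (PyTorch assumed);
# sizes, initialisation, and the loss weight are illustrative, and the
# multimodal encoder/decoder stages of the paper are omitted.
import torch
import torch.nn as nn

class SelfExpressiveLayer(nn.Module):
    """Learns coefficients C such that the latent codes satisfy Z ~= C @ Z."""
    def __init__(self, num_samples):
        super().__init__()
        self.C = nn.Parameter(1e-4 * torch.randn(num_samples, num_samples))

    def forward(self, z):
        # z: (num_samples, latent_dim) latent codes from the fused encoder.
        return self.C @ z

def self_expressive_loss(z, z_hat, C, lam=1.0):
    # Self-expressiveness error plus a simple penalty on the coefficients.
    return ((z - z_hat) ** 2).sum() + lam * (C ** 2).sum()

# The affinity matrix for spectral clustering is then obtained by
# symmetrising the learned coefficients, e.g. 0.5 * (C.abs() + C.abs().T).
```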
Multi-view Low-rank Sparse Subspace Clustering
Most existing approaches address the multi-view subspace clustering problem by
constructing an affinity matrix on each view separately and then proposing
how to extend the spectral clustering algorithm to handle multi-view data. This
paper presents an approach to multi-view subspace clustering that learns a
joint subspace representation by constructing affinity matrix shared among all
views. Relying on the importance of both low-rank and sparsity constraints in
the construction of the affinity matrix, we introduce an objective that
balances agreement across different views while at the same time
encouraging sparsity and low-rankness of the solution. The related low-rank and
sparsity constrained optimization problem is solved for each view using the
alternating direction method of multipliers. Furthermore, we extend our
approach to cluster data drawn from nonlinear subspaces by solving the
corresponding problem in a reproducing kernel Hilbert space. The proposed
algorithm outperforms state-of-the-art multi-view subspace clustering
algorithms on one synthetic and four real-world datasets.
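A schematic version of such an objective, written in generic notation (the weights
β1, β2, λ_v, the agreement term, and the constraint are illustrative rather than the
paper's exact formulation), is:

```latex
\min_{\{C^{(v)}\},\, C}\;
  \sum_{v=1}^{V} \Big(
      \tfrac{1}{2}\,\big\| X^{(v)} - X^{(v)} C^{(v)} \big\|_F^2
    + \beta_1 \big\| C^{(v)} \big\|_*
    + \beta_2 \big\| C^{(v)} \big\|_1
    + \lambda_v \big\| C^{(v)} - C \big\|_F^2
  \Big)
\quad \text{s.t. } \operatorname{diag}\!\big(C^{(v)}\big) = 0,
```

where the nuclear and ℓ1 terms encourage low-rankness and sparsity of each view's
representation, and the last term ties all views to the shared affinity-generating
matrix C. Each view's subproblem can then be handled with ADMM as described in the
abstract.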
Guided Co-training for Large-Scale Multi-View Spectral Clustering
In many real-world applications, we have access to multiple views of the
data, each of which characterizes the data from a distinct aspect. Several
previous algorithms have demonstrated that one can achieve better clustering
accuracy by appropriately integrating information from all views than by using
only an individual view. Owing to the effectiveness of spectral clustering,
many multi-view clustering methods are based on it. Unfortunately, they have
limited applicability to large-scale data due to the high computational
complexity of spectral clustering. In this work, we propose a novel multi-view
spectral clustering method for large-scale data. Our approach is structured
under the guided co-training scheme to fuse distinct views, and uses the
sampling technique to accelerate spectral clustering. More specifically, we
first select a small set of landmark points and then approximate the
eigen-decomposition accordingly. The augmented view, which is essential to
the guided co-training process, can then be quickly determined by our method.
The proposed algorithm scales linearly with the number of data points. Extensive
experiments have been performed, and the results support the advantage of our
method for handling the large-scale multi-view setting.
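The following NumPy/scikit-learn sketch illustrates the general landmark-sampling
idea used to accelerate the spectral step for a single view; the k-means landmark
selection, Gaussian kernel, and kernel width are assumptions, and the guided
co-training across views is not shown.

```python
# Landmark-based spectral embedding for one view (sketch); landmark selection
# via k-means and the Gaussian kernel width are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def landmark_spectral_embedding(X, n_landmarks=100, n_clusters=5, sigma=1.0):
    # 1. Select a small set of landmark points.
    landmarks = KMeans(n_clusters=n_landmarks, n_init=10).fit(X).cluster_centers_
    # 2. Affinities between all n points and the p landmarks (n x p, p << n).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)              # row-normalise
    # 3. Approximate the graph eigenvectors from the thin matrix Z,
    #    so the cost stays roughly linear in the number of data points n.
    D = np.diag(1.0 / np.sqrt(Z.sum(axis=0)))
    U, _, _ = np.linalg.svd(Z @ D, full_matrices=False)
    return U[:, :n_clusters]                       # spectral embedding of the view

# Usage: embed each view this way, fuse the views (e.g. via co-training),
# then run k-means on the resulting embedding.
```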
Multi-View Surveillance Video Summarization via Joint Embedding and Sparse Optimization
Most traditional video summarization methods are designed to generate
effective summaries for single-view videos, and thus they cannot fully exploit
the complicated intra- and inter-view correlations in summarizing multi-view
videos in a camera network. In this paper, with the aim of summarizing
multi-view videos, we introduce a novel unsupervised framework via joint
embedding and sparse representative selection. The objective function is
two-fold. The first is to capture the multi-view correlations via an embedding,
which helps in extracting a diverse set of representatives. The second is to
use an ℓ2,1-norm to model the sparsity while selecting representative shots for
the summary. We propose to jointly optimize both of the objectives, such that
embedding can not only characterize the correlations, but also indicate the
requirements of sparse representative selection. We present an efficient
alternating algorithm based on half-quadratic minimization to solve the
proposed non-smooth and non-convex objective with convergence analysis. A key
advantage of the proposed approach with respect to the state-of-the-art is that
it can summarize multi-view videos without assuming any prior
correspondences/alignment between them, e.g., uncalibrated camera networks.
Rigorous experiments on several multi-view datasets demonstrate that our
approach clearly outperforms the state-of-the-art methods. Comment: IEEE Trans. on Multimedia, 2017 (in press)
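To make the role of the ℓ2,1-norm concrete, a schematic form of the sparse
representative-selection part of such an objective is given below, with Y denoting
the jointly embedded shots and λ an illustrative weight; the exact coupling with
the embedding term in the paper may differ.

```latex
\min_{Z}\; \tfrac{1}{2}\, \| Y - Y Z \|_F^2 \;+\; \lambda\, \| Z \|_{2,1},
\qquad
\| Z \|_{2,1} \;=\; \sum_{i} \Big( \sum_{j} Z_{ij}^{2} \Big)^{1/2}.
```

The ℓ2,1 penalty drives entire rows of Z to zero, so the shots whose rows remain
non-zero act as the selected representatives for the summary.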
Effective Image Retrieval via Multilinear Multi-index Fusion
Multi-index fusion has demonstrated impressive performances in retrieval task
by integrating different visual representations in a unified framework.
However, previous works mainly consider propagating similarities via the neighbor
structure, ignoring the high-order information among different visual
representations. In this paper, we propose a new multi-index fusion scheme for
image retrieval. By formulating this procedure as a multilinear based
optimization problem, the complementary information hidden in different indexes
can be explored more thoroughly. Specifically, we first build our multiple indexes
from various visual representations. Then a so-called index-specific functional
matrix, which aims to propagate similarities, is introduced for updating the
original index. The functional matrices are then optimized in a unified tensor
space to achieve a refinement, such that the relevant images can be pushed
closer together. The optimization problem can be efficiently solved by the augmented
Lagrangian method with theoretical convergence guarantee. Unlike the
traditional multi-index fusion scheme, our approach embeds the multi-index
subspace structure into the new indexes with a sparse constraint, and thus it
incurs little additional memory consumption in the online query stage. Experimental
evaluation on three benchmark datasets reveals that the proposed approach
achieves state-of-the-art performance, i.e., an N-score of 3.94 on UKBench, mAP of
94.1% on Holiday, and 62.39% on Market-1501. Comment: 12 pages
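As a toy illustration of the general idea of stacking per-index similarity matrices
into a third-order tensor and refining them jointly, the sketch below uses a simple
truncated Tucker (HOSVD) projection; this is only a stand-in for, and not, the
paper's index-specific functional matrices and ALM-based solver.

```python
# Toy sketch: fuse several index-specific similarity matrices by stacking them
# into a 3rd-order tensor and jointly projecting onto low-rank subspaces.
# Truncated HOSVD is used here only as a stand-in for the paper's ALM solver.
import numpy as np

def fuse_similarity_tensor(sim_matrices, r1=50, r2=50):
    T = np.stack(sim_matrices, axis=2)                 # n x n x m tensor
    # Mode-1 and mode-2 subspaces from SVDs of the unfoldings.
    U1, _, _ = np.linalg.svd(T.reshape(T.shape[0], -1), full_matrices=False)
    U2, _, _ = np.linalg.svd(T.transpose(1, 0, 2).reshape(T.shape[1], -1),
                             full_matrices=False)
    U1, U2 = U1[:, :r1], U2[:, :r2]
    core = np.einsum('ia,jb,ijm->abm', U1, U2, T)      # project onto subspaces
    T_ref = np.einsum('ia,jb,abm->ijm', U1, U2, core)  # low-rank reconstruction
    return T_ref.mean(axis=2)                          # fused similarity matrix

# Usage: rank the gallery for each query by the rows of the fused matrix.
```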
Deep Sparse Subspace Clustering
In this paper, we present a deep extension of Sparse Subspace Clustering,
termed Deep Sparse Subspace Clustering (DSSC). Regularized by the unit sphere
distribution assumption for the learned deep features, DSSC can infer a new
data affinity matrix by simultaneously satisfying the sparsity principle of SSC
and the nonlinearity given by neural networks. One of the appealing advantages
brought by DSSC is that, when the original real-world data do not meet the
class-specific linear subspace distribution assumption, DSSC can employ neural
networks to make the assumption valid through its hierarchical nonlinear
transformations. To the best of our knowledge, this is among the first deep
learning based subspace clustering methods. Extensive experiments are conducted
on four real-world datasets to show the proposed DSSC is significantly superior
to 12 existing methods for subspace clustering. Comment: The initial version was completed at the beginning of 201
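A schematic way to write a DSSC-style objective, combining the SSC self-expression
and sparsity terms with a unit-sphere regulariser on the learned deep features
h_Θ(x), is shown below; the weights λ1, λ2 and the exact form of the regulariser
are illustrative, not taken from the paper.

```latex
\min_{\Theta,\, C}\;
  \tfrac{1}{2} \sum_{i} \Big\| h_\Theta(x_i) - \sum_{j \ne i} C_{ij}\, h_\Theta(x_j) \Big\|_2^2
  + \lambda_1 \| C \|_1
  + \lambda_2 \sum_{i} \big( \| h_\Theta(x_i) \|_2^2 - 1 \big)^2
\quad \text{s.t. } \operatorname{diag}(C) = 0.
```

The first two terms are the usual SSC self-expression and sparsity, while the last
term pushes the deep features toward the unit sphere so that the linear-subspace
assumption becomes more plausible in the learned feature space.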
A Proximity-Aware Hierarchical Clustering of Faces
In this paper, we propose an unsupervised face clustering algorithm called
"Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local
structure of deep representations. In the proposed method, a similarity measure
between deep features is computed by evaluating linear SVM margins. SVMs are
trained using nearest neighbors of sample data, and thus do not require any
external training data. Clusters are then formed by thresholding the similarity
scores. We evaluate the clustering performance using three challenging
unconstrained face datasets, including Celebrity in Frontal-Profile (CFP),
IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3)
datasets. Experimental results demonstrate that the proposed approach can
achieve significant improvements over state-of-the-art methods. Moreover, we
also show that the proposed clustering algorithm can be applied to curate a
large-scale and noisy training dataset while maintaining a sufficient number of
images and their variations due to nuisance factors. The face verification
performance on JANUS CS3 improves significantly after fine-tuning a DCNN model with
the curated MS-Celeb-1M dataset, which contains over three million face images.
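A minimal sketch of the SVM-margin similarity idea is shown below (NumPy and
scikit-learn assumed); the choice of positives as nearest neighbours, negatives as
the farthest samples, and the final symmetrisation are assumptions, since the
abstract does not spell these details out.

```python
# Sketch of a proximity-aware similarity: for each sample, fit a linear SVM
# using its nearest neighbours as positives and the farthest samples as
# negatives, then score all samples by the signed SVM margin. The split sizes
# and the symmetrisation are assumptions.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_svm_similarity(X, k_pos=5, k_neg=20):
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])
        pos, neg = order[:k_pos], order[-k_neg:]        # nearest vs. farthest
        y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
        clf = LinearSVC().fit(np.r_[X[pos], X[neg]], y)
        S[i] = clf.decision_function(X)                 # margin to sample i's side
    return 0.5 * (S + S.T)

# Clusters are then formed by thresholding S, as described in the abstract.
```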
Multi-feature Distance Metric Learning for Non-rigid 3D Shape Retrieval
In the past decades, feature-learning-based 3D shape retrieval approaches
have received widespread attention in the computer graphics community.
These approaches usually rely on hand-crafted distance metrics or
conventional distance metric learning methods to compute the similarity of a
single feature. A single feature captures only one facet of geometric
information and therefore cannot characterize 3D shapes well. Consequently,
multiple features should be used for the retrieval task to overcome the
limitation of a single feature and further improve performance. However, most
conventional distance metric learning methods fail to integrate the
complementary information from multiple features when constructing the distance
metric. To address this issue, a novel multi-feature distance metric learning
method for non-rigid 3D shape retrieval is presented in this study, which can
make full use of the complementary geometric information from multiple shape
features by utilizing KL-divergences. Minimizing the KL-divergence between the
metric of each feature and a common metric acts as a consistency constraint,
which leads to a consistent shared latent feature space for the multiple
features. We apply the proposed method to 3D model retrieval and test it on a
well-known benchmark database. The results show that our method substantially
outperforms state-of-the-art non-rigid 3D shape retrieval methods.
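One schematic way to write such a KL-divergence consistency constraint ties each
per-feature Mahalanobis metric M_v to a common metric M through the zero-mean
Gaussians they induce; here ℓ_v denotes a generic per-feature metric-learning loss
and γ an illustrative weight, so this is not the paper's exact objective.

```latex
\min_{\{M_v \succeq 0\},\, M \succeq 0}\;
  \sum_{v=1}^{V} \Big( \ell_v(M_v)
    + \gamma\, \mathrm{KL}\!\big( \mathcal{N}(0, M_v^{-1}) \,\big\|\, \mathcal{N}(0, M^{-1}) \big) \Big),
\qquad
\mathrm{KL} = \tfrac{1}{2}\Big( \operatorname{tr}\!\big(M M_v^{-1}\big)
  - \log\det\!\big(M M_v^{-1}\big) - d \Big).
```

Minimising the sum of these divergences pulls the per-feature metrics toward the
shared metric M, which is what yields the consistent latent space across features.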
Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition
Recognizing human actions from varied views is challenging due to huge
appearance variations in different views. The key to this problem is to learn
discriminative view-invariant representations that generalize well across views. In
this paper, we address this problem by learning view-invariant representations
hierarchically using a novel method, referred to as Joint Sparse Representation
and Distribution Adaptation (JSRDA). To obtain robust and informative feature
representations, we first incorporate a sample-affinity matrix into the
marginalized stacked denoising Autoencoder (mSDA) to obtain shared features,
which are then combined with the private features. In order to make the feature
representations of videos across views transferable, we then learn a
transferable dictionary pair simultaneously from pairs of videos taken at
different views to encourage each action video across views to have the same
sparse representation. However, the distribution difference across views still
exists because a unified subspace where the sparse representations of one
action across views are the same may not exist when the view difference is
large. Therefore, we propose a novel unsupervised distribution adaptation
method that learns a set of projections that map the source- and target-view
data into respective low-dimensional subspaces where the marginal and
conditional distribution differences are reduced simultaneously. As a result, the
finally learned feature representation is view-invariant and robust to
substantial distribution differences across views, even when the view difference
is large. Experimental results on four multi-view datasets show that our approach
outperforms the state-of-the-art approaches. Comment: Published in IEEE Transactions on Circuits and Systems for Video
Technology, codes can be found at https://yangliu9208.github.io/JSRDA
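The distribution-adaptation step can be illustrated with a toy MMD-style projection
between source-view and target-view features, in the spirit of transfer-component /
joint distribution adaptation; the regularisation, output dimensionality, and the
restriction to the marginal distribution are assumptions, and the transferable
dictionary-learning stage is not shown.

```python
# Toy sketch: learn a shared projection that shrinks the marginal distribution
# gap between source-view (Xs) and target-view (Xt) features via an MMD-style
# generalised eigenproblem. Regularisation and dimensionality are assumptions.
import numpy as np
from scipy.linalg import eigh

def mmd_projection(Xs, Xt, dim=30, reg=1e-3):
    X = np.vstack([Xs, Xt])                              # (ns + nt) x d
    ns, nt = len(Xs), len(Xt)
    e = np.r_[np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)]
    M = np.outer(e, e)                                   # MMD coefficient matrix
    n = ns + nt
    H = np.eye(n) - np.ones((n, n)) / n                  # centring matrix
    A_mat = X.T @ M @ X + reg * np.eye(X.shape[1])       # small along MMD directions
    B_mat = X.T @ H @ X + reg * np.eye(X.shape[1])       # keep projected variance
    _, vecs = eigh(A_mat, B_mat)                         # eigenvalues ascending
    A = vecs[:, :dim]                                    # smallest-ratio directions
    return Xs @ A, Xt @ A                                # adapted features
```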
Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation
This paper explores the problem of multi-view spectral clustering (MVSC)
based on tensor low-rank modeling. Unlike the existing methods that all adopt
an off-the-shelf tensor low-rank norm without considering the special
characteristics of the tensor in MVSC, we design a novel structured tensor
low-rank norm tailored to MVSC. Specifically, we explicitly impose a symmetric
low-rank constraint and a structured sparse low-rank constraint on the frontal
and horizontal slices of the tensor to characterize the intra-view and
inter-view relationships, respectively. Moreover, the two constraints could be
jointly optimized to achieve mutual refinement. On the basis of the novel
tensor low-rank norm, we formulate MVSC as a convex low-rank tensor recovery
problem, which is then solved efficiently and iteratively with an augmented
Lagrange multiplier based method. Extensive experimental results on five benchmark
datasets show that the proposed method outperforms state-of-the-art methods to
a significant extent. Impressively, our method is able to produce perfect
clustering. In addition, the parameters of our method can be easily tuned, and
the proposed model is robust to different datasets, demonstrating its potential
in practice.
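A schematic way to write the tensor model is given below, with C_v the v-th frontal
slice (one per view) and C_(i,:,:) the i-th horizontal slice of the representation
tensor; the specific norms, weights α and β, and the symmetry constraint are
illustrative rather than the paper's exact formulation.

```latex
\min_{\mathcal{C}}\;
    \sum_{v=1}^{V} \tfrac{1}{2}\, \big\| X^{(v)} - X^{(v)} C_v \big\|_F^2
  + \alpha \sum_{v=1}^{V} \big\| C_v \big\|_*
  + \beta  \sum_{i=1}^{n} \big\| \mathcal{C}_{(i,:,:)} \big\|_*
\quad \text{s.t. } C_v = C_v^{\top},\; v = 1, \dots, V,
```

where the frontal-slice terms capture the symmetric intra-view structure and the
horizontal-slice terms couple the representations of each sample across views; the
whole problem remains convex in the tensor and fits an augmented-Lagrangian solver.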