A Survey on Multi-View Clustering
With advances in information acquisition technologies, multi-view data have become
ubiquitous. Multi-view learning has thus become increasingly popular in
machine learning and data mining. Multi-view unsupervised or
semi-supervised learning, such as co-training and co-regularization, has gained
considerable attention. Although multi-view clustering (MVC) methods
have developed rapidly in recent years, no survey has yet summarized and
analyzed the current progress. This paper therefore reviews the common
strategies for combining multiple views of data and, based on this summary,
proposes a novel taxonomy of MVC approaches. We further discuss the
relationships between MVC and multi-view representation, ensemble clustering,
multi-task clustering, and multi-view supervised and semi-supervised learning.
Several representative real-world applications are elaborated. To promote the
future development of MVC, we outline several open problems that may require
further investigation and thorough examination.
Comment: 17 pages, 4 figures
Feature Concatenation Multi-view Subspace Clustering
Multi-view clustering aims to achieve better clustering results than
single-view clustering by exploiting multi-view information. Since the statistical
properties of different views are diverse, even incompatible, few approaches
perform multi-view clustering directly on concatenated features, although
feature concatenation is a natural way to combine multiple views. To
this end, this paper proposes a novel multi-view subspace clustering approach
dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC).
Specifically, exploiting the consensus information, the multi-view data are
first concatenated into a joint representation; a norm regularizer is then
integrated into the objective function to handle the sample-specific and
cluster-specific corruptions of multiple views and improve clustering
performance. Furthermore, by introducing graph Laplacians of the multiple views, a
graph-regularized FCMSC is also proposed to explore both consensus
and complementary information for clustering. Notably,
the obtained coefficient matrix is not derived by simply applying
Low-Rank Representation (LRR) to the joint representation. Finally,
an effective algorithm based on the Augmented Lagrangian Multiplier (ALM) method is
designed to optimize the objective functions. Comprehensive experiments on six
real-world datasets illustrate the superiority of the proposed methods over
several state-of-the-art multi-view clustering approaches.
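The abstract notes that views with diverse statistics make raw concatenation unreliable. A minimal illustration of the idea (not the FCMSC objective itself) is to standardize each view before concatenating, so no view dominates the joint representation; `concatenate_views` is a hypothetical helper name.

```python
import numpy as np

def concatenate_views(views, eps=1e-12):
    """Z-score each view separately, then concatenate along features.

    Per-view standardization is one simple way to keep views with very
    different statistical properties (scale, variance) from dominating
    the joint representation.
    """
    normalized = []
    for X in views:
        mu = X.mean(axis=0, keepdims=True)
        sigma = X.std(axis=0, keepdims=True)
        normalized.append((X - mu) / (sigma + eps))
    return np.hstack(normalized)

# Two toy views of the same 5 samples, with very different scales.
rng = np.random.default_rng(0)
view1 = rng.normal(0.0, 1.0, size=(5, 3))
view2 = rng.normal(0.0, 100.0, size=(5, 4))
joint = concatenate_views([view1, view2])
print(joint.shape)  # (5, 7)
```

FCMSC then learns its coefficient matrix on such a joint representation rather than applying LRR to it directly.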
Low-rank Kernel Learning for Graph-based Clustering
Constructing the adjacency graph is fundamental to graph-based clustering.
Graph learning in kernel space has shown impressive performance on a number of
benchmark data sets. However, its performance is largely determined by the
chosen kernel matrix. To address this issue, multiple kernel learning
algorithms have previously been applied to learn an optimal kernel from a group of
predefined kernels. However, this approach can be sensitive to noise and limits the
representation ability of the consensus kernel. In contrast to existing
methods, we propose to learn a low-rank kernel matrix which exploits the
similarity nature of the kernel matrix and seeks an optimal kernel from the
neighborhood of candidate kernels. By formulating graph construction and kernel
learning in a unified framework, the graph and consensus kernel can be
iteratively enhanced by each other. Extensive experimental results validate the
efficacy of the proposed method.
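A toy sketch of the two ingredients the abstract names, seeking a consensus kernel in the neighborhood of predefined candidates and making it low-rank, is averaging candidate kernels and truncating the eigendecomposition. This is an assumption-laden simplification (the paper learns the consensus jointly with the graph); `consensus_low_rank_kernel` is a hypothetical helper.

```python
import numpy as np

def rbf_kernels(X, gammas):
    """Predefined candidate Gaussian kernels at several bandwidths."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return [np.exp(-g * sq) for g in gammas]

def consensus_low_rank_kernel(kernels, rank):
    """Average the candidates, then keep only the top-`rank` eigenpairs.

    Averaging places the consensus in the 'neighborhood' of the candidate
    kernels; eigenvalue truncation enforces the low-rank structure.
    """
    K = np.mean(kernels, axis=0)
    K = (K + K.T) / 2                       # keep it exactly symmetric
    vals, vecs = np.linalg.eigh(K)
    top = np.argsort(vals)[::-1][:rank]
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
Ks = rbf_kernels(X, gammas=[0.1, 1.0, 10.0])
K_star = consensus_low_rank_kernel(Ks, rank=3)
print(np.linalg.matrix_rank(K_star))  # 3
```

In the paper this consensus kernel and the adjacency graph are refined jointly rather than in one shot.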
Clustering with Similarity Preserving
Graph-based clustering has shown promising performance in many tasks. A key
step of graph-based approaches is similarity graph construction. In general,
learning the graph in kernel space can enhance clustering accuracy thanks to the
incorporation of nonlinearity. However, most existing kernel-based graph
learning mechanisms are not similarity-preserving and hence lead to sub-optimal
performance. To overcome this drawback, we propose a more discriminative graph
learning method that, for the first time, preserves the pairwise similarities
between samples in an adaptive manner. Specifically, we require the learned
graph to be close to a kernel matrix, which serves as a measure of similarity in the
raw data. Moreover, the structure is adaptively tuned so that the number of
connected components of the graph exactly equals the number of clusters.
Finally, our method unifies clustering and graph learning, so cluster
indicators can be obtained directly from the graph itself without a further
clustering step. The effectiveness of this approach is examined in both single
and multiple kernel learning scenarios on several datasets.
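The key structural constraint above, that the learned graph has exactly as many connected components as clusters, is what lets cluster indicators be read straight off the graph. A small sketch of that reading-off step (not the paper's graph-learning objective): the number of zero eigenvalues of the graph Laplacian L = D - W equals the number of connected components, and a traversal recovers the labels.

```python
import numpy as np

def n_components_and_labels(W, tol=1e-8):
    """Count connected components of a similarity graph via its
    Laplacian, and read cluster labels straight off the components."""
    L = np.diag(W.sum(axis=1)) - W
    # Zero eigenvalues of L = D - W count the connected components.
    n_zero = int(np.sum(np.linalg.eigvalsh(L) < tol))

    n = W.shape[0]
    labels = -np.ones(n, dtype=int)
    comp = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        stack = [start]                     # depth-first traversal
        while stack:
            i = stack.pop()
            if labels[i] >= 0:
                continue
            labels[i] = comp
            stack.extend(j for j in range(n) if W[i, j] > 0 and labels[j] < 0)
        comp += 1
    return n_zero, labels

# A graph with exactly two connected components -> two clusters.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
n_zero, labels = n_components_and_labels(W)
print(n_zero, labels)  # 2 [0 0 0 1 1 1]
```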
Spectral and matrix factorization methods for consistent community detection in multi-layer networks
We consider the problem of estimating a consensus community structure by
combining information from multiple layers of a multi-layer network using
methods based on spectral clustering or low-rank matrix factorization. As
a general theme, these "intermediate fusion" methods involve obtaining a low
column rank matrix by optimizing an objective function and then using the
columns of the matrix for clustering. However, the theoretical properties of
these methods remain largely unexplored. In the absence of statistical
guarantees on the objective functions, it is difficult to determine if the
algorithms optimizing the objectives will return good community structures. We
investigate the consistency properties of the global optimizer of some of these
objective functions under the multi-layer stochastic blockmodel. For this
purpose, we derive several new asymptotic results showing consistency of the
intermediate fusion techniques along with the spectral clustering of mean
adjacency matrix under a high dimensional setup, where the number of nodes, the
number of layers and the number of communities of the multi-layer graph grow.
Our numerical study shows that the intermediate fusion techniques outperform
late fusion methods, namely spectral clustering on the aggregate spectral kernel
and on the module allegiance matrix, in sparse networks, while they outperform the
spectral clustering of the mean adjacency matrix in multi-layer networks that
contain layers with both homophilic and heterophilic communities.
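The mean-adjacency baseline the paper compares against can be sketched in a few lines: average the layer adjacencies and split nodes by the sign of the second-largest eigenvector of the mean. This is a toy two-community illustration under assumed assortative layers, not the paper's intermediate-fusion estimators.

```python
import numpy as np

def mean_adjacency_partition(layers):
    """Mean-fusion baseline: average the layer adjacencies, then split
    nodes by the sign of the second-largest eigenvector of the mean.

    For two assortative (homophilic) communities this eigenvector
    separates the blocks.
    """
    A_bar = np.mean(layers, axis=0)
    vals, vecs = np.linalg.eigh(A_bar)
    v2 = vecs[:, np.argsort(vals)[-2]]     # second-largest eigenpair
    return (v2 > 0).astype(int)

# Two layers over 6 nodes, both with communities {0,1,2} and {3,4,5}.
block = np.ones((3, 3)) - np.eye(3)
A1 = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
A2 = A1.copy()
A2[0, 3] = A2[3, 0] = 1.0                  # one noisy cross-layer edge
labels = mean_adjacency_partition([A1, A2])
print(labels)
```

Averaging is exactly what fails when some layers are heterophilic, which is where the intermediate-fusion objectives studied in the paper have the advantage.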
Multi-view Metric Learning for Multi-view Video Summarization
Traditional video summarization methods are designed to generate summaries
for single-view video records and thus cannot fully exploit the
redundancy in multi-view video records. In this paper, we present a multi-view
metric learning framework for multi-view video summarization that combines the
advantages of maximum margin clustering with a disagreement minimization
criterion. The learning framework thus has the ability to find a metric that
best separates the data while forcing the learned metric to preserve the
original intrinsic information between data points, for example, geometric
information. Facilitated by such a framework, a systematic solution to the
multi-view video summarization problem is developed. To the best of our
knowledge, this is the first work to address multi-view video summarization from
the viewpoint of metric learning. The effectiveness of the proposed method is
demonstrated by experiments.
Guided Co-training for Large-Scale Multi-View Spectral Clustering
In many real-world applications, we have access to multiple views of the
data, each of which characterizes the data from a distinct aspect. Several
previous algorithms have demonstrated that one can achieve better clustering
accuracy by integrating information from all views appropriately than using
only an individual view. Owing to the effectiveness of spectral clustering,
many multi-view clustering methods are based on it. Unfortunately, they have
limited applicability to large-scale data due to the high computational
complexity of spectral clustering. In this work, we propose a novel multi-view
spectral clustering method for large-scale data. Our approach is structured
under the guided co-training scheme to fuse distinct views, and uses the
sampling technique to accelerate spectral clustering. More specifically, we
first select a small set of landmark points and then approximate the
eigen-decomposition accordingly. The augmented view, which is essential to the
guided co-training process, can then be quickly determined by our method. The
proposed algorithm scales linearly with the number of data points. Extensive
experiments have been performed, and the results support the advantage of our
method for handling the large-scale multi-view situation.
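The landmark trick above can be sketched as follows: instead of the full n x n affinity, build an n x p matrix of affinities to p landmarks and take its left singular vectors as the approximate spectral embedding, at O(n p^2) cost. This is a generic landmark-based sketch under assumed Gaussian affinities, not the paper's exact approximation; `landmark_embedding` is a hypothetical helper.

```python
import numpy as np

def landmark_embedding(X, p, k, gamma=1.0, seed=0):
    """Approximate spectral embedding via p randomly chosen landmarks.

    The left singular vectors of the normalized n x p affinity matrix Z
    stand in for the leading eigenvectors of the full graph Laplacian.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=p, replace=False)]
    sq = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-gamma * sq)                        # affinities to landmarks
    Z = Z / Z.sum(axis=1, keepdims=True)           # row-normalize
    Z = Z / np.sqrt(Z.sum(axis=0, keepdims=True))  # column rescale
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]                                # n x k embedding

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
E = landmark_embedding(X, p=10, k=2)
print(E.shape)  # (100, 2)
```

Because only the n x p matrix is ever formed, the cost grows linearly in n, matching the scaling claim in the abstract.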
Multi-view Unsupervised Feature Selection by Cross-diffused Matrix Alignment
Multi-view high-dimensional data have become increasingly common in the big data
era. Feature selection is a useful technique for alleviating the curse of
dimensionality in multi-view learning. In this paper, we study unsupervised
feature selection for multi-view data, as class labels are usually expensive to
obtain. Traditional feature selection methods are mostly designed for
single-view data and cannot fully exploit the rich information from multi-view
data. Existing multi-view feature selection methods are usually based on noisy
cluster labels which might not preserve sufficient information from multi-view
data. To better utilize multi-view information, we propose a method, CDMA-FS,
to select features for each view by performing alignment on a cross diffused
matrix. We formulate it as a constrained optimization problem and solve it
using a quasi-Newton method. Experimental results on four real-world
datasets show that the proposed method is more effective than
state-of-the-art methods in the multi-view setting.
Comment: 8 pages
Multi-view Low-rank Sparse Subspace Clustering
Most existing approaches address the multi-view subspace clustering problem by
constructing an affinity matrix on each view separately and then extending a
spectral clustering algorithm to handle multi-view data. This
paper presents an approach to multi-view subspace clustering that learns a
joint subspace representation by constructing an affinity matrix shared among all
views. Relying on the importance of both low-rank and sparsity constraints in
the construction of the affinity matrix, we introduce an objective that
balances agreement across different views while encouraging sparsity and
low-rankness of the solution. The related low-rank and
sparsity constrained optimization problem is solved for each view using the
alternating direction method of multipliers. Furthermore, we extend our
approach to cluster data drawn from nonlinear subspaces by solving the
corresponding problem in a reproducing kernel Hilbert space. The proposed
algorithm outperforms state-of-the-art multi-view subspace clustering
algorithms on one synthetic and four real-world datasets.
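ADMM solvers for low-rank-plus-sparse objectives like the one above alternate between two standard proximal steps: entrywise soft-thresholding for the l1 (sparsity) term and singular value thresholding for the nuclear-norm (low-rank) term. A sketch of the two operators (the paper's full update schedule and multipliers are omitted):

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of the l1 norm: shrinks entries toward zero,
    promoting sparsity."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def singular_value_threshold(A, tau):
    """Proximal operator of the nuclear norm: shrinks singular values,
    promoting low rank."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
S = soft_threshold(A, 0.5)          # entries in (-0.5, 0.5) become 0
L = singular_value_threshold(A, 1.0)
print(np.linalg.matrix_rank(L) <= np.linalg.matrix_rank(A))  # True
```

Each ADMM iteration applies these operators to subproblems in closed form, which is what makes the per-view solves cheap.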
Robust Kernelized Multi-View Self-Representations for Clustering by Tensor Multi-Rank Minimization
Recently, the tensor SVD (t-SVD) has been applied to multi-view
self-representation clustering and has achieved promising results in many
real-world applications such as face clustering, scene clustering, and generic
object clustering. However, t-SVD based multi-view self-representation
clustering was originally proposed to solve the clustering problem in multiple
linear subspaces, leading to unsatisfactory results when dealing with
non-linear subspaces. To handle data clustering from non-linear subspaces,
a kernelization method is designed that maps the data from the original input
space to a new feature space in which the transformed data can be clustered by
a multiple-linear clustering method. In this paper, we build an optimization
model for the kernelized multi-view self-representation clustering problem. We
also develop a new efficient algorithm based on the alternating direction
method and derive a closed-form solution. Since all the subproblems can be
solved exactly, the proposed optimization algorithm is guaranteed to obtain the
optimal solution. In particular, the original tensor-based multi-view
self-representation clustering problem is a special case of our approach and
can be solved by our algorithm. Experimental results on several popular
real-world clustering datasets demonstrate that our approach achieves
state-of-the-art performance.
Comment: 8 pages, 5 figures, AAAI 2018 submission
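The tensor multi-rank minimized in this line of work is defined through the t-SVD: take the FFT along the third mode, which turns the t-product into slice-wise matrix products, and read off the rank of each frontal slice in the Fourier domain. A minimal sketch of computing the multi-rank (the clustering objective itself is not reproduced here):

```python
import numpy as np

def tensor_multi_rank(T):
    """Multi-rank of a 3-way tensor under the t-SVD: the vector of
    ranks of the frontal slices after an FFT along the third mode."""
    T_hat = np.fft.fft(T, axis=2)
    return np.array([np.linalg.matrix_rank(T_hat[:, :, i])
                     for i in range(T.shape[2])])

# A tensor whose every frontal slice is the same rank-1 matrix: after
# the FFT, all energy concentrates in the first Fourier slice.
a = np.array([[1.0], [2.0], [3.0]])
slice_ = a @ a.T                       # 3 x 3, rank 1
T = np.stack([slice_, slice_, slice_], axis=2)
print(tensor_multi_rank(T))  # [1 0 0]
```

Minimizing the sum of these Fourier-domain singular values (the tensor nuclear norm) is the convex surrogate that couples the per-view self-representation matrices.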