Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering
Low-Rank Representation (LRR) is arguably one of the most powerful paradigms
for Multi-view spectral clustering, which elegantly encodes the multi-view
local graph/manifold structures into an intrinsic low-rank self-expressive data
similarity embedded in high-dimensional space, to yield a better graph
partition than their single-view counterparts. In this paper we revisit it with
a fundamentally different perspective by discovering LRR as essentially a
latent clustered orthogonal projection based representation winged with an
optimized local graph structure for spectral clustering; each column of the
representation is fundamentally a cluster basis orthogonal to others to
indicate its members, which intuitively projects the view-specific feature
representation to be the one spanned by all orthogonal bases to characterize
the cluster structures. Based on this finding, we propose a technique with the
following components: (1) We decompose LRR into latent clustered orthogonal
representation via low-rank matrix factorization, to encode more flexible
cluster structures than LRR over primal data objects; (2) We convert the
problem of LRR into that of simultaneously learning orthogonal clustered
representation and optimized local graph structure for each view; (3) The
learned orthogonal clustered representations and local graph structures enjoy
the same magnitude for multi-view, so that the ideal multi-view consensus can
be readily achieved. The experiments over multi-view datasets validate its
superiority. Comment: Accepted to appear in Neural Networks, Elsevier, on 9th March 201
Convex Sparse Spectral Clustering: Single-view to Multi-view
Spectral Clustering (SC) is one of the most widely used methods for data
clustering. It first finds a low-dimensional embedding $U$ of data by computing
the eigenvectors of the normalized Laplacian matrix, and then performs k-means
on $U^\top$ to get the final clustering result. In this work, we observe that,
in the ideal case, $UU^\top$ should be block diagonal and thus sparse.
Therefore we propose the Sparse Spectral Clustering (SSC) method which extends
SC with sparse regularization on $UU^\top$. To address the computational issue
of the nonconvex SSC model, we propose a novel convex relaxation of SSC based
on the convex hull of the fixed rank projection matrices. Then the convex SSC
model can be efficiently solved by the Alternating Direction Method of
Multipliers (ADMM). Furthermore, we propose the Pairwise Sparse
Spectral Clustering (PSSC) which extends SSC to boost the clustering
performance by using the multi-view information of data. Experimental
comparisons with several baselines on real-world datasets testify to the
efficacy of our proposed methods.
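
The pipeline this abstract describes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the farthest-point k-means initialisation, and the toy block affinity matrix are assumptions made for this example. It computes the spectral embedding from the normalized Laplacian, runs a small k-means on the rows of the embedding, and forms the product of the embedding with its transpose so that its block-diagonal structure can be inspected in the ideal case.

```python
import numpy as np

def spectral_embedding(W, k):
    """Eigenvectors of the normalized Laplacian for the k smallest eigenvalues."""
    d = W.sum(axis=1)                                  # node degrees, assumed positive
    s = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - s[:, None] * W * s[None, :]   # I - D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L)                        # eigenvalues come back in ascending order
    return vecs[:, :k]

def kmeans(X, k, n_iter=20):
    """Plain Lloyd iterations with a deterministic farthest-point start (illustrative)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def ideal_affinity(sizes):
    """Hypothetical ideal case: full affinity inside each cluster, none across clusters."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for m in sizes:
        W[start:start + m, start:start + m] = 1.0
        start += m
    return W

W = ideal_affinity([3, 4])
U = spectral_embedding(W, k=2)
labels = kmeans(U, k=2)
P = U @ U.T   # block diagonal for this ideal affinity
```

For this toy affinity the off-diagonal blocks of `P` vanish up to rounding, which is the sparsity that the abstract proposes to encourage with a regularizer.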
Feature Concatenation Multi-view Subspace Clustering
Multi-view clustering aims to achieve more promising clustering results than
single-view clustering by exploring the multi-view information. Since the statistical
properties of different views are diverse, even incompatible, few approaches
implement multi-view clustering based on the concatenated features directly.
However, feature concatenation is a natural way to combine multiple views. To
this end, this paper proposes a novel multi-view subspace clustering approach
dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC).
Specifically, by exploring the consensus information, multi-view data are
first concatenated into a joint representation; then the $l_{2,1}$-norm is
integrated into the objective function to deal with the sample-specific and
cluster-specific corruptions of multiple views for benefiting the clustering
performance. Furthermore, by introducing graph Laplacians of multiple views, a
graph regularized FCMSC is also introduced to explore both the consensus
information and complementary information for clustering. It is noteworthy that
the obtained coefficient matrix is not derived by simply applying the
Low-Rank Representation (LRR) to the joint view representation. Finally,
an effective algorithm based on the Augmented Lagrangian Multiplier (ALM) is
designed to optimize the objective functions. Comprehensive experiments on six
real-world datasets illustrate the superiority of the proposed methods over
several state-of-the-art approaches for multi-view clustering.
Guided Co-training for Large-Scale Multi-View Spectral Clustering
In many real-world applications, we have access to multiple views of the
data, each of which characterizes the data from a distinct aspect. Several
previous algorithms have demonstrated that one can achieve better clustering
accuracy by integrating information from all views appropriately than using
only an individual view. Owing to the effectiveness of spectral clustering,
many multi-view clustering methods are based on it. Unfortunately, they have
limited applicability to large-scale data due to the high computational
complexity of spectral clustering. In this work, we propose a novel multi-view
spectral clustering method for large-scale data. Our approach is structured
under the guided co-training scheme to fuse distinct views, and uses the
sampling technique to accelerate spectral clustering. More specifically, we
first select landmark points and then approximate the
eigen-decomposition accordingly. The augmented view, which is essential to
the guided co-training process, can then be quickly determined by our method. The
proposed algorithm scales linearly with the number of given data points. Extensive
experiments have been performed and the results support the advantage of our
method for handling the large-scale multi-view situation.
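
To make the landmark idea concrete, here is a hedged NumPy sketch of one common landmark-based approximation of the spectral embedding. It is not claimed to be this paper's algorithm: the evenly spaced landmark choice, the Gaussian affinity with bandwidth `sigma`, and the function name `landmark_embedding` are all assumptions for illustration, and the co-training step is not shown. The point is that only an n-by-p affinity to the landmarks is built, so the decomposition cost grows linearly with the number of data points n for a fixed number of landmarks p.

```python
import numpy as np

def landmark_embedding(X, p, k, sigma=1.0):
    """Approximate spectral embedding from affinities to p landmark points."""
    n = len(X)
    idx = np.linspace(0, n - 1, p).astype(int)         # evenly spaced landmarks (assumption)
    landmarks = X[idx]
    sq = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-sq / (2.0 * sigma ** 2))               # n x p Gaussian affinities
    Z = Z / Z.sum(axis=1, keepdims=True)               # each row sums to one
    Z_hat = Z / np.sqrt(Z.sum(axis=0, keepdims=True))  # scale columns by sqrt of column sums
    # The implicit n x n affinity is Z_hat @ Z_hat.T; its leading eigenvectors
    # are the leading left singular vectors of the thin matrix Z_hat.
    U, s, _ = np.linalg.svd(Z_hat, full_matrices=False)
    return U[:, :k], s[:k]

# Two well-separated toy clusters of 20 points each.
line = np.linspace(0.0, 1.0, 20)[:, None] * np.ones((1, 2))
X = np.vstack([line, line + 10.0])
U, s = landmark_embedding(X, p=10, k=2)
```

Rows of `U` can then be clustered with k-means as in ordinary spectral clustering; in this toy example the rows are constant within each cluster and differ between the two clusters.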
A Survey on Multi-View Clustering
With advances in information acquisition technologies, multi-view data become
ubiquitous. Multi-view learning has thus become more and more popular in
machine learning and data mining fields. Multi-view unsupervised or
semi-supervised learning, such as co-training and co-regularization, has gained
considerable attention. Although multi-view clustering (MVC) methods
have developed rapidly in recent years, there has not been a survey to summarize and
analyze the current progress. Therefore, this paper reviews the common
strategies for combining multiple views of data and based on this summary we
propose a novel taxonomy of the MVC approaches. We further discuss the
relationships between MVC and multi-view representation, ensemble clustering,
multi-task clustering, multi-view supervised and semi-supervised learning.
Several representative real-world applications are elaborated. To promote
future development of MVC, we envision several open problems that may require
further investigation and thorough examination. Comment: 17 pages, 4 figures
Multi-View Spectral Clustering via Structured Low-Rank Matrix Factorization
Multi-view data clustering attracts more attention than its single-view
counterpart because leveraging multiple independent and
complementary sources of information from multi-view feature spaces outperforms using a single
one. Multi-view Spectral Clustering aims at yielding the data partition
agreement over their local manifold structures by seeking
eigenvalue-eigenvector decompositions. However, as we observe, this classical
paradigm still suffers from (1) overlooking the flexible local manifold
structure, caused by (2) enforcing the low-rank data correlation agreement
among all views; worse still, (3) LRR is not intuitively flexible enough to capture
the latent data clustering structures. In this paper, (a) we present the structured
LRR by factorizing it into latent low-dimensional data-cluster
representations, which characterize the data clustering structure for each
view. Upon this representation, (b) a Laplacian regularizer is imposed to
preserve the flexible local manifold structure for each view. (c)
We present an iterative multi-view agreement strategy that minimizes the
divergence objective among all factorized latent data-cluster representations
during each iteration of the optimization process, where the latent representation
from each view serves to regulate those from the other views; this intuitive
process iteratively coordinates all views toward agreement. (d) We remark that
such a data-cluster representation can flexibly encode the data clustering
structure from any view with an adaptive input cluster number. To this end, (e) a
novel non-convex objective function is proposed and solved via an efficient alternating
minimization strategy. A complexity analysis is also presented.
Extensive experiments conducted on real-world multi-view datasets
demonstrate its superiority over state-of-the-art methods. Comment: Accepted to appear at IEEE Trans on Neural Networks and Learning
Systems
Multi-View Community Detection in Facebook Public Pages
Community detection in social networks is widely studied because of its
importance in uncovering how people connect and interact. However, little
attention has been given to community structure in Facebook public pages. In
this study, we investigate the community detection problem in Facebook
newsgroup pages. In particular, to deal with the diversity of user activities,
we apply multi-view clustering to integrate different views, for example, likes
on posts and likes on comments. In this study, we explore the community
structure not only within a given single page but also across multiple pages. The
results show that our method can effectively reduce isolates and improve the
quality of community structure.
Low-rank Kernel Learning for Graph-based Clustering
Constructing the adjacency graph is fundamental to graph-based clustering.
Graph learning in kernel space has shown impressive performance on a number of
benchmark data sets. However, its performance is largely determined by the
chosen kernel matrix. To address this issue, the previous multiple kernel
learning algorithm has been applied to learn an optimal kernel from a group of
predefined kernels. This approach might be sensitive to noise and limits the
representation ability of the consensus kernel. In contrast to existing
methods, we propose to learn a low-rank kernel matrix which exploits the
similarity nature of the kernel matrix and seeks an optimal kernel from the
neighborhood of candidate kernels. By formulating graph construction and kernel
learning in a unified framework, the graph and consensus kernel can be
iteratively enhanced by each other. Extensive experimental results validate the
efficacy of the proposed method.
Multi-view Low-rank Sparse Subspace Clustering
Most existing approaches address the multi-view subspace clustering problem by
constructing the affinity matrix on each view separately and afterwards propose
how to extend the spectral clustering algorithm to handle multi-view data. This
paper presents an approach to multi-view subspace clustering that learns a
joint subspace representation by constructing an affinity matrix shared among all
views. Relying on the importance of both low-rank and sparsity constraints in
the construction of the affinity matrix, we introduce an objective that
balances the agreement across different views while at the same time
encouraging sparsity and low-rankness of the solution. The related low-rank and
sparsity constrained optimization problem is solved for each view using the
alternating direction method of multipliers. Furthermore, we extend our
approach to cluster data drawn from nonlinear subspaces by solving the
corresponding problem in a reproducing kernel Hilbert space. The proposed
algorithm outperforms state-of-the-art multi-view subspace clustering
algorithms on one synthetic and four real-world datasets.
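
ADMM solvers for objectives with low-rank and sparsity terms usually alternate between closed-form proximal steps. The sketch below shows the two standard operators involved, under the assumption (common, but not stated in the abstract) that low-rankness is encouraged through the nuclear norm and sparsity through the l1 norm. It is not the paper's full algorithm, and the function names are chosen for illustration.

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1: shrink every entry toward zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_*: shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# Small usage example on hand-checkable inputs.
v = soft_threshold(np.array([3.0, -1.0, 0.5]), 1.0)
M = singular_value_threshold(np.diag([3.0, 1.0]), 1.5)
```

Entries or singular values smaller than `tau` are set to zero, which is how these steps produce sparse and low-rank iterates respectively.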
Feature Selection: A Data Perspective
Feature selection, as a data preprocessing strategy, has been proven to be
effective and efficient in preparing data (especially high-dimensional data)
for various data mining and machine learning problems. The objectives of
feature selection include: building simpler and more comprehensible models,
improving data mining performance, and preparing clean, understandable data.
The recent proliferation of big data has presented some substantial challenges
and opportunities to feature selection. In this survey, we provide a
comprehensive and structured overview of recent advances in feature selection
research. Motivated by current challenges and opportunities in the era of big
data, we revisit feature selection research from a data perspective and review
representative feature selection algorithms for conventional data, structured
data, heterogeneous data and streaming data. Methodologically, to emphasize the
differences and similarities of most existing feature selection algorithms for
conventional data, we categorize them into four main groups: similarity based,
information theoretical based, sparse learning based and statistical based
methods. To facilitate and promote the research in this community, we also
present an open-source feature selection repository that consists of most of
the popular feature selection algorithms
(http://featureselection.asu.edu/). Also, we use it as an example to show
how to evaluate feature selection algorithms. At the end of the survey, we
present a discussion about some open problems and challenges that require more
attention in future research.