14,121 research outputs found

Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering

Authors: Wang Yang; Wu Lin
Publication date: 21/03/2018

Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering: it encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, to yield a better graph partition than its single-view counterparts. In this paper we revisit it from a fundamentally different perspective, showing that LRR is essentially a latent clustered orthogonal projection based representation equipped with an optimized local graph structure for spectral clustering; each column of the representation is a cluster basis, orthogonal to the others, that indicates its members, and it projects the view-specific feature representation onto the space spanned by all orthogonal bases to characterize the cluster structures. Building on this finding, we propose our technique as follows: (1) We decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) We convert the problem of LRR into that of simultaneously learning an orthogonal clustered representation and an optimized local graph structure for each view; (3) The learned orthogonal clustered representations and local graph structures have the same magnitude across views, so that the ideal multi-view consensus can be readily achieved. Experiments on multi-view datasets validate its superiority.
Comment: Accepted to appear in Neural Networks, Elsevier, on 9th March 201

arXiv.org e-Print Archive

Convex Sparse Spectral Clustering: Single-view to Multi-view

Authors: Lin Zhouchen; Lu Canyi; Yan Shuicheng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 27/05/2018

Spectral Clustering (SC) is one of the most widely used methods for data clustering. It first finds a low-dimensional embedding $U$ of data by computing the eigenvectors of the normalized Laplacian matrix, and then performs k-means on $U^\top$ to get the final clustering result. In this work, we observe that, in the ideal case, $UU^\top$ should be block diagonal and thus sparse. Therefore we propose the Sparse Spectral Clustering (SSC) method, which extends SC with sparse regularization on $UU^\top$. To address the computational issue of the nonconvex SSC model, we propose a novel convex relaxation of SSC based on the convex hull of the fixed rank projection matrices. The convex SSC model can then be efficiently solved by the Alternating Direction Method of Multipliers (ADMM). Furthermore, we propose the Pairwise Sparse Spectral Clustering (PSSC), which extends SSC to boost the clustering performance by using the multi-view information of data. Experimental comparisons with several baselines on real-world datasets testify to the efficacy of our proposed methods.

arXiv.org e-Print Archive
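
The block-diagonal observation in the abstract above can be checked on a toy case. The sketch below is not the authors' code; it assumes an idealized embedding whose columns are normalized cluster indicators (the helper names `indicator_embedding` and `gram` are hypothetical) and shows that $UU^\top$ is then zero between clusters.

```python
import math

def indicator_embedding(labels, k):
    """Build an n x k matrix U whose column c is the unit-norm indicator of cluster c.

    Assumption: this stands in for the ideal spectral embedding; a real U
    would come from eigenvectors of a normalized Laplacian.
    """
    sizes = [labels.count(c) for c in range(k)]
    return [
        [1.0 / math.sqrt(sizes[c]) if c == lab else 0.0 for c in range(k)]
        for lab in labels
    ]

def gram(U):
    """Return U U^T as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in U] for ri in U]

# Five points: the first three in cluster 0, the last two in cluster 1.
labels = [0, 0, 0, 1, 1]
U = indicator_embedding(labels, 2)
P = gram(U)

# Count the nonzero entries: only the two diagonal blocks (3x3 and 2x2) survive.
nonzeros = sum(1 for row in P for v in row if v != 0.0)
```

Entries of `P` are zero whenever two points belong to different clusters, which is the sparsity that a regularizer on $UU^\top$ would encourage.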

Feature Concatenation Multi-view Subspace Clustering

Authors: Li Yaochen; Li Zhongyu; Pang Shanmin; Wang Jun; Zheng Qinghai; Zhu Jihua
Publication date: 05/06/2019

Multi-view clustering aims to achieve more promising clustering results than single-view clustering by exploring the multi-view information. Since the statistical properties of different views are diverse, even incompatible, few approaches implement multi-view clustering based on the concatenated features directly. However, feature concatenation is a natural way to combine multiple views. To this end, this paper proposes a novel multi-view subspace clustering approach dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC). Specifically, by exploring the consensus information, multi-view data are first concatenated into a joint representation; then the $l_{2,1}$-norm is integrated into the objective function to deal with the sample-specific and cluster-specific corruptions of multiple views, which benefits the clustering performance. Furthermore, by introducing graph Laplacians of multiple views, a graph regularized FCMSC is also introduced to explore both the consensus information and complementary information for clustering. It is noteworthy that the obtained coefficient matrix is not derived by simply applying the Low-Rank Representation (LRR) directly to the joint view representation. Finally, an effective algorithm based on the Augmented Lagrangian Multiplier (ALM) is designed to optimize the objective functions. Comprehensive experiments on six real-world datasets illustrate the superiority of the proposed methods over several state-of-the-art approaches for multi-view clustering.

arXiv.org e-Print Archive
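
As a small illustration of the $l_{2,1}$-norm mentioned in the abstract above, the sketch below assumes the column-wise convention common in the LRR literature (sum of the Euclidean norms of the columns, with columns as samples); the abstract does not state which convention the paper uses, and the function name `l21_norm` is hypothetical.

```python
import math

def l21_norm(E):
    """Sum of the Euclidean norms of the columns of E (E is a list of rows).

    Assumption: columns correspond to samples, so the norm is small when
    corruption is concentrated in a few samples.
    """
    n_cols = len(E[0])
    return sum(math.sqrt(sum(row[j] ** 2 for row in E)) for j in range(n_cols))

# Corruption confined to one sample (column 1).
E_one_sample = [[0.0, 3.0, 0.0],
                [0.0, 4.0, 0.0]]

# The same entry magnitudes spread over two samples.
E_two_samples = [[3.0, 0.0, 0.0],
                 [0.0, 4.0, 0.0]]

concentrated = l21_norm(E_one_sample)   # sqrt(3^2 + 4^2) = 5
spread = l21_norm(E_two_samples)        # 3 + 4 = 7
```

Both matrices have the same Frobenius norm, but the column-wise $l_{2,1}$-norm is lower when the error sits in a single column, which is why it is often used to model sample-specific corruption.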

Guided Co-training for Large-Scale Multi-View Spectral Clustering

Author: Liu Tyng-Luh
Publication date: 18/07/2017

In many real-world applications, we have access to multiple views of the data, each of which characterizes the data from a distinct aspect. Several previous algorithms have demonstrated that one can achieve better clustering accuracy by integrating information from all views appropriately than using only an individual view. Owing to the effectiveness of spectral clustering, many multi-view clustering methods are based on it. Unfortunately, they have limited applicability to large-scale data due to the high computational complexity of spectral clustering. In this work, we propose a novel multi-view spectral clustering method for large-scale data. Our approach is structured under the guided co-training scheme to fuse distinct views, and uses the sampling technique to accelerate spectral clustering. More specifically, we first select $p$ ($\ll n$) landmark points and then approximate the eigen-decomposition accordingly. The augmented view, which is essential to the guided co-training process, can then be quickly determined by our method. The proposed algorithm scales linearly with the number of given data. Extensive experiments have been performed and the results support the advantage of our method for handling the large-scale multi-view situation.

arXiv.org e-Print Archive

A Survey on Multi-View Clustering

Authors: Bi Jinbo; Chao Guoqing; Sun Shiliang
Publication date: 03/04/2018

With advances in information acquisition technologies, multi-view data have become ubiquitous. Multi-view learning has thus become more and more popular in the machine learning and data mining fields. Multi-view unsupervised or semi-supervised learning, such as co-training and co-regularization, has gained considerable attention. Although multi-view clustering (MVC) methods have developed rapidly in recent years, there has not been a survey to summarize and analyze the current progress. Therefore, this paper reviews the common strategies for combining multiple views of data, and based on this summary we propose a novel taxonomy of the MVC approaches. We further discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, and multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated. To promote future development of MVC, we envision several open problems that may require further investigation and thorough examination.
Comment: 17 pages, 4 figures

arXiv.org e-Print Archive

Multi-View Spectral Clustering via Structured Low-Rank Matrix Factorization

Authors: Wang Yang; Wu Lin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/12/2017

Multi-view data clustering attracts more attention than its single-view counterpart because leveraging multiple independent and complementary sources of information from multi-view feature spaces outperforms using a single one. Multi-view Spectral Clustering aims at yielding a data partition that agrees across the views' local manifold structures by seeking eigenvalue-eigenvector decompositions. However, as we observed, this classical paradigm still suffers from (1) overlooking the flexible local manifold structure, caused by (2) enforcing the low-rank data correlation agreement among all views; worse still, (3) LRR is not intuitively flexible enough to capture the latent data clustering structures. In this paper, (a) we present the structured LRR by factorizing it into latent low-dimensional data-cluster representations, which characterize the data clustering structure for each view. Upon such representation, (b) the Laplacian regularizer is imposed to preserve the flexible local manifold structure for each view. (c) We present an iterative multi-view agreement strategy that minimizes the divergence objective among all factorized latent data-cluster representations during each iteration of the optimization process, where the latent representation from each view serves to regulate those from the other views; this process iteratively coordinates all views to be agreeable. (d) We remark that such a data-cluster representation can flexibly encode the data clustering structure from any view with an adaptive input cluster number. To this end, (e) a novel non-convex objective function is proposed and solved via an efficient alternating minimization strategy. The complexity analysis is also presented. Extensive experiments on real-world multi-view datasets demonstrate the superiority over state-of-the-art methods.
Comment: Accepted to appear at IEEE Trans on Neural Networks and Learning Systems

arXiv.org e-Print Archive

Multi-View Community Detection in Facebook Public Pages

Authors: Barnett George; Chapman Jon W.; Lai Chun-Ming; Wu S. Felix; Xin Zhige
Publication date: 06/12/2018

Community detection in social networks is widely studied because of its importance in uncovering how people connect and interact. However, little attention has been given to community structure in Facebook public pages. In this study, we investigate the community detection problem in Facebook newsgroup pages. In particular, to deal with the diversity of user activities, we apply multi-view clustering to integrate different views, for example, likes on posts and likes on comments. We explore the community structure not only within a given single page but also across multiple pages. The results show that our method can effectively reduce isolates and improve the quality of community structure.

arXiv.org e-Print Archive

Low-rank Kernel Learning for Graph-based Clustering

Authors: Chen Wenyu; Kang Zhao; Wen Liangjian; Xu Zenglin
Publication venue: Elsevier BV
Publication date: 14/03/2019

Constructing the adjacency graph is fundamental to graph-based clustering. Graph learning in kernel space has shown impressive performance on a number of benchmark data sets. However, its performance is largely determined by the chosen kernel matrix. To address this issue, previous multiple kernel learning algorithms have been applied to learn an optimal kernel from a group of predefined kernels. This approach might be sensitive to noise and limits the representation ability of the consensus kernel. In contrast to existing methods, we propose to learn a low-rank kernel matrix which exploits the similarity nature of the kernel matrix and seeks an optimal kernel from the neighborhood of candidate kernels. By formulating graph construction and kernel learning in a unified framework, the graph and consensus kernel can be iteratively enhanced by each other. Extensive experimental results validate the efficacy of the proposed method.

arXiv.org e-Print Archive

Multi-view Low-rank Sparse Subspace Clustering

Authors: Brbic Maria; Kopriva Ivica
Publication venue: Elsevier BV
Publication date: 29/08/2017

Most existing approaches address the multi-view subspace clustering problem by constructing the affinity matrix on each view separately and then proposing how to extend the spectral clustering algorithm to handle multi-view data. This paper presents an approach to multi-view subspace clustering that learns a joint subspace representation by constructing an affinity matrix shared among all views. Relying on the importance of both low-rank and sparsity constraints in the construction of the affinity matrix, we introduce an objective that balances the agreement across different views while encouraging sparsity and low-rankness of the solution. The related low-rank and sparsity constrained optimization problem is solved for each view using the alternating direction method of multipliers. Furthermore, we extend our approach to cluster data drawn from nonlinear subspaces by solving the corresponding problem in a reproducing kernel Hilbert space. The proposed algorithm outperforms state-of-the-art multi-view subspace clustering algorithms on one synthetic and four real-world datasets.

arXiv.org e-Print Archive

Full-text Institutional Repository of the Ruđer Bošković Institute

Feature Selection: A Data Perspective

Authors: Cheng Kewei; Li Jundong; Liu Huan; Morstatter Fred; Tang Jiliang; Trevino Robert P.; Wang Suhang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/08/2018

Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities to feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity based, information theoretical based, sparse learning based and statistical based methods. To facilitate and promote the research in this community, we also present an open-source feature selection repository that consists of most of the popular feature selection algorithms (http://featureselection.asu.edu/). We also use it as an example to show how to evaluate feature selection algorithms. At the end of the survey, we present a discussion about some open problems and challenges that require more attention in future research.

arXiv.org e-Print Archive