FI-GRL: Fast Inductive Graph Representation Learning via Projection-Cost Preservation
Graph representation learning aims at transforming graph data into meaningful
low-dimensional vectors to facilitate the employment of machine learning and
data mining algorithms designed for general data. Most current graph
representation learning approaches are transductive: they require that all the
nodes in the graph be known when the representations are learned, and such
approaches cannot naturally generalize to unseen
nodes. In this paper, we present a Fast Inductive Graph Representation Learning
framework (FI-GRL) to learn nodes' low-dimensional representations. Our
approach can obtain accurate representations for seen nodes with provable
theoretical guarantees and can easily generalize to unseen nodes. Specifically,
in order to explicitly decouple nodes' relations expressed by the graph, we
transform nodes into a randomized subspace spanned by a random projection
matrix. This stage is guaranteed to preserve the projection-cost of the
normalized random walk matrix which is highly related to the normalized cut of
the graph. Then feature extraction is achieved by conducting singular value
decomposition on the obtained matrix sketch. By leveraging the property of
projection-cost preservation on the matrix sketch, the obtained representation
result is nearly optimal. To deal with unseen nodes, we utilize the folding-in
technique to learn their meaningful representations. Empirically, when the
number of seen nodes is larger than the number of unseen nodes, FI-GRL always
achieves excellent results. Our algorithm is fast, simple to implement and
theoretically guaranteed. Extensive experiments on real datasets demonstrate
the superiority of our algorithm on both efficacy and efficiency over both
macroscopic level (clustering) and microscopic level (structural hole
detection) applications.
Comment: ICDM 2018, Full Version
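The two-stage pipeline described above (random-projection sketching followed by SVD on the sketch) can be outlined in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name, dimensions, and the plain Gaussian projection are illustrative, not the paper's exact construction.

```python
import numpy as np

def fi_grl_sketch(A, sketch_dim, embed_dim, seed=None):
    """Toy two-stage embedding: random projection, then SVD.

    A is assumed to be an (n, n) normalized random-walk matrix; all
    names and defaults here are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Stage 1: project into a randomized subspace. A Gaussian sketch of
    # this kind preserves projection-cost up to small error.
    Pi = rng.standard_normal((n, sketch_dim)) / np.sqrt(sketch_dim)
    sketch = A @ Pi                              # (n, sketch_dim) sketch
    # Stage 2: feature extraction via SVD of the small sketch.
    U, S, _ = np.linalg.svd(sketch, full_matrices=False)
    return U[:, :embed_dim] * S[:embed_dim]      # node embeddings

# Usage: embed a row-normalized random matrix standing in for a graph.
rng = np.random.default_rng(0)
A = rng.random((50, 50))
A = A / A.sum(axis=1, keepdims=True)             # row-normalize
Z = fi_grl_sketch(A, sketch_dim=10, embed_dim=4, seed=1)
```

In this picture, folding-in an unseen node would amount to pushing its normalized connectivity row through the same projection and singular subspace rather than recomputing the factorization.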
A Framework for Deep Constrained Clustering -- Algorithms and Advances
The area of constrained clustering has been extensively explored by
researchers and used by practitioners. Constrained clustering formulations
exist for popular algorithms such as k-means, mixture models, and spectral
clustering but have several limitations. A fundamental strength of deep
learning is its flexibility, and here we explore a deep learning framework for
constrained clustering and in particular explore how it can extend the field of
constrained clustering. We show that our framework can not only handle standard
together/apart constraints (without the well-documented negative effects
reported earlier) generated from labeled side information but more complex
constraints generated from new types of side information such as continuous
values and high-level domain knowledge.
Comment: Updated for ECML/PKDD 201
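One simple way to turn together/apart side information into differentiable losses on soft cluster assignments is a dot-product construction like the one below. This form is an assumption about how such constraints might be encoded, not necessarily the framework's exact loss.

```python
import numpy as np

def must_link_loss(p, q, eps=1e-12):
    # p, q: soft cluster-assignment vectors (each sums to 1).
    # Penalize when the probability of landing in the same cluster is low.
    return -np.log(np.dot(p, q) + eps)

def cannot_link_loss(p, q, eps=1e-12):
    # Penalize when the assignments overlap.
    return -np.log(1.0 - np.dot(p, q) + eps)

p = np.array([0.90, 0.05, 0.05])
q = np.array([0.85, 0.10, 0.05])   # similar to p
r = np.array([0.05, 0.90, 0.05])   # different cluster than p
# A same-cluster pair incurs a small must-link loss and a large
# cannot-link loss; a different-cluster pair behaves the opposite way.
assert must_link_loss(p, q) < must_link_loss(p, r)
assert cannot_link_loss(p, r) < cannot_link_loss(p, q)
```

In a deep clustering network these terms would be added to the unsupervised objective and backpropagated through the assignment probabilities.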
Asymptotically efficient estimators for stochastic blockmodels: the naive MLE, the rank-constrained MLE, and the spectral estimator
We establish asymptotic normality results for estimation of the block
probability matrix B in stochastic blockmodel graphs using spectral
embedding when the average degree grows with n,
the number of vertices. As a corollary, we show that when B is
of full rank, estimates of B obtained from spectral embedding are
asymptotically efficient. When B is singular, the estimates obtained
from spectral embedding can have smaller mean squared error than those
obtained from maximizing the log-likelihood under no rank assumption and,
furthermore, can be almost as efficient as the true MLE that assumes the
block assignments are known. Our results indicate, in the context of stochastic
blockmodel graphs, that spectral embedding is not just computationally
tractable, but that the resulting estimates are also admissible, even when
compared to the purportedly optimal but computationally intractable maximum
likelihood estimation under no rank assumption.
Comment: 34 pages, 2 figures
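A minimal NumPy sketch of adjacency spectral embedding and the induced plug-in estimate of the block probability matrix on a toy two-block model follows; the matrix B, sizes, seed, and estimator below are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed nodes as rows of U_d |S_d|^{1/2}, using the top-d
    eigenpairs of A by magnitude."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Toy two-block stochastic blockmodel (illustrative parameters).
rng = np.random.default_rng(0)
B = np.array([[0.5, 0.2],
              [0.2, 0.4]])
z = np.repeat([0, 1], 100)                 # block assignments
P = B[np.ix_(z, z)]                        # edge probabilities
A = (rng.random(P.shape) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                # undirected, no self-loops

X = adjacency_spectral_embedding(A, d=2)
# Plug-in estimate of B: dot products of per-block mean embeddings.
Bhat = np.array([[X[z == r].mean(axis=0) @ X[z == s].mean(axis=0)
                  for s in range(2)] for r in range(2)])
```

The dot-product estimate is invariant to the per-dimension sign ambiguity of the eigenvectors, which is why no alignment step is needed in this toy setting.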
Robust Unsupervised Flexible Auto-weighted Local-Coordinate Concept Factorization for Image Clustering
We investigate the high-dimensional data clustering problem by proposing a
novel and unsupervised representation learning model called Robust Flexible
Auto-weighted Local-coordinate Concept Factorization (RFA-LCF). RFA-LCF
integrates the robust flexible CF, robust sparse local-coordinate coding and
the adaptive reconstruction weighting learning into a unified model. The
adaptive weighting is driven by including the joint manifold preserving
constraints on the recovered clean data, basis concepts and new representation.
Specifically, our RFA-LCF uses an L2,1-norm based flexible residue to encode the
mismatch between clean data and its reconstruction, and also applies the robust
adaptive sparse local-coordinate coding to represent the data using a few
nearby basis concepts, which can make the factorization more accurate and
robust to noise. The robust flexible factorization is also performed in the
recovered clean data space for enhancing representations. RFA-LCF also
considers preserving the local manifold structures of clean data space, basis
concept space and the new coordinate space jointly in an adaptive manner.
Extensive comparisons show that RFA-LCF can deliver enhanced clustering
results.
Comment: Accepted at the 44th IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP 2019).
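Under one common convention, the L2,1 norm used for such a residue sums the Euclidean norms of the matrix rows, so an outlying sample contributes linearly rather than quadratically; this is what makes the penalty robust to sample-wise noise.

```python
import numpy as np

def l21_norm(M):
    # Sum of the Euclidean norms of the rows. An outlier row contributes
    # linearly, unlike under the squared Frobenius norm.
    return np.linalg.norm(M, axis=1).sum()

E = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.6, 0.8]])
value = l21_norm(E)   # row norms are 5, 0, 1
print(value)          # 6.0
```

Minimizing this norm over a residue matrix encourages entire rows (samples) to have small error while tolerating a few large ones.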
The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics
The singular value matrix decomposition plays a ubiquitous role throughout
statistics and related fields. Myriad applications including clustering,
classification, and dimensionality reduction involve studying and exploiting
the geometric structure of singular values and singular vectors.
This paper provides a novel collection of technical and theoretical tools for
studying the geometry of singular subspaces using the two-to-infinity norm.
Motivated by preliminary deterministic Procrustes analysis, we consider a
general matrix perturbation setting in which we derive a new Procrustean matrix
decomposition. Together with flexible machinery developed for the
two-to-infinity norm, this allows us to conduct a refined analysis of the
induced perturbation geometry with respect to the underlying singular vectors
even in the presence of singular value multiplicity. Our analysis yields
singular vector entrywise perturbation bounds for a range of popular matrix
noise models, each of which has a meaningful associated statistical inference
task. In addition, we demonstrate how the two-to-infinity norm is the preferred
norm in certain statistical settings. Specific applications discussed in this
paper include covariance estimation, singular subspace recovery, and multiple
graph inference.
Both our Procrustean matrix decomposition and the technical machinery
developed for the two-to-infinity norm may be of independent interest.
Comment: 36 pages
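Concretely, the two-to-infinity norm of a matrix is the largest Euclidean norm among its rows, which is why it yields row-wise (entrywise-style) perturbation control that the spectral or Frobenius norm cannot. A minimal sketch:

```python
import numpy as np

def two_to_infinity_norm(M):
    # ||M||_{2->inf} = max over rows of the row's Euclidean norm,
    # i.e. the largest possible |(Mx)_i| over unit vectors x.
    return np.linalg.norm(M, axis=1).max()

M = np.array([[3.0, 4.0],
              [1.0, 0.0]])
value = two_to_infinity_norm(M)   # max(5.0, 1.0)
print(value)                      # 5.0

# It is always bounded above by the spectral norm.
assert value <= np.linalg.norm(M, 2)
```

For singular-vector matrices, bounding this norm controls how far each individual row (each node's embedding, say) can move under perturbation, not just the aggregate error.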
Multi-View Spectral Clustering via Structured Low-Rank Matrix Factorization
Multi-view data clustering attracts more attention than its single-view
counterpart because leveraging independent and complementary
information from multiple feature spaces outperforms relying on a single
one. Multi-view Spectral Clustering aims at yielding the data partition
agreement over their local manifold structures by seeking
eigenvalue-eigenvector decompositions. However, as we observed, such classical
paradigm still suffers from (1) overlooking the flexible local manifold
structure, caused by (2) enforcing the low-rank data correlation agreement
among all views; worse still, (3) LRR is not intuitively flexible enough to
capture the latent data clustering structures. In this paper, we present the
structured LRR by factorizing the low-rank representation into the latent
low-dimensional data-cluster
representations, which characterize the data clustering structure for each
view. Upon such representation, (b) the Laplacian regularizer is imposed to be
capable of preserving the flexible local manifold structure for each view. (c)
We present an iterative multi-view agreement strategy by minimizing the
divergence objective among all factorized latent data-cluster representations
during each iteration of the optimization process, where the latent
representation from each view serves to regulate those from the other views;
this process iteratively coordinates all views toward agreement. (d) We remark that
such data-cluster representation can flexibly encode the data clustering
structure from any view with adaptive input cluster number. To this end, (e) a
novel non-convex objective function is proposed via the efficient alternating
minimization strategy. A complexity analysis is also presented. The
extensive experiments conducted on real-world multi-view datasets
demonstrate its superiority over the state of the art.
Comment: Accepted to appear in IEEE Transactions on Neural Networks and
Learning Systems
Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
As opposed to manual feature engineering which is tedious and difficult to
scale, network representation learning has attracted a surge of research
interests as it automates the process of feature learning on graphs. The
learned low-dimensional node vector representation is generalizable and eases
the knowledge discovery process on graphs by enabling various off-the-shelf
machine learning tools to be directly applied. Recent research has shown that
the past decade of network embedding approaches either explicitly factorize a
carefully designed matrix to obtain the low-dimensional node vector
representation or are closely related to implicit matrix factorization, with
the fundamental assumption that the factorized node connectivity matrix is
low-rank. Nonetheless, the global low-rank assumption does not necessarily hold
especially when the factorized matrix encodes complex node interactions, and
the resultant single low-rank embedding matrix is insufficient to capture all
the observed connectivity patterns. In this regard, we propose a novel
multi-level network embedding framework BoostNE, which can learn multiple
network embedding representations of different granularity from coarse to fine
without imposing the prevalent global low-rank assumption. The proposed BoostNE
method is also in line with the successful gradient boosting method in ensemble
learning as multiple weak embeddings lead to a stronger and more effective one.
We assess the effectiveness of the proposed BoostNE framework by comparing it
with existing state-of-the-art network embedding methods on various datasets,
and the experimental results corroborate the superiority of the proposed
BoostNE network embedding framework.
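The boosting analogy can be sketched with truncated SVD on successive residuals: each level factorizes what the previous levels failed to explain, and the weak embeddings are concatenated. Names and details below are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def boosted_embeddings(M, d, levels):
    """Learn `levels` rank-d embeddings by repeatedly factorizing the
    residual, instead of one global low-rank factorization (a sketch)."""
    embeddings, residual = [], M.astype(float)
    for _ in range(levels):
        U, S, Vt = np.linalg.svd(residual, full_matrices=False)
        approx = (U[:, :d] * S[:d]) @ Vt[:d]          # best rank-d fit
        embeddings.append(U[:, :d] * np.sqrt(S[:d]))  # one weak embedding
        residual = residual - approx   # the next level fits what remains
    return np.hstack(embeddings), residual

# Usage on a random connectivity-like matrix.
rng = np.random.default_rng(0)
M = rng.random((30, 30))
Z, residual = boosted_embeddings(M, d=2, levels=4)
# Z concatenates 4 coarse-to-fine embeddings of 2 dimensions each.
```

By Eckart-Young, each level removes the best rank-d approximation of its residual, so the unexplained portion of M shrinks monotonically across levels.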
Face Clustering: Representation and Pairwise Constraints
Clustering face images according to their identity has two important
applications: (i) grouping a collection of face images when no external labels
are associated with images, and (ii) indexing for efficient large scale face
retrieval. The clustering problem is composed of two key parts: face
representation and choice of similarity for grouping faces. We first propose a
representation based on ResNet, which has been shown to perform very well in
image classification problems. Given this representation, we design a
clustering algorithm, Conditional Pairwise Clustering (ConPaC), which directly
estimates the adjacency matrix only based on the similarity between face
images. This allows a dynamic selection of number of clusters and retains
pairwise similarity between faces. ConPaC formulates the clustering problem as
a Conditional Random Field (CRF) model and uses Loopy Belief Propagation to
find an approximate solution for maximizing the posterior probability of the
adjacency matrix. Experimental results on two benchmark face datasets (LFW and
IJB-B) show that ConPaC outperforms well known clustering algorithms such as
k-means, spectral clustering and approximate rank-order. Additionally, our
algorithm can naturally incorporate pairwise constraints to obtain a
semi-supervised version that leads to improved clustering performance. We also
propose a k-NN variant of ConPaC, which has linear time complexity given a
k-NN graph, suitable for large datasets.
Comment: This second version is the same as the TIFS version. Some experimental
results differ from v1 because we corrected the protocol.
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization
Image clustering is one of the most important computer vision applications,
which has been extensively studied in the literature. However, current clustering
methods mostly suffer from lack of efficiency and scalability when dealing with
large-scale and high-dimensional data. In this paper, we propose a new
clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which
efficiently maps data into a discriminative embedding subspace and precisely
predicts cluster assignments. DEPICT generally consists of a multinomial
logistic regression function stacked on top of a multi-layer convolutional
autoencoder. We define a clustering objective function using relative entropy
(KL divergence) minimization, regularized by a prior for the frequency of
cluster assignments. An alternating strategy is then derived to optimize the
objective by updating parameters and estimating cluster assignments.
Furthermore, we employ the reconstruction loss functions in our autoencoder, as
a data-dependent regularization term, to prevent the deep embedding function
from overfitting. In order to benefit from end-to-end optimization and
eliminate the necessity for layer-wise pretraining, we introduce a joint
learning framework to minimize the unified clustering and reconstruction loss
functions together and train all network layers simultaneously. Experimental
results indicate the superiority and faster running time of DEPICT in
real-world clustering tasks, where no labeled data is available for
hyper-parameter tuning.
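A DEC/DEPICT-style relative-entropy objective pairs the network's soft assignments with a sharpened target distribution. The specific construction below, with a square-root cluster-frequency normalizer, is a common choice and an assumption here, not necessarily DEPICT's exact formulation.

```python
import numpy as np

def target_distribution(Q, eps=1e-12):
    """Sharpened target P from soft assignments Q (rows sum to 1).
    Dividing by the sqrt of the cluster frequency discourages
    degenerate solutions that dump all samples into one cluster."""
    W = Q / np.sqrt(Q.sum(axis=0) + eps)
    return W / W.sum(axis=1, keepdims=True)

def kl_divergence(P, Q, eps=1e-12):
    # Relative entropy KL(P || Q), averaged over samples.
    return np.mean(np.sum(P * np.log((P + eps) / (Q + eps)), axis=1))

# Usage: soft assignments for 8 samples over 3 clusters.
rng = np.random.default_rng(0)
Q = rng.random((8, 3))
Q = Q / Q.sum(axis=1, keepdims=True)
P = target_distribution(Q)
loss = kl_divergence(P, Q)   # clustering objective to minimize
```

In training, Q comes from the classifier head on the autoencoder embedding, P is recomputed from Q, and the KL term is minimized jointly with the reconstruction losses.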
Bayesian Distance Clustering
Model-based clustering is widely used in a variety of application areas.
However, fundamental concerns remain about robustness. In particular, results
can be sensitive to the choice of kernel representing the within-cluster data
density. Leveraging properties of pairwise differences between data points,
we propose a class of Bayesian distance clustering methods, which rely on
modeling the likelihood of the pairwise distances in place of the original
data. Although some information in the data is discarded, we gain substantial
robustness to modeling assumptions. The proposed approach represents an
appealing middle ground between distance- and model-based clustering, drawing
advantages from each of these canonical approaches. We illustrate dramatic
gains in the ability to infer clusters that are not well represented by the
usual choices of kernel. A simulation study is included to assess performance
relative to competitors, and we apply the approach to clustering of brain
genome expression data.
Keywords: Distance-based clustering; Mixture model; Model-based clustering;
Model misspecification; Pairwise distance matrix; Partial likelihood;
Robustness
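The idea of replacing a likelihood on the raw data with one on pairwise distances can be illustrated with a toy partial likelihood over within-cluster distances. The exponential distance model below is a deliberate simplification standing in for the paper's actual modeling choices.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def distance_log_likelihood(D, z, rate=1.0):
    """Toy partial log-likelihood: within-cluster distances modeled as
    Exponential(rate). The exponential choice is illustrative only."""
    ll = 0.0
    for k in np.unique(z):
        idx = np.where(z == k)[0]
        sub = D[np.ix_(idx, idx)]
        d = sub[np.triu_indices(len(idx), 1)]   # unordered pairs
        ll += np.sum(np.log(rate) - rate * d)
    return ll

# Two well-separated Gaussian blobs: the true grouping should score
# higher than a shuffled one under the distance likelihood.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)),
               rng.normal(5, 0.1, (10, 2))])
D = pairwise_distances(X)
good = np.repeat([0, 1], 10)   # matches the blobs
bad = np.tile([0, 1], 10)      # mixes the blobs
assert distance_log_likelihood(D, good) > distance_log_likelihood(D, bad)
```

Because only D enters the likelihood, the original coordinates can be discarded, which is the source of the robustness-for-information trade-off the abstract describes.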