31,899 research outputs found
Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data
particularly when proximal and imbalanced clusters are present. We show that
Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods which are frequently used in clustering and
community detection applications are sensitive to the specific graph
constructions particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP
Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks
The stochastic block model (SBM) is a flexible probabilistic tool that can be
used to model interactions between clusters of nodes in a network. However, it
does not account for interactions of time varying intensity between clusters.
The extension of the SBM developed in this paper addresses this shortcoming
through a temporal partition: assuming interactions between nodes are recorded
on fixed-length time intervals, the inference procedure associated with the
model we propose allows to cluster simultaneously the nodes of the network and
the time intervals. The number of clusters of nodes and of time intervals, as
well as the memberships to clusters, are obtained by maximizing an exact
integrated complete-data likelihood, relying on a greedy search approach.
Experiments on simulated and real data are carried out in order to assess the
proposed methodology
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Hierarchical Subquery Evaluation for Active Learning on a Graph
To train good supervised and semi-supervised object classifiers, it is
critical that we not waste the time of the human experts who are providing the
training labels. Existing active learning strategies can have uneven
performance, being efficient on some datasets but wasteful on others, or
inconsistent just between runs on the same dataset. We propose perplexity based
graph construction and a new hierarchical subquery evaluation algorithm to
combat this variability, and to release the potential of Expected Error
Reduction.
Under some specific circumstances, Expected Error Reduction has been one of
the strongest-performing informativeness criteria for active learning. Until
now, it has also been prohibitively costly to compute for sizeable datasets. We
demonstrate our highly practical algorithm, comparing it to other active
learning measures on classification datasets that vary in sparsity,
dimensionality, and size. Our algorithm is consistent over multiple runs and
achieves high accuracy, while querying the human expert for labels at a
frequency that matches their desired time budget.Comment: CVPR 201
- …