Sampling random graph homomorphisms and applications to network data analysis
A graph homomorphism is a map between two graphs that preserves adjacency
relations. We consider the problem of sampling a random graph homomorphism from
a graph into a large network. We propose two complementary
MCMC algorithms for sampling random graph homomorphisms and establish bounds
on their mixing times and concentration of their time averages. Based on our
sampling algorithms, we propose a novel framework for network data analysis
that circumvents some of the drawbacks in methods based on independent and
neighborhood sampling. Various time averages of the MCMC trajectory give us
various computable observables, including well-known ones such as homomorphism
density and average clustering coefficient and their generalizations.
Furthermore, we show that these network observables are stable with respect to
a suitably renormalized cut distance between networks. We provide various
examples and simulations demonstrating our framework through synthetic
networks. We also apply our framework for network clustering and classification
problems using the Facebook100 dataset and Word Adjacency Networks of a set of
classic novels.
Comment: 51 pages, 33 figures, 2 tables
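To make the idea of sampling homomorphisms by a Markov chain concrete, here is a minimal sketch of a single-site (Glauber-type) update for maps from a small path graph into a toy network. It is an illustration under stated assumptions, not necessarily either of the two algorithms the abstract refers to: the function names, the toy graphs, and the chosen observable are all hypothetical, and the sketch says nothing about mixing time.

```python
import random

def is_homomorphism(x, F_edges, G_adj):
    """True if every edge (u, v) of F maps to an edge (x[u], x[v]) of G."""
    return all(x[v] in G_adj[x[u]] for u, v in F_edges)

def glauber_step(x, F_adj, G_adj, rng):
    """Resample the image of one uniformly chosen node of F.

    The new image is drawn uniformly from the nodes of G that are adjacent
    to the images of all F-neighbors, so the map stays a homomorphism.
    Assumes G_adj is symmetric (undirected G).
    """
    v = rng.randrange(len(x))
    nbrs = F_adj[v]
    if nbrs:
        candidates = set.intersection(*(G_adj[x[u]] for u in nbrs))
    else:
        candidates = set(G_adj)
    # Never empty when x is a homomorphism: the current x[v] qualifies.
    x[v] = rng.choice(sorted(candidates))
    return x

def run_chain(x0, F_adj, G_adj, steps, seed=0):
    """Run the chain from a valid starting homomorphism x0; return all states."""
    rng = random.Random(seed)
    x = list(x0)
    trajectory = []
    for _ in range(steps):
        glauber_step(x, F_adj, G_adj, rng)
        trajectory.append(tuple(x))
    return trajectory

# Toy undirected network G (hypothetical): triangle 0-1-2 plus pendant node 3.
G_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# F is the path 0 - 1 - 2.
F_edges = [(0, 1), (1, 2)]
F_adj = [[1], [0, 2], [1]]

trajectory = run_chain([0, 1, 0], F_adj, G_adj, steps=2000)
# A time average along the trajectory: how often the two path endpoints
# land on adjacent nodes of G.
rate = sum(x[2] in G_adj[x[0]] for x in trajectory) / len(trajectory)
print(f"endpoint adjacency rate: {rate:.3f}")
```

The time average at the end is one example of an observable computed from the trajectory; other observables would replace the indicator inside the sum.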
Action recognition in video using a spatial-temporal graph-based feature representation
We propose a video graph based human action recognition
framework. Given an input video sequence, we extract
spatio-temporal local features and construct a video graph to incorporate appearance and motion constraints to reflect the spatio-temporal dependencies among features. In particular, we extend the popular DBSCAN density-based clustering algorithm to form an intuitive video graph. During training, we estimate a linear SVM classifier using the standard Bag-of-words method. During classification, we apply Graph-Cut optimization to find the most frequent action label in the constructed graph and assign this label to the test video sequence. The proposed approach achieves state-of-the-art performance on standard human action recognition benchmarks, namely the KTH and UCF-sports datasets, and competitive results on the Hollywood (HOHA) dataset.
Graph Partitioning using Quantum Annealing on the D-Wave System
In this work, we explore graph partitioning (GP) using quantum annealing on
the D-Wave 2X machine. Motivated by a recently proposed graph-based electronic
structure theory applied to quantum molecular dynamics (QMD) simulations, graph
partitioning is used for reducing the calculation of the density matrix into
smaller subsystems rendering the calculation more computationally efficient.
Unconstrained graph partitioning as community clustering based on the
modularity metric can be naturally mapped into the Hamiltonian of the quantum
annealer. On the other hand, when constraints are imposed for partitioning into
equal parts and minimizing the number of cut edges between parts, a quadratic
unconstrained binary optimization (QUBO) reformulation is required. This
reformulation may employ the graph complement to fit the problem in the Chimera
graph of the quantum annealer. Partitioning into 2 parts, 2^N parts
recursively, and k parts concurrently are demonstrated with benchmark graphs,
random graphs, and small material system density matrix based graphs. Results
for graph partitioning using quantum and hybrid classical-quantum approaches
are shown to equal or out-perform current "state of the art" methods
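The constrained case mentioned above can be illustrated with a textbook penalty formulation. The sketch below builds a QUBO for splitting a graph into two equal parts while minimizing cut edges, then solves a tiny instance by brute force on a classical machine. It is an assumption for illustration only, not the paper's reformulation: the penalty weight `alpha`, the function names, and the example graph are hypothetical, and no annealer or Chimera embedding is involved.

```python
from itertools import product

def build_qubo(n, edges, alpha):
    """Upper-triangular QUBO for balanced 2-way partitioning.

    With x[i] in {0, 1} marking the part of node i, the energy equals
        (number of cut edges) + alpha * (sum(x) - n/2)**2
    up to the dropped constant alpha * n**2 / 4.
    """
    Q = {}

    def add(i, j, value):
        key = (min(i, j), max(i, j))
        Q[key] = Q.get(key, 0.0) + value

    # Edge (i, j) is cut exactly when x_i + x_j - 2*x_i*x_j == 1.
    for i, j in edges:
        add(i, i, 1.0)
        add(j, j, 1.0)
        add(i, j, -2.0)
    # Expand the balance penalty, using x_i**2 == x_i for binary variables.
    for i in range(n):
        add(i, i, alpha * (1 - n))
        for j in range(i + 1, n):
            add(i, j, 2.0 * alpha)
    return Q

def energy(Q, x):
    """Evaluate the QUBO objective for a binary assignment x."""
    return sum(value * x[i] * x[j] for (i, j), value in Q.items())

def brute_force(Q, n):
    """Exhaustive search over all 2**n assignments; only for tiny graphs."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(Q, x))

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
Q = build_qubo(6, edges, alpha=1.0)
best = brute_force(Q, 6)
cut_size = sum(best[i] != best[j] for i, j in edges)
print(best, cut_size)
```

The penalty weight trades off balance against cut size; a value that is too small lets the solver return lopsided partitions.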
Projection methods for clustering and semi-supervised classification
This thesis focuses on data projection methods for the purposes of clustering and semi-supervised classification, with a primary focus on clustering. A number of contributions are presented which address this problem in a principled manner; using projection pursuit formulations to identify subspaces which contain useful information for the clustering task. Projection methods are extremely useful in high dimensional applications, and situations in which the data contain irrelevant dimensions which can be counterinformative for the clustering task. The final contribution addresses high dimensionality in the context of a data stream. Data streams and high dimensionality have been identified as two of the key challenges in data clustering. The first piece of work is motivated by identifying the minimum density hyperplane separator in the finite sample setting. This objective is directly related to the problem of discovering clusters defined as connected regions of high data density, which is a widely adopted definition in non-parametric statistics and machine learning. A thorough investigation into the theoretical aspects of this method, as well as the practical task of solving the associated optimisation problem efficiently is presented. The proposed methodology is applied to both clustering and semi-supervised classification problems, and is shown to reliably find low density hyperplane separators in both contexts. The second and third contributions focus on a different approach to clustering based on graph cuts. The minimum normalised graph cut objective has gained considerable attention as relaxations of the objective have been developed, which make them solvable for reasonably well sized problems. This has been adopted by the highly popular spectral clustering methods. 
The second piece of work focuses on identifying the optimal subspace in which to perform spectral clustering, by minimising the second eigenvalue of the graph Laplacian for a graph defined over the data within that subspace. A rigorous treatment of this objective is presented, and an algorithm is proposed for its optimisation. An approximation method is proposed which allows this method to be applied to much larger problems than would otherwise be possible. An extension of this work deals with the spectral projection pursuit method for semi-supervised classification. The third body of work looks at minimising the normalised graph cut using hyperplane separators. This formulation allows for the exact normalised cut to be computed, rather than the spectral relaxation. It also allows for a computationally efficient method for optimisation. The asymptotic properties of the normalised cut based on a hyperplane separator are investigated, and shown to have similarities with the clustering objective based on low density separation. In fact, both the methods in the second and third works are shown to be connected with the first, in that all three have the same solution asymptotically, as their relative scaling parameters are reduced to zero. The final body of work addresses both problems of high dimensionality and incremental clustering in a data stream context. A principled statistical framework is adopted, in which clustering by low density separation again becomes the focal objective. A divisive hierarchical clustering model is proposed, using a collection of low density hyperplanes. The adopted framework provides well founded methodology for determining the number of clusters automatically, and also identifying changes in the data stream which are relevant to the clustering objective. It is apparent that no existing methods can make both of these claims
Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data
particularly when proximal and imbalanced clusters are present. We show that
Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.
Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
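For readers unfamiliar with the two objectives named above, the sketch below evaluates the standard two-way ratio cut and normalized cut for a given partition of a small weighted graph. The definitions used are the common textbook ones, which the abstract itself does not spell out, so treat them as an assumption; the function names and the example graph are hypothetical.

```python
def cut_value(W, A):
    """Total weight of edges between node set A and its complement."""
    A = set(A)
    B = set(range(len(W))) - A
    return sum(W[i][j] for i in A for j in B)

def volume(W, S):
    """Sum of weighted degrees of the nodes in S."""
    return sum(sum(W[i]) for i in S)

def rcut(W, A):
    """Ratio cut: cut(A, B) * (1/|A| + 1/|B|)."""
    A = set(A)
    B = set(range(len(W))) - A
    return cut_value(W, A) * (1.0 / len(A) + 1.0 / len(B))

def ncut(W, A):
    """Normalized cut: cut(A, B) * (1/vol(A) + 1/vol(B))."""
    A = set(A)
    B = set(range(len(W))) - A
    return cut_value(W, A) * (1.0 / volume(W, A) + 1.0 / volume(W, B))

# Hypothetical example: unweighted path graph 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Both partitions cut exactly one edge, but the objectives differ.
for A in ({0, 1}, {0}):
    print(sorted(A), cut_value(W, A), rcut(W, A), ncut(W, A))
```

Both partitions in the example have the same cut value, so any difference in the two scores comes entirely from the size and volume terms, which is the balance effect the abstract discusses.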
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods which are frequently used in clustering and
community detection applications are sensitive to the specific graph
constructions particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.
Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP