
    Sampling random graph homomorphisms and applications to network data analysis

    A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph F into a large network G. We propose two complementary MCMC algorithms for sampling random graph homomorphisms and establish bounds on their mixing times and on the concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks of methods based on independent and neighborhood sampling. Various time averages of the MCMC trajectory yield computable observables, including well-known ones such as homomorphism density and average clustering coefficient, and their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut distance between networks. We provide various examples and simulations demonstrating our framework on synthetic networks. We also apply our framework to network clustering and classification problems using the Facebook100 dataset and Word Adjacency Networks of a set of classic novels.
    Comment: 51 pages, 33 figures, 2 tables
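The coordinate-wise MCMC update described above can be sketched as a Glauber-type chain: repeatedly pick one node of F and resample its image uniformly among nodes of G consistent with the images of its neighbours. The toy network, the path graph F, the observable, and the chain length below are all illustrative assumptions, not taken from the paper.

```python
import random

# Toy target network G as an adjacency dict (the paper targets large networks).
G = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}

def glauber_step(hom, G):
    """Resample the image of one node of a path F, preserving the
    homomorphism constraint: path-adjacent nodes map to G-adjacent nodes."""
    k = len(hom)
    i = random.randrange(k)
    candidates = set(G)
    if i > 0:
        candidates &= G[hom[i - 1]]
    if i < k - 1:
        candidates &= G[hom[i + 1]]
    if candidates:                      # keep the old image if no valid move
        hom[i] = random.choice(sorted(candidates))
    return hom

def time_average_observable(G, k=3, steps=5000, seed=0):
    """Time average of a simple observable along the chain (here: the
    fraction of steps where the path's endpoints map to the same node)."""
    random.seed(seed)
    hom = [0]                           # start from a valid walk in G
    for _ in range(k - 1):
        hom.append(random.choice(sorted(G[hom[-1]])))
    hits = 0
    for _ in range(steps):
        hom = glauber_step(hom, G)
        hits += (hom[0] == hom[-1])
    return hits / steps

print(time_average_observable(G))
```

Other observables (e.g. homomorphism densities or clustering-coefficient generalizations) would be computed from the same trajectory by averaging a different function of `hom`.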

    Action recognition in video using a spatial-temporal graph-based feature representation

    We propose a video-graph-based human action recognition framework. Given an input video sequence, we extract spatio-temporal local features and construct a video graph that incorporates appearance and motion constraints to reflect the spatio-temporal dependencies among features. In particular, we extend the popular DBSCAN density-based clustering algorithm to form an intuitive video graph. During training, we estimate a linear SVM classifier using the standard bag-of-words method. During classification, we apply graph-cut optimization to find the most frequent action label in the constructed graph and assign this label to the test video sequence. The proposed approach achieves state-of-the-art performance on standard human action recognition benchmarks, namely the KTH and UCF-Sports datasets, and competitive results on the Hollywood (HOHA) dataset.
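The DBSCAN step the abstract builds on can be sketched in a few lines; this is the standard density-based clustering algorithm on 2-D points, not the paper's spatio-temporal extension, and the points, `eps`, and `min_pts` values are illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 = noise).
    A core point has at least min_pts points within distance eps."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1                    # start a new cluster from core point i
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 10)]
print(dbscan(pts, eps=0.5, min_pts=2))  # -> [0, 0, 0, 1, 1, -1]
```

The paper's video graph would replace the Euclidean distance with a spatio-temporal similarity between local features.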

    Graph Partitioning using Quantum Annealing on the D-Wave System

    In this work, we explore graph partitioning (GP) using quantum annealing on the D-Wave 2X machine. Motivated by a recently proposed graph-based electronic structure theory applied to quantum molecular dynamics (QMD) simulations, graph partitioning is used to reduce the calculation of the density matrix into smaller subsystems, rendering the calculation more computationally efficient. Unconstrained graph partitioning, as community clustering based on the modularity metric, can be naturally mapped into the Hamiltonian of the quantum annealer. On the other hand, when constraints are imposed for partitioning into equal parts while minimizing the number of cut edges between parts, a quadratic unconstrained binary optimization (QUBO) reformulation is required. This reformulation may employ the graph complement to fit the problem onto the Chimera graph of the quantum annealer. Partitioning into 2 parts, into 2^N parts recursively, and into k parts concurrently is demonstrated with benchmark graphs, random graphs, and small density-matrix-based graphs from material systems. Results for graph partitioning using quantum and hybrid classical-quantum approaches are shown to equal or outperform current state-of-the-art methods.
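The modularity-to-Hamiltonian mapping mentioned above is standard: with spins s_i in {-1, +1} marking the two parts, modularity equals (1/4m) s^T B s for Newman's modularity matrix B, which is exactly an Ising/QUBO objective. The sketch below builds B and brute-forces the optimum classically as a stand-in for the annealer; the example graph is an assumption, not from the paper, and brute force is only feasible for tiny graphs.

```python
from itertools import product

def modularity_matrix(edges, n):
    """Newman modularity matrix B_ij = A_ij - k_i * k_j / (2m)."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    k = [sum(row) for row in A]
    m = len(edges)
    B = [[A[i][j] - k[i] * k[j] / (2 * m) for j in range(n)] for i in range(n)]
    return B, m

def best_two_way_split(edges, n):
    """Brute-force stand-in for the annealer: maximise (1/4m) s^T B s over
    spin vectors s in {-1, +1}^n (equivalent to a QUBO in x = (s + 1) / 2)."""
    B, m = modularity_matrix(edges, n)
    best = (float("-inf"), None)
    for s in product([-1, 1], repeat=n):
        Q = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / (4 * m)
        best = max(best, (Q, s))
    return best

# Two triangles joined by one edge: the natural communities are the triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
Q, s = best_two_way_split(edges, 6)
print(Q, s)
```

The constrained equal-parts variant would instead add a quadratic penalty term enforcing a balanced spin sum before handing the QUBO to the solver.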

    Projection methods for clustering and semi-supervised classification

    This thesis focuses on data projection methods for the purposes of clustering and semi-supervised classification, with a primary focus on clustering. A number of contributions are presented which address this problem in a principled manner, using projection pursuit formulations to identify subspaces which contain useful information for the clustering task. Projection methods are extremely useful in high-dimensional applications, and in situations in which the data contain irrelevant dimensions that can be counter-informative for the clustering task. The final contribution addresses high dimensionality in the context of a data stream. Data streams and high dimensionality have been identified as two of the key challenges in data clustering. The first piece of work is motivated by identifying the minimum density hyperplane separator in the finite sample setting. This objective is directly related to the problem of discovering clusters defined as connected regions of high data density, which is a widely adopted definition in non-parametric statistics and machine learning. A thorough investigation into the theoretical aspects of this method, as well as the practical task of solving the associated optimisation problem efficiently, is presented. The proposed methodology is applied to both clustering and semi-supervised classification problems, and is shown to reliably find low density hyperplane separators in both contexts. The second and third contributions focus on a different approach to clustering based on graph cuts. The minimum normalised graph cut objective has gained considerable attention as relaxations of the objective have been developed which make it solvable for problems of reasonable size. This has been adopted by the highly popular spectral clustering methods.
The second piece of work focuses on identifying the optimal subspace in which to perform spectral clustering, by minimising the second eigenvalue of the graph Laplacian for a graph defined over the data within that subspace. A rigorous treatment of this objective is presented, and an algorithm is proposed for its optimisation. An approximation method is proposed which allows this method to be applied to much larger problems than would otherwise be possible. An extension of this work deals with the spectral projection pursuit method for semi-supervised classification. The third body of work looks at minimising the normalised graph cut using hyperplane separators. This formulation allows the exact normalised cut to be computed, rather than the spectral relaxation. It also allows a computationally efficient method for optimisation. The asymptotic properties of the normalised cut based on a hyperplane separator are investigated, and shown to have similarities with the clustering objective based on low density separation. In fact, both the methods in the second and third works are shown to be connected with the first, in that all three have the same solution asymptotically, as their relative scaling parameters are reduced to zero. The final body of work addresses both problems of high dimensionality and incremental clustering in a data stream context. A principled statistical framework is adopted, in which clustering by low density separation again becomes the focal objective. A divisive hierarchical clustering model is proposed, using a collection of low density hyperplanes. The adopted framework provides well-founded methodology for determining the number of clusters automatically, and also for identifying changes in the data stream which are relevant to the clustering objective. It is apparent that no existing methods can make both of these claims.
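The idea of computing the exact normalised cut for a hyperplane separator (rather than a spectral relaxation) can be illustrated in one dimension, where a hyperplane reduces to a threshold: evaluate NCut(A, B) = cut(A,B)/vol(A) + cut(A,B)/vol(B) for every threshold split and keep the minimum. The graph, weights, and node ordering below are illustrative assumptions, not taken from the thesis.

```python
def ncut(adj, part):
    """Exact normalised cut of a binary partition of a weighted graph.
    adj: symmetric weight matrix; part: node indices on one side."""
    n = len(adj)
    A = set(part)
    B = set(range(n)) - A
    cut = sum(adj[i][j] for i in A for j in B)
    vol = lambda S: sum(adj[i][j] for i in S for j in range(n))
    return cut / vol(A) + cut / vol(B)

def best_threshold_ncut(adj, order):
    """Scan all threshold splits of the ordered nodes (a 1-D stand-in for
    scanning hyperplane offsets) and return the minimum exact NCut."""
    cuts = [(ncut(adj, set(order[:t])), t) for t in range(1, len(order))]
    return min(cuts)

# Toy graph: two cliques {0,1,2} and {3,4,5} joined by one weak edge.
adj = [[0.0] * 6 for _ in range(6)]
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    adj[u][v] = adj[v][u] = 1.0
adj[2][3] = adj[3][2] = 0.2

val, t = best_threshold_ncut(adj, order=list(range(6)))
print(val, t)   # the minimum-NCut threshold separates the two cliques
```

In higher dimensions the thesis optimises over the hyperplane's normal vector as well; the scan above corresponds to optimising the offset for one fixed projection.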

    Spectral Clustering with Imbalanced Data

    Spectral clustering is sensitive to how graphs are constructed from data, particularly when proximal and imbalanced clusters are present. We show that the ratio cut (RCut) and normalized cut (NCut) objectives are not tailored to imbalanced data, since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic and real data sets.
    Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1302.513
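The claim that RCut emphasizes cut sizes over cut values is easy to see numerically: the size terms in the denominator can make a balanced cut with a larger cut value score better than the true imbalanced cut. The cut values and cluster sizes below are illustrative numbers chosen to expose this effect, not figures from the paper.

```python
def rcut(cut_value, size_a, size_b):
    """Ratio-cut objective for a 2-way partition: cut/|A| + cut/|B|."""
    return cut_value / size_a + cut_value / size_b

# True (imbalanced) clusters: 5 vs 95 nodes, separated by a cheap cut.
true_split = rcut(cut_value=2.0, size_a=5, size_b=95)

# A balanced 50/50 cut slicing through the big cluster, paying
# DOUBLE the cut value -- yet RCut scores it as the better partition.
balanced_split = rcut(cut_value=4.0, size_a=50, size_b=50)

print(true_split, balanced_split)   # RCut prefers the balanced cut
```

NCut behaves the same way with volumes in place of sizes, which is why the paper replaces the size normalisation with explicit minimum-size constraints.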

    Clustering and Community Detection with Imbalanced Clusters

    Spectral clustering methods, which are frequently used in clustering and community detection applications, are sensitive to the specific graph construction, particularly when imbalanced clusters are present. We show that the ratio cut (RCut) and normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes, since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.
    Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP