149,951 research outputs found

    Local Hypergraph Clustering using Capacity Releasing Diffusion

    Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher-order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on techniques based on spectral graph theory. However, an alternative perspective on local graph clustering arises from using max-flow and min-cut on the objectives, which offer distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality, by assuming a good cluster near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher-order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives results about a quantity called motif conductance, rather than a biased version used in previous experiments. Experimental results on synthetic datasets and real-world graphs show that HG-CRD enhances the clustering quality. Comment: 18 pages, 6 figures
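
    The abstract above refers to conductance (through the Cheeger inequality and motif conductance) without defining it. As a point of reference only, the following minimal sketch computes ordinary edge conductance, assuming the common definition cut(S) / min(vol(S), vol(V \ S)) on an unweighted, undirected graph. It is not motif conductance and not the HG-CRD procedure; the function name and the toy graph are hypothetical.

```python
def conductance(adj, subset):
    """Edge conductance of `subset` in an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    Assumed definition: cut(S) / min(vol(S), vol(V \\ S)).
    """
    inside = set(subset)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    # Volume = sum of degrees on each side.
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in adj if u not in inside)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
toy_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi_triangle = conductance(toy_graph, {0, 1, 2})
```

    A well-separated cluster such as one of the two triangles has low conductance, because only one edge leaves it while its volume is comparatively large.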

    Partitioning into Expanders

    Let $G=(V,E)$ be an undirected graph, and let $\lambda_k$ be the $k$-th smallest eigenvalue of the normalized Laplacian matrix of $G$. There is a basic fact in algebraic graph theory that $\lambda_k > 0$ if and only if $G$ has at most $k-1$ connected components. We prove a robust version of this fact. If $\lambda_k > 0$, then for some $1 \leq \ell \leq k-1$, $V$ can be partitioned into $\ell$ sets $P_1,\ldots,P_\ell$ such that each $P_i$ is a low-conductance set in $G$ and induces a high-conductance subgraph. In particular, $\phi(P_i) = O(\ell^3\sqrt{\lambda_\ell})$ and $\phi(G[P_i]) \geq \lambda_k/k^2$. We make our results algorithmic by designing a simple polynomial-time spectral algorithm that finds such a partitioning of $G$ with a quadratic loss in the inside conductance of the $P_i$'s. Unlike the recent results on higher-order Cheeger inequalities [LOT12,LRTV12], our algorithmic results do not use higher-order eigenfunctions of $G$. If there is a sufficiently large gap between $\lambda_k$ and $\lambda_{k+1}$, more precisely, if $\lambda_{k+1} \geq \mathrm{poly}(k)\,\lambda_k^{1/4}$, then our algorithm finds a $k$-partitioning of $V$ into sets $P_1,\ldots,P_k$ such that the induced subgraph $G[P_i]$ has a significantly larger conductance than the conductance of $P_i$ in $G$. Such a partitioning may represent the best $k$-clustering of $G$. Our algorithm is a simple local search that only uses the Spectral Partitioning algorithm as a subroutine. We expect to see further applications of this simple algorithm in clustering.
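
    The basic fact quoted in the abstract above (the $k$-th smallest normalized Laplacian eigenvalue is positive if and only if the graph has at most $k-1$ connected components) can be checked numerically on a small example. The sketch below is only an illustration, not the paper's partitioning algorithm: the function names, the pure-Python Jacobi eigenvalue routine, and the two-triangle test graph are all assumptions made here.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric 0/1 adjacency matrix.

    Assumed convention: an isolated vertex gets a zero diagonal entry.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j]:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Hypothetical test graph: two disjoint triangles, so two components.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
eigs = jacobi_eigenvalues(normalized_laplacian(two_triangles))
zero_count = sum(1 for x in eigs if abs(x) < 1e-9)
```

    With two components, the two smallest eigenvalues are numerically zero and the third is strictly positive, which matches the statement with $k = 3$.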

    Graph Theoretical Analysis of local ultraluminous infrared galaxies and quasars

    We present a methodological framework for studying galaxy evolution by utilizing Graph Theory and network analysis tools. We study the evolutionary processes of local ultraluminous infrared galaxies (ULIRGs) and quasars and the underlying physical processes, such as star formation and active galactic nucleus (AGN) activity, through the application of Graph Theoretical analysis tools. We extract, process and analyse mid-infrared spectra of local (z < 0.4) ULIRGs and quasars between 5 and 38 microns through internally developed Python routines, in order to generate similarity graphs in which the nodes representing ULIRGs are grouped together based on the similarity of their spectra. Additionally, we extract and compare physical features from the mid-IR spectra, such as the polycyclic aromatic hydrocarbon (PAH) emission and silicate depth absorption features, as indicators of the presence of star-forming regions and obscuring dust, in order to understand the underlying physical mechanisms of each evolutionary stage of ULIRGs. Our analysis identifies five groups of local ULIRGs based on their mid-IR spectra, a grouping that is largely consistent with the well-established fork classification diagram while providing a higher-level classification. We demonstrate how graph clustering algorithms and network analysis tools can be utilized as unsupervised learning techniques for revealing direct or indirect relations between various galaxy properties and evolutionary stages, which provides an alternative methodology to previous works for classification in galaxy evolution. Additionally, our methodology compares the output of several graph clustering algorithms in order to demonstrate the best-performing Graph Theoretical tools for studying galaxy evolution. Comment: Accepted for publication in Astronomy and Computing

    Topological Graph Signal Compression

    Recently emerged Topological Deep Learning (TDL) methods aim to extend current Graph Neural Networks (GNNs) by naturally processing higher-order interactions, going beyond the pairwise relations and local neighborhoods defined by graph representations. In this paper we propose a novel TDL-based method for compressing signals over graphs, consisting of two main steps: first, disjoint sets of higher-order structures are inferred from the original signal by clustering $N$ datapoints into $K \ll N$ collections; then, a topology-inspired message passing scheme produces a compressed representation of the signal within those multi-element sets. Our results show that our framework improves on both standard GNN and feed-forward architectures in compressing temporal link-based signals from two real-world Internet Service Provider network datasets, with 30% to 90% better reconstruction errors across all evaluation scenarios, suggesting that it better captures and exploits spatial and temporal correlations over the whole graph-based network structure. Comment: 9 pages, 5 figures, 2 tables

    God ($\equiv$ Elohim), the first small world network

    In this paper, the approach of network mapping of words in literary texts is extended to "textual factors": the network nodes are defined as "concepts"; the links are "community connexions". Thereafter, the text network properties are investigated along modern statistical physics approaches to networks, thereby relating network topology and algebraic properties to the contents of literary texts. As a practical illustration, the first chapter of Genesis in the Bible is mapped into a 10-node network, as in the Kabbalah approach, mentioning God ($\equiv$ Elohim). The characteristics of the network are studied starting from its adjacency matrix and the corresponding Laplacian matrix. Triplets of nodes are particularly examined in order to emphasize the "textual (community) connexions" of each agent "emanation", through the so-called clustering coefficients and the overlap index, thereby measuring the "semantic flow" between the different nodes. It is concluded that this graph is a small-world network, weakly dis-assortative, because its average local clustering coefficient is significantly higher than that of a random graph constructed on the same vertex set. Comment: 1 figure, 3 tables, 69 references. arXiv admin note: text overlap with arXiv:1004.524
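
    The small-world conclusion above rests on the average local clustering coefficient. As a hedged illustration of that quantity, the sketch below uses the standard definition $C_i = 2T_i / (k_i(k_i-1))$, where $k_i$ is the degree of node $i$ and $T_i$ is the number of edges among its neighbours. The function names, the convention that nodes of degree below 2 contribute 0, and the toy graph are assumptions made here; the graph is not the 10-node network studied in the paper.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked.

    adj: dict mapping each node to the set of its neighbours (undirected).
    Assumed convention: nodes with fewer than two neighbours score 0.
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
avg_c = average_clustering(toy)
```

    Comparing this average against the value for a random graph on the same vertex set is the kind of test the abstract describes for identifying small-world structure.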

    A Graph-based Approach for Higher Order GIS Topological Analysis

    Retrieving structured information from an initial random collection of objects may be carried out by understanding the spatial arrangement between them, assuming no prior knowledge about those objects. As far as topology is concerned, contemporary desktop GIS packages do not generally support further analysis beyond adjacency. Thus, one of the original motivations of this work was to develop new ideas for scene analysis by building up a graph-based technique for better interpretation and understanding of spatial relationships between GIS vector-based objects beyond their first level of adjacency; the final aim is to organize local features into a more meaningful global scene by using graph theory. As the example scenario, a LiDAR data set is used to test the technique that we plan to develop and implement. After the generation of the respective TIN, two different binary classifications were applied to the TIN facets (based on two different slope thresholds), and the TIN facets were aggregated into homogeneous polygons according to their slope characteristics. A graph-based clustering procedure inside these polygonal regions, which establishes a neighbourhood graph and is followed by the delineation of cluster shapes and the derivation of cluster characteristics in order to obtain information on higher-level geographic entities (regarding sets of buildings, vegetation areas and, say, land-use parcels), is the object of further work. The results we expect to obtain might be useful to support land-use mapping, image understanding or, generally speaking, clustering analysis and generalization processes.