149,951 research outputs found
Local Hypergraph Clustering using Capacity Releasing Diffusion
Local graph clustering is an important machine learning task that aims to
find a well-connected cluster near a set of seed nodes. Recent results have
revealed that incorporating higher order information significantly enhances the
results of graph clustering techniques. The majority of existing research in
this area focuses on spectral graph theory-based techniques. However, an
alternative perspective on local graph clustering arises from using max-flow
and min-cut on the objectives, which offer distinctly different guarantees. For
instance, a new method called capacity releasing diffusion (CRD) was recently
proposed and shown to preserve local structure around the seeds better than
spectral methods. The method was also the first local clustering technique that
is not subject to the quadratic Cheeger inequality by assuming a good cluster
near the seed nodes. In this paper, we propose a local hypergraph clustering
technique called hypergraph CRD (HG-CRD) by extending the CRD process to
cluster based on higher order patterns, encoded as hyperedges of a hypergraph.
Moreover, we theoretically show that HG-CRD gives results about a quantity
called motif conductance, rather than a biased version used in previous
experiments. Experimental results on synthetic datasets and real world graphs
show that HG-CRD enhances the clustering quality. Comment: 18 pages, 6 figures
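As an illustration of the quantity named in the abstract above, the following is a minimal sketch of triangle motif conductance, assuming the common definition: the number of triangles split by the cluster, divided by the smaller of the two triangle-endpoint volumes. The function names and the toy graph are hypothetical and are not taken from the paper; HG-CRD itself is not implemented here.

```python
from itertools import combinations

def triangles(adj):
    """Return every triangle of an undirected graph as a sorted 3-tuple."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found

def triangle_motif_conductance(adj, cluster):
    """Assumed definition: cut triangles / min(endpoint volume in, endpoint volume out)."""
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for tri in triangles(adj):
        inside = sum(1 for node in tri if node in cluster)
        vol_in += inside          # triangle endpoints that fall inside the cluster
        vol_out += 3 - inside     # triangle endpoints that fall outside it
        if 0 < inside < 3:        # the triangle straddles the cluster boundary
            cut += 1
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else float("inf")

# Toy graph (an assumption for illustration): triangles 0-1-2, 2-3-4 and 3-4-5.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
adj = {u: set() for u in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

score = triangle_motif_conductance(adj, {0, 1, 2})
print(score)
```

Here the cluster {0, 1, 2} splits one of the three triangles and holds four triangle endpoints, so the sketch reports 1/4.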
Partitioning into Expanders
Let G=(V,E) be an undirected graph, and let \lambda_k be the k-th smallest eigenvalue
of the normalized Laplacian matrix of G. There is a basic fact in algebraic
graph theory that \lambda_k > 0 if and only if G has at most k-1 connected
components. We prove a robust version of this fact. If \lambda_k > 0, then for
some 1 \leq \ell \leq k-1, V can be partitioned into \ell sets P_1,\ldots,P_\ell
such that each P_i is a low-conductance set in G and induces a high-conductance
subgraph. In particular, \phi(P_i) = O(\ell^3 \sqrt{\lambda_\ell}) and
\phi(G[P_i]) \geq \lambda_k/k^2.
We make our results algorithmic by designing a simple polynomial time
spectral algorithm to find such partitioning of G with a quadratic loss in the
inside conductance of P_i's. Unlike the recent results on higher order
Cheeger's inequality [LOT12,LRTV12], our algorithmic results do not use higher
order eigenfunctions of G. If there is a sufficiently large gap between
\lambda_k and \lambda_{k+1}, more precisely, if \lambda_{k+1} \geq \poly(k)
\lambda_k^{1/4}, then our algorithm finds a k-partitioning of V into sets
P_1,...,P_k such that the induced subgraph G[P_i] has a significantly larger
conductance than the conductance of P_i in G. Such a partitioning may represent
the best k clustering of G. Our algorithm is a simple local search that only
uses the Spectral Partitioning algorithm as a subroutine. We expect to see
further applications of this simple algorithm in clustering applications.
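To make the two quantities in the abstract above concrete, here is a minimal sketch that compares the outside conductance \phi(P) of a set with the inside conductance \phi(G[P]) of the subgraph it induces. It assumes the common definition cut / min(vol(S), vol(V \ S)) and finds the inside conductance by brute force, which is only feasible for tiny graphs. The helper names and the two-clique example are hypothetical; this is not the paper's spectral algorithm.

```python
from itertools import combinations

def conductance(adj, subset):
    """Assumed definition: edges leaving the subset / smaller of the two volumes."""
    subset = set(subset)
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    vol_in = sum(len(adj[u]) for u in subset)
    vol_out = sum(len(adj[u]) for u in adj if u not in subset)
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else float("inf")

def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the edges between them."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}

def graph_conductance(adj):
    """Brute-force minimum conductance over all cuts; needs at least two nodes."""
    nodes = sorted(adj)
    # Conductance is symmetric under complement, so subsets up to half size suffice.
    return min(
        conductance(adj, subset)
        for size in range(1, len(nodes) // 2 + 1)
        for subset in combinations(nodes, size)
    )

# Toy graph (an assumption for illustration): two 4-cliques joined by the edge 3-4.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
adj = {u: set() for u in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

part = {0, 1, 2, 3}
outside = conductance(adj, part)                         # \phi(P) in G
inside = graph_conductance(induced_subgraph(adj, part))  # \phi(G[P])
print(outside, inside)
```

In this example the clique has a small outside conductance (1/13) and a much larger inside conductance (2/3), which is the kind of gap the abstract describes.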
Graph Theoretical Analysis of local ultraluminous infrared galaxies and quasars
We present a methodological framework for studying galaxy evolution by
utilizing Graph Theory and network analysis tools. We study the evolutionary
processes of local ultraluminous infrared galaxies (ULIRGs) and quasars and the
underlying physical processes, such as star formation and active galactic
nucleus (AGN) activity, through the application of Graph Theoretical analysis
tools. We extract, process and analyse mid-infrared spectra of local (z < 0.4)
ULIRGs and quasars between 5-38 microns through internally developed Python
routines, in order to generate similarity graphs, with the nodes representing
ULIRGs being grouped together based on the similarity of their spectra.
Additionally, we extract and compare physical features from the mid-IR spectra,
such as the polycyclic aromatic hydrocarbons (PAHs) emission and silicate depth
absorption features, as indicators of the presence of star-forming regions and
obscuring dust, in order to understand the underlying physical mechanisms of
each evolutionary stage of ULIRGs. Our analysis identifies five groups of local
ULIRGs based on their mid-IR spectra, a grouping that is largely consistent with the
well-established fork classification diagram while providing a higher-level
classification. We demonstrate how graph clustering algorithms and network
analysis tools can be utilized as unsupervised learning techniques for
revealing direct or indirect relations between various galaxy properties and
evolutionary stages, which provides an alternative methodology to previous
works for classification in galaxy evolution. Additionally, our methodology
compares the output of several graph clustering algorithms in order to
demonstrate the best-performing Graph Theoretical tools for studying galaxy
evolution. Comment: Accepted for publication in Astronomy and Computing
Topological Graph Signal Compression
Recently emerged Topological Deep Learning (TDL) methods aim to extend
current Graph Neural Networks (GNN) by naturally processing higher-order
interactions, going beyond the pairwise relations and local neighborhoods
defined by graph representations. In this paper we propose a novel TDL-based
method for compressing signals over graphs, consisting of two main steps:
first, disjoint sets of higher-order structures are inferred based on the
original signal --by clustering datapoints into collections; then,
topology-inspired message passing obtains a compressed representation of the
signal within those multi-element sets. Our results show that our framework
improves both standard GNN and feed-forward architectures in compressing
temporal link-based signals from two real-world Internet Service Provider
Networks' datasets --from up to better reconstruction errors
across all evaluation scenarios--, suggesting that it better captures and
exploits spatial and temporal correlations over the whole graph-based network
structure. Comment: 9 pages, 5 figures, 2 tables
God (), the first small world network
In this paper, the approach of network mapping of words in literary texts is
extended to "textual factors": the network nodes are defined as "concepts";
the links are "community connexions". Thereafter, the text network properties
are investigated following modern statistical physics approaches to networks,
thereby relating network topology and algebraic properties to the content of
literary texts. As a practical illustration, the first chapter of Genesis in the
Bible is mapped into a 10 node network, as in the Kabbalah approach, mentioning
God (). The characteristics of the network are studied starting
from its adjacency matrix, and the corresponding Laplacian matrix. Triplets of
nodes are particularly examined in order to emphasize the "textual (community)
connexions" of each agent "emanation", through the so-called clustering
coefficients and the overlap index, thereby measuring the "semantic flow"
between the different nodes. It is concluded that this graph is a small-world
network, weakly disassortative, because its average local clustering
coefficient is significantly higher than that of a random graph constructed on the same
vertex set. Comment: 1 figure, 3 tables, 69 references. arXiv admin note: text overlap
with arXiv:1004.524
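The average local clustering coefficient that the abstract above relies on can be computed directly from an adjacency structure. The sketch below assumes the standard definition for undirected graphs (the fraction of a node's neighbour pairs that are themselves linked). The four-node graph is a hypothetical toy example, not the 10-node network from the paper.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are directly connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed here: no neighbour pairs means coefficient 0
    links = sum(1 for u, v in combinations(sorted(neighbours), 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)

# Toy graph (an assumption for illustration): triangle 0-1-2 with a pendant node 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

avg = average_clustering(adj)
print(avg)
```

A small-world test of the kind described would compare this average with the value obtained for a random graph on the same vertex set.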
A Graph-based Approach for Higher Order GIS Topological Analysis
Retrieving structured information from an initial random collection of objects may be carried out by understanding the spatial
arrangement between them, assuming no prior knowledge about those objects. As far as topology is concerned, contemporary
desktop GIS packages do not generally support further analysis beyond adjacency. Thus, one of the original motivations of this work
was to develop new ideas for scene analysis by building up a graph-based technique for better interpretation and understanding of
spatial relationships between GIS vector-based objects beyond its first level of adjacency; the final aim is the performance of some
kind of local feature organization into a more meaningful global scene by using graph theory. As the example scenario, a LiDAR
data set is being used to test the technique that we plan to develop and implement. After the generation of the respective TIN, two
different binary classifications were applied to the TIN facets (based on two different slope thresholds) and TIN facets have been
aggregated into homogeneous polygons according to their slope characteristics. A graph-based clustering procedure inside these
polygonal regions, established through a neighbourhood graph and followed by the delineation of cluster shapes and the derivation of cluster
characteristics in order to obtain higher-level geographic entity information (regarding sets of buildings, vegetation areas and, say,
land-use parcels), is the object of further work. The results we expect to obtain might be useful to support land-use mapping,
image understanding or, generally speaking, clustering analysis and generalization processes.