2,874 research outputs found
Little Ball of Fur: A Python Library for Graph Sampling
Sampling graphs is an important task in data mining. In this paper, we
describe Little Ball of Fur, a Python library that includes more than twenty
graph sampling algorithms. Our goal is to make node, edge, and
exploration-based network sampling techniques accessible to a large number of
professionals, researchers, and students in a single streamlined framework. We
created this framework with a focus on a coherent application public interface
which has a convenient design, generic input data requirements, and reasonable
baseline settings of algorithms. Here we overview these design foundations of
the framework in detail with illustrative code snippets. We show the practical
usability of the library by estimating various global statistics of social
networks and web graphs. Experiments demonstrate that Little Ball of Fur can
speed up node and whole-graph embedding techniques considerably while only mildly
deteriorating the predictive value of distilled features.
Comment: Code is available here:
https://github.com/benedekrozemberczki/littleballoffu
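As an illustration of the exploration-based sampling this abstract mentions, here is a minimal random-walk sampler in plain Python. It is a sketch under stated assumptions, not the library's API: the function name `random_walk_sample`, the `patience` parameter, and the adjacency-dictionary input format are all hypothetical choices made for this example.

```python
import random


def random_walk_sample(adjacency, n_nodes, seed=0, patience=100):
    """Collect n_nodes distinct nodes with a random walk and return the induced subgraph.

    adjacency: dict mapping each node to a set of neighbours (undirected graph).
    patience:  hypothetical knob; after this many steps without a new node the
               walk jumps to a random unvisited node so it always terminates.
    """
    if not 1 <= n_nodes <= len(adjacency):
        raise ValueError("n_nodes must be between 1 and the number of nodes")
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    current = rng.choice(nodes)
    visited = {current}
    stalled = 0
    while len(visited) < n_nodes:
        neighbours = sorted(adjacency[current])
        if not neighbours or stalled >= patience:
            # Dead end or stuck in a small component: restart elsewhere.
            current = rng.choice([v for v in nodes if v not in visited])
        else:
            current = rng.choice(neighbours)
        if current in visited:
            stalled += 1
        else:
            visited.add(current)
            stalled = 0
    # Keep only edges whose endpoints were both sampled.
    return {u: {v for v in adjacency[u] if v in visited} for u in visited}


# Example: a ring of ten nodes, sampled down to four nodes.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
sample = random_walk_sample(ring, n_nodes=4, seed=1)
```

Returning the induced subgraph keeps the output in the same format as the input, so the sample can be passed to any routine that accepts the full graph.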
On Spectral Graph Embedding: A Non-Backtracking Perspective and Graph Approximation
Graph embedding has been proven to be efficient and effective in facilitating
graph analysis. In this paper, we present a novel spectral framework called
NOn-Backtracking Embedding (NOBE), which offers a new perspective that
organizes graph data at a deep level by tracking the flow traversing on the
edges with backtracking prohibited. Further, by analyzing the non-backtracking
process, a technique called graph approximation is devised, which provides a
channel to transform the spectral decomposition on an edge-to-edge matrix to
that on a node-to-node matrix. Theoretical guarantees are provided by bounding
the difference between the corresponding eigenvalues of the original graph and
its graph approximation. Extensive experiments conducted on various real-world
networks demonstrate the efficacy of our methods on both macroscopic and
microscopic levels, including clustering and structural hole spanner detection.
Comment: SDM 2018 (Full version including all proofs
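To make the edge-to-edge matrix mentioned in this abstract concrete, the sketch below builds the standard (Hashimoto) non-backtracking matrix of a small undirected graph. It assumes the textbook definition, in which the directed edge u→v is followed by v→w whenever w differs from u; the normalization and any further steps used by NOBE itself are not given in the abstract and are not reproduced here. The function name is hypothetical.

```python
def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected edge list.

    B[i][j] = 1 when directed edge i = (u, v) can be followed by directed
    edge j = (v, w) with w != u, i.e. the walk does not immediately turn back.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    size = len(directed)
    matrix = [[0] * size for _ in range(size)]
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            if v == x and y != u:
                matrix[i][j] = 1
    return directed, matrix


# Example: a triangle has 6 directed edges, each with exactly one
# non-backtracking continuation.
triangle_edges = [(0, 1), (1, 2), (0, 2)]
directed_edges, B = non_backtracking_matrix(triangle_edges)
```

The matrix has one row per directed edge, so its size is twice the number of undirected edges; this is the cost that motivates reducing the decomposition to a node-to-node matrix.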
Sampling on networks: estimating spectral centrality measures and their impact in evaluating other relevant network measures
We perform an extensive analysis of how sampling impacts the estimate of
several relevant network measures.
In particular, we focus on how a sampling strategy optimized to recover a
particular spectral centrality measure impacts other topological quantities.
Our goal is, on the one hand, to extend the analysis of the behavior of TCEC
[Ruggeri2019], a theoretically grounded sampling method for eigenvector
centrality estimation.
On the other hand, we aim to demonstrate more broadly how sampling can impact the
estimation of relevant network properties, such as centrality measures other
than the one being optimized, community structure, and node attribute
distribution.
Finally, we adapt the theoretical framework behind TCEC for the case of
PageRank centrality and propose a sampling algorithm aimed at optimizing its
estimation. We show that, while the theoretical derivation can be suitably
adapted to cover this case, the resulting algorithm suffers from a high
computational complexity that requires further approximations compared to the
eigenvector centrality case.
Comment: 8 pages, 5 figure
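For reference, the quantity that the PageRank variant above aims to estimate can be computed on a fully observed graph by power iteration. The sketch below is the standard PageRank recurrence, not the TCEC-style sampling algorithm from the abstract; the function name, the damping value 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on a dict of node -> set of out-neighbours."""
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = adjacency[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


# Example: an undirected star with centre 0 and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
scores = pagerank(star)
```

Running the same function on a sampled subgraph and on the full graph gives one simple way to measure how much a sampling strategy distorts the ranking.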
Unsupervised Structural Embedding Methods for Efficient Collective Network Mining
How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient.
In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.
PHD
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd
CORECLUSTER: A Degeneracy Based Graph Clustering Framework
Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional tools for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clustering algorithm. Our approach capitalizes on processing the graph in a hierarchical manner provided by its core expansion sequence, an ordered partition of the graph into different levels according to the k-core decomposition. Such a partition provides a way to process the graph in an incremental manner that preserves its clustering structure, while making the execution of the chosen clustering algorithm much faster due to the smaller size of the graph's partitions onto which the algorithm operates.
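The k-core decomposition that this framework builds on can be sketched with the usual peeling procedure: repeatedly remove a node of minimum remaining degree and record the largest such degree seen so far. This is a minimal, quadratic-time illustration of core numbers only; it does not implement the CoreCluster framework, and the function name is hypothetical.

```python
def core_numbers(adjacency):
    """Return a dict node -> core number for an undirected graph.

    adjacency: dict mapping each node to a set of neighbours.
    """
    degree = {u: len(neighbours) for u, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        u = min(remaining, key=lambda node: degree[node])
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adjacency[u]:
            if v in remaining:
                degree[v] -= 1
    return core


# Example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cores = core_numbers(graph)
```

Grouping nodes by core number, from the highest value down, yields nested levels of the kind the abstract describes as an ordered partition of the graph.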
Fast Detection of Community Structures using Graph Traversal in Social Networks
Finding community structures in social networks is considered to be a
challenging task as many of the proposed algorithms are computationally
expensive and do not scale well for large graphs. Most of the community
detection algorithms proposed to date are unsuitable for applications that
would require detection of communities in real-time, especially for massive
networks. The Louvain method, which uses modularity maximization to detect
clusters, is usually considered to be one of the fastest community detection
algorithms even without any provable bound on its running time. We propose a
novel graph traversal-based community detection framework, which not only runs
faster than the Louvain method but also generates clusters of better quality
for most of the benchmark datasets. We show that our algorithms run in O(|V| + |E|)
time to create an initial cover before using modularity maximization to
get the final cover.
Keywords - community detection; Influenced Neighbor Score; brokers; community
nodes; communities
Comment: 29 pages, 9 tables, and 13 figures. Accepted in "Knowledge and
Information Systems", 201
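Both the Louvain method and the final step described in this abstract maximize modularity. The sketch below only evaluates the standard Newman modularity score of a given partition on an undirected, unweighted graph; it is not the authors' traversal-based algorithm, and the function name and input format are assumptions made for this example.

```python
def modularity(adjacency, communities):
    """Modularity Q = sum over communities of (L_c / m) - (d_c / (2m))^2.

    adjacency:   dict node -> set of neighbours (undirected, no self-loops).
    communities: iterable of node collections forming a partition.
    L_c is the number of edges inside community c, d_c the sum of its
    node degrees, and m the total number of edges.
    """
    m = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is counted from both endpoints, hence the / 2.
        internal = sum(1 for u in members for v in adjacency[u] if v in members) / 2
        total_degree = sum(len(adjacency[u]) for u in members)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Example: two triangles joined by the single edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
q_split = modularity(two_triangles, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(two_triangles, [{0, 1, 2, 3, 4, 5}])
```

Splitting the example graph into its two triangles scores higher than keeping all nodes in one community, which is the kind of comparison a modularity-maximizing step relies on.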