375,451 research outputs found
Convex Graph Invariant Relaxations For Graph Edit Distance
The edit distance between two graphs is a widely used measure of similarity
that evaluates the smallest number of vertex and edge deletions/insertions
required to transform one graph to another. It is NP-hard to compute in
general, and a large number of heuristics have been proposed for approximating
this quantity. With few exceptions, these methods generally provide upper
bounds on the edit distance between two graphs. In this paper, we propose a new
family of computationally tractable convex relaxations for obtaining lower
bounds on graph edit distance. These relaxations can be tailored to the
structural properties of the particular graphs via convex graph invariants.
Specific examples that we highlight in this paper include constraints on the
graph spectrum as well as (tractable approximations of) the stability number
and the maximum-cut values of graphs. We prove under suitable conditions that
our relaxations are tight (i.e., exactly compute the graph edit distance) when
one of the graphs consists of few eigenvalues. We also validate the utility of
our framework on synthetic problems as well as real applications involving
molecular structure comparison problems in chemistry.Comment: 27 pages, 7 figure
Discriminative Distance-Based Network Indices with Application to Link Prediction
In large networks, using the length of shortest paths as the distance measure
has shortcomings. A well-studied shortcoming is that extending it to
disconnected graphs and directed graphs is controversial. The second
shortcoming is that a huge number of vertices may have exactly the same score.
The third shortcoming is that in many applications, the distance between two
vertices not only depends on the length of shortest paths, but also on the
number of shortest paths. In this paper, first we develop a new distance
measure between vertices of a graph that yields discriminative distance-based
centrality indices. This measure is proportional to the length of shortest
paths and inversely proportional to the number of shortest paths. We present
algorithms for exact computation of the proposed discriminative indices.
Second, we develop randomized algorithms that precisely estimate average
discriminative path length and average discriminative eccentricity and show
that they give -approximations of these indices. Third, we
perform extensive experiments over several real-world networks from different
domains. In our experiments, we first show that compared to the traditional
indices, discriminative indices have usually much more discriminability. Then,
we show that our randomized algorithms can very precisely estimate average
discriminative path length and average discriminative eccentricity, using only
few samples. Then, we show that real-world networks have usually a tiny average
discriminative path length, bounded by a constant (e.g., 2). Fourth, in order
to better motivate the usefulness of our proposed distance measure, we present
a novel link prediction method, that uses discriminative distance to decide
which vertices are more likely to form a link in future, and show its superior
performance compared to the well-known existing measures
Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
The degree distribution is one of the most fundamental graph properties of
interest for real-world graphs. It has been widely observed in numerous domains
that graphs typically have a tailed or scale-free degree distribution. While
the average degree is usually quite small, the variance is quite high and there
are vertices with degrees at all scales. We focus on the problem of
approximating the degree distribution of a large streaming graph, with small
storage. We design an algorithm headtail, whose main novelty is a new estimator
of infrequent degrees using truncated geometric random variables. We give a
mathematical analysis of headtail and show that it has excellent behavior in
practice. We can process streams will millions of edges with storage less than
1% and get extremely accurate approximations for all scales in the degree
distribution.
We also introduce a new notion of Relative Hausdorff distance between tailed
histograms. Existing notions of distances between distributions are not
suitable, since they ignore infrequent degrees in the tail. The Relative
Hausdorff distance measures deviations at all scales, and is a more suitable
distance for comparing degree distributions. By tracking this new measure, we
are able to give strong empirical evidence of the convergence of headtail
A New Approach to Measuring Distances in Dense Graphs
The problem of computing distances and shortest paths between vertices in graphs is one of the fundamental issues in graph theory. It is of great importance in many different applications, for example, transportation, and social network analysis. However, efficient shortest distance algorithms are still desired in many disciplines. Basically, the majority of dense graphs have ties between the shortest distances. Therefore, we consider a different approach and introduce a new measure to solve all-pairs shortest paths for undirected and unweighted graphs. This measures the shortest distance between any two vertices by considering the length and the number of all possible paths between them. The main aim of this new approach is to break the ties between equal shortest paths SP, which can be obtained by the Breadth-first search algorithm (BFS), and distinguish meaningfully between these equal distances. Moreover, using the new measure in clustering produces higher quality results compared with SP. In our study, we apply two different clustering techniques: hierarchical clustering and K-means clustering, with four different graph models, and for a various number of clusters. We compare the results using a modularity function to check the quality of our clustering results
Systematic Topology Analysis and Generation Using Degree Correlations
We present a new, systematic approach for analyzing network topologies. We
first introduce the dK-series of probability distributions specifying all
degree correlations within d-sized subgraphs of a given graph G. Increasing
values of d capture progressively more properties of G at the cost of more
complex representation of the probability distribution. Using this series, we
can quantitatively measure the distance between two graphs and construct random
graphs that accurately reproduce virtually all metrics proposed in the
literature. The nature of the dK-series implies that it will also capture any
future metrics that may be proposed. Using our approach, we construct graphs
for d=0,1,2,3 and demonstrate that these graphs reproduce, with increasing
accuracy, important properties of measured and modeled Internet topologies. We
find that the d=2 case is sufficient for most practical purposes, while d=3
essentially reconstructs the Internet AS- and router-level topologies exactly.
We hope that a systematic method to analyze and synthesize topologies offers a
significant improvement to the set of tools available to network topology and
protocol researchers.Comment: Final versio
- …