SimGNN: A Neural Network Approach to Fast Graph Similarity Computation
Graph similarity search is among the most important graph-based applications,
e.g. finding the chemical compounds that are most similar to a query compound.
Graph similarity computation, such as Graph Edit Distance (GED) and Maximum
Common Subgraph (MCS), is the core operation of graph similarity search and
many other applications, but very costly to compute in practice. Inspired by
the recent success of neural network approaches to several graph applications,
such as node or graph classification, we propose a novel neural network based
approach to address this classic yet challenging graph problem, aiming to
alleviate the computational burden while preserving a good performance.
The proposed approach, called SimGNN, combines two strategies. First, we
design a learnable embedding function that maps every graph into a vector,
which provides a global summary of a graph. A novel attention mechanism is
proposed to emphasize the important nodes with respect to a specific similarity
metric. Second, we design a pairwise node comparison method to supplement the
graph-level embeddings with fine-grained node-level information. Our model
achieves better generalization on unseen graphs, and in the worst case runs in
quadratic time with respect to the number of nodes in two graphs. Taking GED
computation as an example, experimental results on three real graph datasets
demonstrate the effectiveness and efficiency of our approach. Specifically, our
model achieves a smaller error rate and a large time reduction compared against a
series of baselines, including several approximation algorithms on GED
computation, and many existing graph neural network based models. To the best
of our knowledge, we are among the first to adopt neural networks to explicitly
model the similarity between two graphs, and provide a new direction for future
research on graph similarity computation and graph similarity search.
Comment: WSDM 201
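As a rough illustration of the two strategies in the SimGNN abstract above, the following pure-Python sketch pools node embeddings into a graph-level vector with attention weights and then builds a pairwise node score matrix. The abstract gives no formulas, so the specific choices here (a tanh-transformed mean as the global context, sigmoid attention scores, inner products for node comparison, and the names `attention_pool`, `pairwise_scores`, and `weight`) are assumptions made for illustration, and `weight` is fixed rather than learned.

```python
import math

def attention_pool(node_embs, weight):
    """Pool node embeddings (n lists of length d) into one graph vector.

    Assumed form: context = tanh(weight @ mean), score_i = sigmoid(u_i . context),
    graph vector = sum_i score_i * u_i. `weight` is a d x d list of rows.
    """
    n, d = len(node_embs), len(node_embs[0])
    mean = [sum(u[k] for u in node_embs) / n for k in range(d)]
    context = [math.tanh(sum(weight[r][k] * mean[k] for k in range(d)))
               for r in range(d)]
    scores = [1.0 / (1.0 + math.exp(-sum(u[k] * context[k] for k in range(d))))
              for u in node_embs]
    graph_emb = [sum(a * u[k] for a, u in zip(scores, node_embs))
                 for k in range(d)]
    return graph_emb, scores

def pairwise_scores(embs1, embs2):
    """Inner product of every node pair: an n1 x n2 matrix (quadratic work)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in embs2] for u in embs1]

# Toy 2-dimensional node embeddings for two small graphs (made-up values).
g1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g2 = [[0.5, 0.5], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix

emb1, att1 = attention_pool(g1, identity)
emb2, att2 = attention_pool(g2, identity)
node_matrix = pairwise_scores(g1, g2)
```

The pairwise matrix has one entry per node pair, which is where a cost quadratic in the number of nodes would arise in a scheme like this one.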
SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint
Domains such as scientific workflows and business processes exhibit data
models with complex relationships between objects. This relationship is
typically represented as sequences, where each data item is annotated with
multi-dimensional attributes. There is a need to analyze this data for
operational insights. For example, in business processes, users are interested
in clustering process traces into smaller subsets to discover less complex
process models. This requires expensive computation of similarity metrics
between sequence-based data. Related work on dimension reduction and embedding
methods does not take into account the multi-dimensional attributes of data, and
does not address the interpretability of data in the embedding space (i.e., by
favoring vector-based representation). In this work, we introduce Summarized, a
framework for efficient analysis on sequence-based multi-dimensional data using
intuitive and user-controlled summarizations. We introduce summarization
schemes that provide tunable trade-offs between the quality and efficiency of
analysis tasks and derive an error model for summary-based similarity under an
edit-distance constraint. Evaluations using real-world datasets show the
effectiveness of our framework.
Graph edit distance : a new binary linear programming formulation
Graph edit distance (GED) is a powerful and flexible graph matching paradigm
that can be used to address different tasks in structural pattern recognition,
machine learning, and data mining. In this paper, some new binary linear
programming formulations for computing the exact GED between two graphs are
proposed. A major strength of the formulations lies in their genericity since
the GED can be computed between directed or undirected fully attributed graphs
(i.e. with attributes on both vertices and edges). Moreover, a relaxation of
the domain constraints in the formulations provides efficient lower bound
approximations of the GED. A complete experimental study comparing the proposed
formulations with 4 state-of-the-art algorithms for exact and approximate graph
edit distances is provided. By considering both the quality of the proposed
solution and the efficiency of the algorithms as performance criteria, the
results show that none of the compared methods dominates the others in the
Pareto sense. As a consequence, when faced with a given real-world problem, a
trade-off between quality and efficiency has to be chosen w.r.t. the
application constraints. In this context, this paper provides a guide that can
be used to choose the appropriate method.
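To make concrete what computing the exact GED involves, here is a minimal brute-force sketch. It is not the binary linear programming formulation from the paper above; it simply enumerates node mappings, and it assumes unit edit costs and unlabeled, undirected simple graphs (under those assumptions, padding the smaller graph with dummy nodes up to the size of the larger one is sufficient). The function name `exact_ged` and the input encoding are hypothetical.

```python
from itertools import permutations

def exact_ged(n1, edges1, n2, edges2):
    """Exact unit-cost GED between two unlabeled undirected simple graphs.

    Nodes are 0..n1-1 and 0..n2-1; edges are (u, v) pairs. Runs in factorial
    time, so it is only usable for very small graphs.
    """
    n = max(n1, n2)                      # pad the smaller graph with dummy nodes
    target = {frozenset(e) for e in edges2}
    node_cost = abs(n1 - n2)             # each dummy means one insertion or deletion
    best = None
    for perm in permutations(range(n)):  # perm[u] = image of node u of graph 1
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges1}
        # Edges present on only one side must be deleted or inserted.
        cost = node_cost + len(mapped ^ target)
        if best is None or cost < best:
            best = cost
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
path3 = [(0, 1), (1, 2)]
path4 = [(0, 1), (1, 2), (2, 3)]

print(exact_ged(3, triangle, 3, path3))  # 1: delete one edge
print(exact_ged(3, path3, 4, path4))     # 2: insert one node and one edge
```

The number of mappings grows factorially with graph size, which is why exact methods need more careful formulations and why approximate methods trade solution quality for speed.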
Recognizing Cuneiform Signs Using Graph Based Methods
The cuneiform script constitutes one of the earliest systems of writing and
is realized by wedge-shaped marks on clay tablets. A tremendous number of
cuneiform tablets have already been discovered and are incrementally
digitized and made available for automated processing. As reading cuneiform
script is still a manual task, we address the real-world application of
recognizing cuneiform signs by two graph-based methods with complementary
runtime characteristics. We present a graph model for cuneiform signs together
with a tailored distance measure based on the concept of the graph edit
distance. We propose efficient heuristics for its computation and demonstrate
its effectiveness in classification tasks experimentally. To this end, the
distance measure is used to implement a nearest neighbor classifier, which leads
to a high computational cost in the prediction phase as the training set
grows. To overcome this issue, we propose to use CNNs adapted to graphs
as an alternative approach shifting the computational cost to the training
phase. We demonstrate the practicability of both approaches in an extensive
experimental comparison regarding runtime and prediction accuracy. Although
currently available annotated real-world data is still limited, we obtain a
high accuracy using CNNs, in particular, when the training set is enriched by
augmented examples.
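The prediction cost mentioned in the abstract above follows from how a nearest neighbor classifier works: every query is compared against every training example. A minimal sketch, assuming a generic `distance` callable in place of the paper's tailored graph distance (the toy distance on edge sets below is a stand-in chosen for brevity, and all names and labels are hypothetical):

```python
def nearest_neighbor_predict(query, training_set, distance):
    """Return (label, calls): label of the closest example and distance calls made.

    training_set is a list of (example, label) pairs; distance(a, b) returns a
    non-negative number. One distance call is made per training example.
    """
    best_label, best_dist, calls = None, float("inf"), 0
    for example, label in training_set:
        d = distance(query, example)
        calls += 1
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, calls

def toy_distance(a, b):
    """Stand-in distance: size of the symmetric difference of two edge sets."""
    return len(set(a) ^ set(b))

train = [
    ({(0, 1), (1, 2)}, "sign_A"),
    ({(0, 1), (1, 2), (0, 2)}, "sign_B"),
    ({(0, 1)}, "sign_A"),
]
label, calls = nearest_neighbor_predict({(0, 1), (0, 2), (1, 2)}, train, toy_distance)
print(label, calls)  # sign_B 3
```

Since the number of distance calls equals the training set size, an expensive graph distance makes each prediction proportionally more costly as more annotated examples are added.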
Few Algorithms for ascertaining merit of a document and their applications
Existing models for ranking documents (mostly on the World Wide Web) are prestige
based. In this article, three algorithms to objectively judge the merit of a
document are proposed: 1) Citation graph maxflow, 2) Recursive Gloss Overlap
based intrinsic merit scoring, and 3) Interview algorithm. A short discussion of
generic judgement and its mathematical treatment is presented in the introduction
to motivate these algorithms.
Comment: 32 pages
Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm
One of the most fundamental concepts in statistics is the concept of sample
mean. Properties of the sample mean that are well-defined in Euclidean spaces
become unwieldy or even unclear in graph spaces. Open problems related to the
sample mean of graphs include: non-existence, non-uniqueness, statistical
inconsistency, lack of convergence results of mean algorithms, non-existence of
midpoints, and disparity to midpoints. We present conditions to resolve all six
problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on
graph datasets representing images and molecules show that the MMM-Algorithm
best approximates a sample mean of graphs compared to six other mean
algorithms.
An interdisciplinary survey of network similarity methods
Comparative graph and network analysis plays an important role in both systems
biology and pattern recognition, but existing surveys on the topic have
historically ignored or underserved one or the other of these fields. We
present an integrative introduction to the key objectives and methods of graph
and network comparison in each field, with the intent of remaining accessible
to relative novices in order to mitigate the barrier to interdisciplinary idea
crossover.
To guide our investigation, and to quantitatively justify our assertions
about what the key objectives and methods of each field are, we have
constructed a citation network containing 5,793 vertices from the full
reference lists of over two hundred relevant papers, which we collected by
searching Google Scholar for ten different network comparison-related search
terms. We investigate its basic statistics and community structure, and frame
our presentation around the papers found to have high importance according to
five different standard centrality measures.
Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity
We introduce a novel approach to graph-level representation learning, which
is to embed an entire graph into a vector space where the embeddings of two
graphs preserve their graph-graph proximity. Our approach, UGRAPHEMB, is a
general framework that provides a novel means of performing graph-level
embedding in a completely unsupervised and inductive manner. The learned neural
network can be considered as a function that receives any graph as input,
either seen or unseen in the training set, and transforms it into an embedding.
A novel graph-level embedding generation mechanism called Multi-Scale Node
Attention (MSNA) is proposed. Experiments on five real graph datasets show
that UGRAPHEMB achieves competitive accuracy in the tasks of graph
classification, similarity ranking, and graph visualization.
Comment: IJCAI 2019 camera ready version with supplementary material
Convolutional Set Matching for Graph Similarity
We introduce GSimCNN (Graph Similarity Computation via Convolutional Neural
Networks) for predicting the similarity score between two graphs. As the core
operation of graph similarity search, pairwise graph similarity computation is
a challenging problem due to the NP-hard nature of computing many graph
distance/similarity metrics. We demonstrate our model using the Graph Edit
Distance (GED) as the example metric. Experiments on three real graph datasets
demonstrate that our model achieves state-of-the-art performance on graph
similarity search.
Comment: NIPS 2018 Workshop: Relational Representation Learning. Note:
Substantial text overlap with arXiv:1809.0444
Separating Structure from Noise in Large Graphs Using the Regularity Lemma
How can we separate structural information from noise in large graphs? To
address this fundamental question, we propose a graph summarization approach
based on Szemerédi's Regularity Lemma, a well-known result in graph theory,
which roughly states that every graph can be approximated by the union of a
small number of random-like bipartite graphs called 'regular pairs'. Hence, the
Regularity Lemma provides us with a principled way to describe the essential
structure of large graphs using a small amount of data. Our paper has several
contributions: (i) We present our summarization algorithm which is able to
reveal the main structural patterns in large graphs. (ii) We discuss how to use
our summarization framework to efficiently retrieve from a database the top-k
graphs that are most similar to a query graph. (iii) Finally, we evaluate the
noise robustness of our approach in terms of the reconstruction error and the
usefulness of the summaries in addressing the graph search task.