Graph Quantization
Vector quantization (VQ) is a lossy data compression technique from signal
processing that is restricted to feature vectors and therefore inapplicable
to combinatorial structures. This contribution presents a theoretical
foundation of graph quantization (GQ) that extends VQ to the domain of
attributed graphs. We present the necessary Lloyd-Max conditions for optimality
of a graph quantizer and consistency results for optimal GQ design based on
empirical distortion measures and stochastic optimization. These results
statistically justify existing clustering algorithms in the domain of graphs.
The proposed approach provides a template of how to link structural pattern
recognition methods other than GQ to statistical pattern recognition.
Comment: 24 pages; submitted to CVI
Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm
One of the most fundamental concepts in statistics is the concept of sample
mean. Properties of the sample mean that are well-defined in Euclidean spaces
become unwieldy or even unclear in graph spaces. Open problems related to the
sample mean of graphs include: non-existence, non-uniqueness, statistical
inconsistency, lack of convergence results of mean algorithms, non-existence of
midpoints, and disparity to midpoints. We present conditions to resolve all six
problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on
graph datasets representing images and molecules show that the MMM-Algorithm
best approximates a sample mean of graphs compared to six other mean
algorithms.
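To make the notion of a sample mean of graphs concrete, the sketch below computes a set-restricted mean: the sample member minimizing the sum of squared distances to all others. This is a minimal illustration only, not the paper's MMM algorithm; the toy distance (Hamming distance between adjacency matrices of equally sized graphs) is an assumption, not the graph edit distance the paper works with.

```python
def adjacency_distance(a, b):
    """Toy graph distance: number of differing entries between two
    adjacency matrices of the same size (an assumption for illustration)."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def set_mean(graphs, dist=adjacency_distance):
    """Set-restricted sample mean: the member with minimal sum of
    squared distances to all sample members."""
    return min(graphs, key=lambda g: sum(dist(g, h) ** 2 for h in graphs))

# Three toy graphs on 3 vertices, given as adjacency matrices.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path     = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
empty    = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

mean = set_mean([triangle, path, path, empty])  # the path graph wins
```

Restricting candidates to the sample itself sidesteps the non-existence and non-uniqueness problems the abstract lists, at the price of a coarser approximation.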
Geometry of Graph Edit Distance Spaces
In this paper we study the geometry of graph spaces endowed with a special
class of graph edit distances. The focus is on geometrical results useful for
statistical pattern recognition. The main result is the Graph Representation
Theorem. It states that a graph is a point in some geometrical space, called
orbit space. Orbit spaces are well investigated and easier to explore than the
original graph space. We derive a number of geometrical results from the orbit
space representation, translate them to the graph space, and indicate their
significance and usefulness in statistical pattern recognition.
An interdisciplinary survey of network similarity methods
Comparative graph and network analysis plays an important role in both systems
biology and pattern recognition, but existing surveys on the topic have
historically ignored or underserved one or the other of these fields. We
present an integrative introduction to the key objectives and methods of graph
and network comparison in each field, with the intent of remaining accessible
to relative novices in order to mitigate the barrier to interdisciplinary idea
crossover.
To guide our investigation, and to quantitatively justify our assertions
about what the key objectives and methods of each field are, we have
constructed a citation network containing 5,793 vertices from the full
reference lists of over two hundred relevant papers, which we collected by
searching Google Scholar for ten different network comparison-related search
terms. We investigate its basic statistics and community structure, and frame
our presentation around the papers found to have high importance according to
five different standard centrality measures.
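As a small illustration of ranking papers in a citation network by centrality, the sketch below computes normalized degree centrality, one standard measure; the survey's other measures, and its actual citation data, are not reproduced here. The toy edges and vertex names are invented for illustration.

```python
from collections import defaultdict

# Toy undirected citation network: vertices are papers, edges are citations.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("E", "C")]

degree = defaultdict(int)
for u, v in edges:          # treat the citation graph as undirected
    degree[u] += 1
    degree[v] += 1

n = len(degree)
# Degree centrality: degree divided by the maximum possible degree (n - 1).
centrality = {v: d / (n - 1) for v, d in degree.items()}
ranked = sorted(centrality, key=centrality.get, reverse=True)  # "C" ranks first
```

High-centrality vertices in such a network correspond to the influential papers around which the survey frames its presentation.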
Subgraph Similarity Search in Large Graphs
One of the major challenges in applications such as social networks,
computational biology, and collaboration networks is to efficiently search
for similar patterns in their underlying graphs. These graphs are typically
noisy and contain thousands of vertices and millions of edges. In many cases,
the graphs are unlabeled and the notion of similarity is also not well defined.
We study the problem of searching for an induced subgraph in a large target graph
that is most similar to the given query graph. We assume that the query graph
and target graph are undirected and unlabeled. We use graphlet kernels
\cite{shervashidze2009efficient} to define graph similarity. Graphlet kernels
are known to perform better than other graph kernels in several applications.
Our algorithm maps topological neighborhood information of vertices in the
query and target graphs to vectors. This local topological information is
then combined to find a target subgraph whose global topology is highly
similar to that of the query graph. We tested our algorithm on several
real-world networks, such as the Facebook, Google Plus, YouTube, and Amazon
networks. Most of them contain thousands of vertices and millions of edges. Our
algorithm is able to detect highly similar matches when queried in these
networks. Our multi-threaded implementation takes about one second to find the
match on a 32 core machine, excluding the time for one time preprocessing.
Computationally expensive parts of our algorithm can be further scaled to
standard parallel and distributed frameworks like MapReduce.
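The graphlet-kernel idea behind the similarity measure can be sketched as counting small induced substructures and comparing the resulting count vectors. The version below counts only the two connected 3-vertex graphlets (paths and triangles) and uses cosine similarity; it is a simplified stand-in for the cited graphlet kernels, not the paper's algorithm.

```python
from itertools import combinations
from math import sqrt

def graphlet_counts(edges):
    """Count connected 3-vertex graphlets: (#2-edge paths, #triangles)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    paths = triangles = 0
    for a, b, c in combinations(adj, 3):
        k = (b in adj[a]) + (c in adj[a]) + (c in adj[b])  # edges in the triple
        if k == 2:
            paths += 1
        elif k == 3:
            triangles += 1
    return paths, triangles

def cosine(u, v):
    """Cosine similarity of two count vectors."""
    num = sum(x * y for x, y in zip(u, v))
    return num / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

g1 = [(0, 1), (1, 2), (2, 0)]           # triangle: one 3-clique graphlet
g2 = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle: only path graphlets
sim = cosine(graphlet_counts(g1), graphlet_counts(g2))
```

In practice higher-order graphlets (4 and 5 vertices) are used and counts are sampled rather than enumerated, which is what makes the approach feasible on graphs with millions of edges.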
Graph Kernels based on High Order Graphlet Parsing and Hashing
Graph-based methods are known to be successful in many machine learning and
pattern classification tasks. These methods consider semi-structured data as
graphs where nodes correspond to primitives (parts, interest points, segments,
etc.) and edges characterize the relationships between these primitives.
However, these non-vectorial graph data cannot be straightforwardly plugged
into off-the-shelf machine learning algorithms without a preliminary step of --
explicit/implicit -- graph vectorization and embedding. This embedding process
should be resilient to intra-class graph variations while being highly
discriminant. In this paper, we propose a novel high-order stochastic graphlet
embedding (SGE) that maps graphs into vector spaces. Our main contribution
includes a new stochastic search procedure that efficiently parses a given
graph and extracts/samples graphlets of arbitrarily high order. We consider these
graphlets, with increasing orders, to model local primitives as well as their
increasingly complex interactions. In order to build our graph representation,
we measure the distribution of these graphlets into a given graph, using
particular hash functions that efficiently assign sampled graphlets into
isomorphic sets with a very low probability of collision. When combined with
maximum margin classifiers, these graphlet-based representations have positive
impact on the performance of pattern comparison and recognition as corroborated
through extensive experiments using standard benchmark databases.
Comment: arXiv admin note: substantial text overlap with arXiv:1702.0015
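The core requirement on the hash functions, that isomorphic graphlets land in the same bucket, can be illustrated with exact canonization: map each graphlet to the lexicographically smallest adjacency bit-string over all vertex permutations, so isomorphic graphlets collide by construction. This brute-force sketch is feasible only for small orders; the paper's hash functions trade exactness for speed with a small collision probability.

```python
from itertools import permutations

def canonical_key(n, edges):
    """Canonical form of an n-vertex graphlet: the smallest upper-triangular
    adjacency bit-string over all relabelings (exact, O(n!) -- toy sizes only)."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        bits = tuple(
            1 if frozenset((perm[i], perm[j])) in edge_set else 0
            for i in range(n) for j in range(i + 1, n)
        )
        if best is None or bits < best:
            best = bits
    return best

# Two differently labeled 2-edge paths hash to the same isomorphism class,
# while the triangle hashes to a different one.
k_path_a   = canonical_key(3, [(0, 1), (1, 2)])
k_path_b   = canonical_key(3, [(0, 2), (2, 1)])
k_triangle = canonical_key(3, [(0, 1), (1, 2), (2, 0)])
```

Binning sampled graphlets by such keys yields the distribution over isomorphism classes that forms the embedding vector.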
Stable and Informative Spectral Signatures for Graph Matching
In this paper, we consider the approximate weighted graph matching problem
and introduce stable and informative first and second order compatibility terms
suitable for inclusion into the popular integer quadratic program formulation.
Our approach relies on a rigorous analysis of stability of spectral signatures
based on the graph Laplacian. In the case of the first order term, we derive an
objective function that measures both the stability and informativeness of a
given spectral signature. By optimizing this objective, we design new spectral
node signatures tuned to a specific graph to be matched. We also introduce the
pairwise heat kernel distance as a stable second order compatibility term; we
justify its plausibility by showing that in a certain limiting case it
converges to the classical adjacency matrix-based second order compatibility
function. We have tested our approach on a set of synthetic graphs, the
widely-used CMU house sequence, and a set of real images. These experiments
show the superior performance of our first and second order compatibility terms
as compared with the commonly used ones.
Comment: final version for CVPR201
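The heat kernel used as the second-order term follows the standard definition H(t) = exp(-t L) for the graph Laplacian L = D - A. A minimal sketch via eigendecomposition is below; the graph and the value of t are arbitrary toy choices, not the paper's experimental setup.

```python
import numpy as np

def heat_kernel(adjacency, t):
    """H(t) = exp(-t L) with L = D - A, computed by eigendecomposition
    of the symmetric Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)                      # L is symmetric
    return V @ np.diag(np.exp(-t * w)) @ V.T

A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]                                   # path graph on 3 vertices
H = heat_kernel(A, t=0.5)
```

Because L has the all-ones vector in its kernel, every row of H(t) sums to one, and the trace is the sum of exp(-t * lambda_i) over the Laplacian spectrum, which for the 3-vertex path is {0, 1, 3}; these invariants make the quantity a convenient, stable pairwise signature.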
Avoiding Unnecessary Information Loss: Correct and Efficient Model Synchronization Based on Triple Graph Grammars
Model synchronization, i.e., the task of restoring consistency between two
interrelated models after a model change, is a challenging task. Triple Graph
Grammars (TGGs) specify model consistency by means of rules that describe how
to create consistent pairs of models. These rules can be used to automatically
derive further rules, which describe how to propagate changes from one model to
the other or how to change one model in such a way that propagation is
guaranteed to be possible. Restricting model synchronization to these derived
rules, however, may lead to unnecessary deletion and recreation of model
elements during change propagation. This is inefficient and may cause
unnecessary information loss, i.e., when deleted elements contain information
that is not represented in the second model, this information cannot be
recovered easily. Short-cut rules have recently been developed to avoid
unnecessary information loss by reusing existing model elements. In this paper,
we show how to automatically derive (short-cut) repair rules from short-cut
rules to propagate changes such that information loss is avoided and model
synchronization is accelerated. The key ingredients of our rule-based model
synchronization process are these repair rules and an incremental pattern
matcher informing about suitable applications of them. We prove the termination
and the correctness of this synchronization process and discuss its
completeness. As a proof of concept, we have implemented this synchronization
process in eMoflon, a state-of-the-art model transformation tool with inherent
support of bidirectionality. Our evaluation shows that repair processes based
on (short-cut) repair rules have considerably decreased information loss and
improved performance compared to former model synchronization processes based
on TGGs.
Comment: 33 pages, 20 figures, 3 tables
Sublinear Models for Graphs
This contribution extends linear models for feature vectors to sublinear
models for graphs and analyzes their properties. The results are (i) a
geometric interpretation of sublinear classifiers, (ii) a generic learning rule
based on the principle of empirical risk minimization, (iii) a convergence
theorem for the margin perceptron in the sublinearly separable case, and (iv)
the VC-dimension of sublinear functions. Empirical results on graph data show
that sublinear models on graphs have properties similar to those of linear
models for feature vectors.
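For reference, the classical margin perceptron on feature vectors, whose sublinear analogue on graphs the convergence theorem concerns, runs as follows. The data and margin below are toy choices; the graph-domain version itself is not reproduced here.

```python
def margin_perceptron(samples, labels, margin=0.5, epochs=100):
    """Margin perceptron: update w whenever a sample's functional margin
    label * <w, x> fails to exceed the target margin."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        updated = False
        for x, y in zip(samples, labels):
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= margin:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:       # every sample cleared the margin: converged
            break
    return w

# Linearly separable toy data in the plane.
X = [(2.0, 1.0), (1.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]
w = margin_perceptron(X, y)
```

For separable data the update loop terminates after finitely many mistakes; the paper's contribution is establishing an analogous guarantee in the sublinearly separable graph setting.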
Graph Kernels: State-of-the-Art and Future Challenges
Graph-structured data are an integral part of many application domains,
including chemoinformatics, computational biology, neuroimaging, and social
network analysis. Over the last two decades, numerous graph kernels, i.e.
kernel functions between graphs, have been proposed to solve the problem of
assessing the similarity between graphs, thereby making it possible to perform
predictions in both classification and regression settings. This manuscript
provides a review of existing graph kernels, their applications, software plus
data resources, and an empirical comparison of state-of-the-art graph kernels.
Comment: Accepted by Foundations and Trends in Machine Learning, 202