Extending local features with contextual information in graph kernels
Graph kernels are usually defined in terms of simpler kernels over local
substructures of the original graphs. Different kernels consider different
types of substructures. However, in some cases they have similar predictive
performances, probably because the substructures can be interpreted as
approximations of the subgraphs they induce. In this paper, we propose to
associate to each feature a piece of information about the context in which the
feature appears in the graph. A substructure appearing in two different graphs
will match only if it appears with the same context in both graphs. We propose
a kernel based on this idea that considers trees as substructures, and where
the contexts are features too. The kernel is inspired by the framework in
[6], although it is not an instance of that framework. We give an efficient
algorithm for computing the kernel and show promising results on real-world
graph classification datasets.
Comment: To appear in ICONIP 201
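The matching rule described above (a feature counts only when it appears with the same context in both graphs) can be illustrated with a deliberately simplified sketch. This is not the kernel from the paper, which uses trees as substructures. Here the feature is assumed to be a single node label and the context is assumed to be the sorted labels of its neighbours; the function names are hypothetical.

```python
from collections import Counter


def contextual_features(labels, edges):
    """Count (feature, context) pairs for one graph.

    Assumption for illustration: feature = node label,
    context = sorted tuple of the neighbours' labels.
    """
    neighbors = {v: [] for v in labels}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return Counter(
        (labels[v], tuple(sorted(labels[u] for u in neighbors[v])))
        for v in labels
    )


def contextual_kernel(g1, g2):
    """Dot product of the (feature, context) count vectors of two graphs."""
    f1 = contextual_features(*g1)
    f2 = contextual_features(*g2)
    return sum(count * f2[key] for key, count in f1.items())


# Two small labelled graphs: a path C-O-C and a single edge C-O.
g1 = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "O"}, [(0, 1)])
print(contextual_kernel(g1, g2))  # 2
```

Without contexts, the label counts alone would give 2*1 + 1*1 = 3 for these two graphs. With contexts, the O nodes no longer match because one has two C neighbours and the other has one, so the value drops to 2.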
funcGNN: A Graph Neural Network Approach to Program Similarity
Program similarity is a fundamental concept, central to the solution of
software engineering tasks such as software plagiarism, clone identification,
code refactoring and code search. Accurate similarity estimation between
programs requires an in-depth understanding of their structure, semantics and
flow. A control flow graph (CFG) is a graphical representation of a program
which captures its logical control flow and hence its semantics. A common
approach is to estimate program similarity by analysing CFGs using graph
similarity measures, e.g. graph edit distance (GED). However, computing the graph
edit distance is NP-hard and computationally expensive, making the
application of graph similarity techniques to complex software programs
impractical. This study intends to examine the effectiveness of graph neural
networks to estimate program similarity, by analysing the associated control
flow graphs. We introduce funcGNN, which is a graph neural network trained on
labeled CFG pairs to predict the GED between unseen program pairs by utilizing
an effective embedding vector. To our knowledge, this is the first time graph
neural networks have been applied to labeled CFGs for estimating the similarity
between high-level language programs. Results: We demonstrate the effectiveness
of funcGNN in estimating the GED between programs. Our experimental analysis
shows that it achieves a lower error rate (0.00194), runs faster (23 times
faster than the quickest traditional GED approximation method), and scales
better than state-of-the-art methods. funcGNN possesses the
inductive learning ability to infer program structure and generalise to unseen
programs. The graph embedding of a program proposed by our methodology could be
applied to several related software engineering problems (such as code
plagiarism and clone identification), thus opening multiple research directions.
Comment: 11 pages, 8 figures, 3 tables
Probabilistic Clustering of Time-Evolving Distance Data
We present a novel probabilistic clustering model for objects that are
represented via pairwise distances and observed at different time points. The
proposed method utilizes the information given by adjacent time points to find
the underlying cluster structure and obtain a smooth cluster evolution. This
approach allows the number of objects and clusters to differ at every time
point, and the identities of the objects need not be known.
Further, the model does not require the number of clusters to be specified in
advance; it is instead determined automatically using a Dirichlet process
prior. We validate our model on synthetic data showing that the proposed method
is more accurate than state-of-the-art clustering methods. Finally, we use our
dynamic clustering model to analyze and illustrate the evolution of brain
cancer patients over time.
Hierarchies and Ranks for Persistence Pairs
We develop a novel hierarchy for zero-dimensional persistence pairs, i.e.,
connected components, which is capable of capturing more fine-grained spatial
relations between persistence pairs. Our work is motivated by the lack of spatial
relationships between features in persistence diagrams, which limits their
expressive power. We build upon a recently introduced hierarchy of pairs in
persistence diagrams that augments the pairing stored in persistence diagrams
with information about which components merge. Our proposed hierarchy captures
differences in branching structure. Moreover, we show how to use our hierarchy
to measure the spatial stability of a pairing. We also define a rank function for
persistence pairs and demonstrate different applications.
Comment: Topology-based Methods in Visualization 201
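As background for the abstract above, zero-dimensional persistence pairs can be computed with a union-find structure and the elder rule. The sketch below is not the hierarchy proposed in the paper. It assumes a sublevel-set filtration of a 1D sequence of values, and it additionally records which component each dying component merges into, which is the kind of merge information the abstract attributes to the earlier hierarchy. The function name and output layout are hypothetical.

```python
import math


def zero_dim_persistence(values):
    """Zero-dimensional sublevel-set persistence of a 1D sequence.

    Returns tuples (birth, death, merged_into), where merged_into is the
    index of the minimum of the surviving (older) component, or None for
    the component that never dies.
    """
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = {v: r for r, v in enumerate(order)}  # processing order of vertices
    parent = {}  # union-find forest; each root is its component's minimum

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component born later dies at this merge.
            old, young = (ri, rj) if rank[ri] < rank[rj] else (rj, ri)
            if values[young] < values[i]:  # skip zero-persistence pairs
                pairs.append((values[young], values[i], old))
            parent[young] = old
    # The component of the global minimum never dies.
    pairs.append((values[order[0]], math.inf, None))
    return pairs


print(zero_dim_persistence([1, 3, 0, 4, 2]))
# [(1, 3, 2), (2, 4, 2), (0, inf, None)]
```

In this example, the components born at values 1 and 2 both merge into the component whose minimum sits at index 2. A plain persistence diagram would discard that third entry and keep only the (birth, death) values.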
Kernels on Graphs as Proximity Measures
Kernels and, broadly speaking, similarity measures on graphs are extensively used in graph-based unsupervised and semi-supervised learning algorithms as well as in the link prediction problem. We analytically study proximity and distance properties of various kernels and similarity measures on graphs. This can potentially be useful for recommending the adoption of one or another similarity measure in a machine learning method. Also, we numerically compare various similarity measures in the context of spectral clustering and observe that normalized heat-type similarity measures with log modification generally perform best.
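To make the last observation concrete, here is a minimal sketch of one plausible reading of a "normalized heat-type similarity measure with log modification". It assumes the measure is the entrywise logarithm of exp(-t * L_norm), where L_norm is the symmetric normalized Laplacian. The abstract does not give the exact definition, so the formula, the parameter `t`, and all function names are assumptions. The matrix exponential is approximated by a truncated Taylor series to keep the example dependency-free, and the graph is assumed to be connected and small.

```python
import math


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^(-1/2) A D^(-1/2)."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # assumes no isolated nodes
    return [
        [(1.0 if i == j else 0.0) - adj[i][j] / math.sqrt(deg[i] * deg[j])
         for j in range(n)]
        for i in range(n)
    ]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=40):
    """Truncated Taylor series of exp(m); adequate for small matrices."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds m^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def log_heat_similarity(adj, t=1.0):
    """Entrywise log of the normalized heat kernel exp(-t * L_norm)."""
    lap = normalized_laplacian(adj)
    heat = mat_exp([[-t * x for x in row] for row in lap])
    return [[math.log(x) for x in row] for row in heat]


# Path graph 0 - 1 - 2 - 3.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
sim = log_heat_similarity(path)
print([round(x, 3) for x in sim[0]])
```

On the path graph, the similarity of node 0 to the other nodes decreases with their distance along the path, which is the proximity behaviour the abstract studies. For larger graphs, a library routine for the matrix exponential would be a more robust choice than the Taylor series used here.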