Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. Our chapter pinpoints the main graph summarization
methods, with particular emphasis on the most recent approaches and novel
research trends on this topic that are not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologie
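As a rough illustration of what "a more compact representation that preserves structural patterns" can mean, the sketch below merges nodes that have identical neighbor sets into supernodes. This is only one simple grouping strategy chosen for illustration, not a method taken from the chapter; the function name `summarize_by_neighborhood` is hypothetical, and the input is assumed to be an undirected edge list without self-loops.

```python
from collections import defaultdict


def summarize_by_neighborhood(edges):
    """Merge nodes with identical neighbor sets into supernodes (illustrative only).

    Assumes an undirected graph given as (u, v) pairs with no self-loops.
    Returns a node -> supernode mapping and the set of edges between supernodes.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Nodes that share exactly the same neighbor set land in the same group.
    groups = defaultdict(list)
    for node, nbrs in neighbors.items():
        groups[frozenset(nbrs)].append(node)

    # Label each supernode by the sorted tuple of its members.
    supernode_of = {}
    for members in groups.values():
        label = tuple(sorted(members))
        for member in members:
            supernode_of[member] = label

    # One summary edge per pair of connected supernodes.
    super_edges = {frozenset((supernode_of[u], supernode_of[v])) for u, v in edges}
    return supernode_of, super_edges


# A star graph: the three leaves collapse into a single supernode.
star = [("a", "b"), ("a", "c"), ("a", "d")]
mapping, summary_edges = summarize_by_neighborhood(star)
```

In this toy case the summary is lossless, because every merged node has exactly the same connections; the approaches surveyed in the chapter generally trade some accuracy for compactness.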
Mining subjectively interesting attributed subgraphs
Community detection in graphs, data clustering, and local pattern mining
are three mature fields of data mining and machine learning.
In recent years, attributed subgraph mining has been emerging as a new
powerful data mining task in the intersection of these areas.
Given a graph and a set of attributes for each vertex,
attributed subgraph mining aims to find cohesive subgraphs
for which (a subset of) the attributes take exceptional values in some sense.
While research on this task can borrow from the three abovementioned fields,
the principled integration of graph and attribute data poses two challenges:
the definition of a pattern language that is intuitive and lends itself to efficient search strategies,
and the formalization of the interestingness of such patterns.
We propose an integrated solution to both of these challenges.
The proposed pattern language improves upon prior work in being both highly flexible and intuitive.
We show how an effective and principled algorithm can enumerate patterns of this language.
The proposed approach for quantifying interestingness of patterns of this language
is rooted in information theory, and is able to account for prior knowledge on the data.
Prior work typically quantifies interestingness based on the cohesion of the subgraph
and on the exceptionality of its attributes separately,
combining these in a parameterized trade-off.
Instead, in our proposal this trade-off is implicitly handled in a principled, parameter-free manner.
Extensive empirical results confirm the proposed pattern syntax is intuitive,
and the interestingness measure aligns well with actual subjective interestingness.
Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings
Mapping complex input data into suitable lower dimensional manifolds is a
common procedure in machine learning. This step is beneficial mainly for two
reasons: (1) it reduces the data dimensionality and (2) it provides a new data
representation possibly characterised by convenient geometric properties.
Euclidean spaces are by far the most widely used embedding spaces, thanks to
their well-understood structure and large availability of consolidated
inference methods. However, recent research demonstrated that many types of
complex data (e.g., those represented as graphs) are actually better described
by non-Euclidean geometries. Here, we investigate how embedding graphs on
constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) affects
the ability to detect changes in sequences of attributed graphs. The
proposed methodology consists in embedding graphs into a geometric space and
performing change detection there by means of conventional methods for numerical
streams. The curvature of the space is a parameter that we learn to reproduce
the geometry of the original application-dependent graph space. Preliminary
experimental results show the potential of representing graphs by
means of curved manifolds, in particular for change and anomaly detection
problems. Comment: To be published in IEEE IJCNN 201
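The two-stage idea in the abstract above (embed each graph, then run a conventional change detector on the resulting numerical stream) can be sketched as follows. This is a minimal sketch under strong assumptions: the graphs are taken to be already embedded as unit vectors on a hypersphere, the embedding step and the learned curvature are not modelled, and a one-sided CUSUM on the geodesic distance to a reference point stands in for whatever detector the authors actually use. The names `geodesic_distance` and `cusum_change_point` and the `drift` and `threshold` defaults are hypothetical.

```python
import math


def geodesic_distance(x, y):
    """Great-circle distance between two unit vectors on a hypersphere."""
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot)))


def cusum_change_point(values, baseline_len, drift=0.5, threshold=5.0):
    """One-sided CUSUM on a scalar stream (illustrative parameters).

    The first `baseline_len` values estimate the no-change mean and standard
    deviation. Returns the first index at which the alarm fires, or None.
    """
    base = values[:baseline_len]
    mean = sum(base) / len(base)
    var = sum((v - mean) ** 2 for v in base) / len(base)
    std = math.sqrt(var) or 1e-12  # avoid division by zero on a flat baseline

    s = 0.0
    for t in range(baseline_len, len(values)):
        z = (values[t] - mean) / std
        s = max(0.0, s + z - drift)  # accumulate only upward deviations
        if s > threshold:
            return t
    return None


# Toy stream: 20 embedded graphs near a reference point, then a jump.
reference = (1.0, 0.0)
angles = [0.01 if i % 2 == 0 else 0.02 for i in range(20)] + [1.0] * 5
embedded = [(math.cos(a), math.sin(a)) for a in angles]
distances = [geodesic_distance(reference, p) for p in embedded]
alarm_index = cusum_change_point(distances, baseline_len=20)
```

Reducing each embedded graph to its distance from a reference point keeps the detector one-dimensional; a hyperbolic embedding would need its own distance function in place of `geodesic_distance`.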
Conflict Detection for Edits on Extended Feature Models using Symbolic Graph Transformation
Feature models are used to specify variability of user-configurable systems
as appearing, e.g., in software product lines. Software product lines are
supposed to be long-living and, therefore, have to continuously evolve over
time to meet ever-changing requirements. Evolution imposes changes to feature
models in terms of edit operations. Ensuring consistency of concurrent edits
requires appropriate conflict detection techniques. However, recent approaches
fail to handle crucial subtleties of extended feature models, namely
constraints mixing feature-tree patterns with first-order logic formulas over
non-Boolean feature attributes with potentially infinite value domains. In this
paper, we propose a novel conflict detection approach based on symbolic graph
transformation to facilitate concurrent edits on extended feature models. We
describe extended feature models formally with symbolic graphs and edit
operations with symbolic graph transformation rules combining graph patterns
with first-order logic formulas. The approach is implemented by combining
eMoflon with an SMT solver, and evaluated with respect to applicability. Comment: In Proceedings FMSPLE 2016, arXiv:1603.0857
Explainable subgraphs with surprising densities : a subgroup discovery approach
The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks, for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form "the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y", in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph.
We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns. This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.
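To make the notion of edge density between two attribute-defined subgroups concrete, the sketch below selects subgroups by attribute conditions and compares their cross density with the density of the whole graph. It is a deliberately simplified, objective baseline, not the information-theoretic subjective interestingness measure the abstract describes. The helper names `subgroup` and `edge_density_between` and the toy attribute `city` are hypothetical.

```python
def subgroup(attributes, **conditions):
    """Return the nodes whose attributes match every key=value condition."""
    return {
        node
        for node, attrs in attributes.items()
        if all(attrs.get(key) == value for key, value in conditions.items())
    }


def edge_density_between(edges, group_x, group_y):
    """Fraction of distinct X-Y node pairs that are joined by an undirected edge."""
    edge_set = {frozenset(edge) for edge in edges}
    pairs = {frozenset((u, v)) for u in group_x for v in group_y if u != v}
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in edge_set) / len(pairs)


# Toy social network: two people per city, mostly cross-city friendships.
attributes = {
    "a": {"city": "X"},
    "b": {"city": "X"},
    "c": {"city": "Y"},
    "d": {"city": "Y"},
}
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("a", "b")]

group_x = subgroup(attributes, city="X")
group_y = subgroup(attributes, city="Y")
cross_density = edge_density_between(edges, group_x, group_y)

# Baseline for comparison: the density over all node pairs in the graph.
everyone = set(attributes)
overall_density = edge_density_between(edges, everyone, everyone)
```

Here the cross-city density exceeds the overall density, which is the kind of deviation such a pattern would report; a subjective measure would additionally weigh it against what the analyst already expects.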