Frequent Subgraph Mining in Outerplanar Graphs
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we define the class of so-called tenuous outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for tenuous outerplanar graphs that works in incremental polynomial time, and evaluate the algorithm empirically on the NCI molecular graph dataset.
Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the - if not the
- fundamental questions in computer science. Yet, surprisingly little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17).
Burning a Graph is Hard
Graph burning is a model for the spread of social contagion. The burning
number is a graph parameter associated with graph burning that measures the
speed of the spread of contagion in a graph; the lower the burning number, the
faster the contagion spreads. We prove that the corresponding graph decision
problem is NP-complete when restricted to acyclic graphs with maximum
degree three, spider graphs and path-forests. We provide polynomial time
algorithms for finding the burning number of spider graphs and path-forests if
the number of arms and components, respectively, are fixed.
Comment: 20 pages, 4 figures, presented at GRASTA-MAC 2015 (October 19-23rd,
2015, Montréal, Canada).
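To make the burning number concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it assumes the standard covering characterization (a sequence of k sources burns the graph exactly when every vertex lies within distance k - i of the i-th source), and the helper names `bfs_distances`, `burns_graph`, `burning_number`, and `path_graph` are illustrative. The search is exponential, so it is only suitable for very small graphs.

```python
from collections import deque
from itertools import product


def bfs_distances(adj, source):
    """Return hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def burns_graph(adj, sources):
    """Check whether lighting sources[0], sources[1], ... in successive
    rounds burns every vertex by the end of round len(sources)."""
    k = len(sources)
    burned = set()
    for i, s in enumerate(sources, start=1):
        radius = k - i  # the i-th fire has k - i rounds left to spread
        dist = bfs_distances(adj, s)
        burned.update(v for v, d in dist.items() if d <= radius)
    return len(burned) == len(adj)


def burning_number(adj):
    """Smallest k for which some sequence of k sources burns the graph
    (exhaustive search over all sequences)."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for sources in product(nodes, repeat=k):
            if burns_graph(adj, sources):
                return k
    return 0  # empty graph: nothing to burn


def path_graph(n):
    """Adjacency lists of the path on vertices 0, 1, ..., n - 1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    print(burning_number(path_graph(9)))  # 3
```

On the 9-vertex path, two rounds can cover at most 3 + 1 = 4 vertices, while three rounds can cover 5 + 3 + 1 = 9, which is why the sketch reports 3.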
Graph-based task libraries for robots: generalization and autocompletion
In this paper, we consider an autonomous robot that persists
over time performing tasks and the problem of providing one additional
task to the robot's task library. We present an approach to generalize
tasks, represented as parameterized graphs with sequences, conditionals,
and looping constructs of sensing and actuation primitives. Our approach
performs graph-structure task generalization, while maintaining task
executability and parameter value distributions. We present an algorithm
that, given the initial steps of a new task, proposes an autocompletion
based on a recognized past similar task. Our generalization and
autocompletion contributions are effective on different real robots. We show
concrete examples of the robot primitives and task graphs, as well as
results, with Baxter. In experiments with multiple tasks, we show a
significant reduction in the number of new task steps to be provided.
Performance and scalability of indexed subgraph query processing methods
Graph data management systems have become very popular
as graphs are the natural data model for many applications.
One of the main problems addressed by these systems is subgraph
query processing; i.e., given a query graph, return all
graphs that contain the query. The naive method for processing
such queries is to perform a subgraph isomorphism
test against each graph in the dataset. This obviously does
not scale, as subgraph isomorphism is NP-Complete. Thus,
many indexing methods have been proposed to reduce the
number of candidate graphs that have to undergo the subgraph
isomorphism test. In this paper, we identify a set of
key factors (parameters) that influence the performance of
related methods: namely, the number of nodes per graph,
the graph density, the number of distinct labels, the number
of graphs in the dataset, and the query graph size. We then
conduct comprehensive and systematic experiments that analyze
the sensitivity of the various methods on the values of
the key parameters. Our aims are twofold: first to derive
conclusions about the algorithms’ relative performance, and,
second, to stress-test all algorithms, deriving insights as to
their scalability, and highlight how both performance and
scalability depend on the above factors. We choose six well-established
indexing methods, namely Grapes, CT-Index,
GraphGrepSX, gIndex, Tree+∆, and gCode, as representative
approaches of the overall design space, including the
most recent and best performing methods. We report on
their index construction time and index size, and on query
processing performance in terms of time and false positive
ratio. We employ both real and synthetic datasets. Specifically,
four real datasets of different characteristics are used:
AIDS, PDBS, PCM, and PPI. In addition, we generate a
large number of synthetic graph datasets, empowering us to
systematically study the algorithms’ performance and scalability
versus the aforementioned key parameters.
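The filter-then-verify pattern described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any of the six evaluated methods: the "index" here is just a label-count check, the verifier is a plain backtracking test for (non-induced) labeled subgraph isomorphism, and `make_graph`, `label_filter`, `contains`, and `query_dataset` are hypothetical names.

```python
from collections import Counter


def make_graph(labels, edges):
    """Build an undirected labeled graph: labels maps node -> label."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": dict(labels), "adj": adj}


def label_filter(query, graph):
    """Cheap necessary condition: the graph needs at least as many
    nodes of each label as the query has."""
    need = Counter(query["labels"].values())
    have = Counter(graph["labels"].values())
    return all(have[label] >= count for label, count in need.items())


def contains(query, graph):
    """Backtracking subgraph isomorphism test (exponential worst case)."""
    q_nodes = list(query["labels"])

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True
        u = q_nodes[len(mapping)]
        for v in graph["labels"]:
            if v in mapping.values():
                continue  # keep the mapping injective
            if graph["labels"][v] != query["labels"][u]:
                continue
            # every already-mapped query neighbor must map to a neighbor of v
            if all(mapping[w] in graph["adj"][v]
                   for w in query["adj"][u] if w in mapping):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def query_dataset(query, dataset):
    """Return (candidates that pass the filter, verified answers)."""
    candidates = [name for name, g in dataset.items() if label_filter(query, g)]
    answers = [name for name in candidates if contains(query, dataset[name])]
    return candidates, answers


if __name__ == "__main__":
    dataset = {
        "g1": make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
        "g2": make_graph({0: "C", 1: "O"}, []),        # right labels, no edge
        "g3": make_graph({0: "C", 1: "C"}, [(0, 1)]),  # no "O" node at all
    }
    query = make_graph({0: "C", 1: "O"}, [(0, 1)])
    candidates, answers = query_dataset(query, dataset)
    false_positives = len(candidates) - len(answers)
    print(candidates, answers, false_positives / len(candidates))
```

Graphs that pass the filter but fail verification are the false positives; a stronger index prunes more of them before the expensive test runs.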
The Complexity of Finding Effectors
The NP-hard EFFECTORS problem on directed graphs is motivated by applications
in network mining, particularly concerning the analysis of probabilistic
information-propagation processes in social networks. In the corresponding
model the arcs carry probabilities and there is a probabilistic diffusion
process activating nodes by neighboring activated nodes with probabilities as
specified by the arcs. The point is to explain a given network activation state
as well as possible by using a minimum number of "effector nodes"; these are
selected before the activation process starts.
We correct, complement, and extend previous work from the data mining
community by a more thorough computational complexity analysis of EFFECTORS,
identifying both tractable and intractable cases. To this end, we also exploit
a parameterization measuring the "degree of randomness" (the number of "really"
probabilistic arcs) which might prove useful for analyzing other probabilistic
network diffusion problems as well.
Comment: 28 pages