Vertex unique labelled subgraph mining
This thesis proposes the novel concept of Vertex Unique Labelled Subgraph (VULS) mining with respect to the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that the concept of VULS can offer in the context of vertex classification. A VULS is a subgraph with a particular structure and edge labelling that has a unique vertex labelling associated with it within a given (set of) host graph(s). VULS can describe highly discriminative and significant local geometries, each with a particular associated vertex label pattern. This knowledge can then be used to predict vertex labels in "unseen" graphs (graphs with edge labels, but without vertex labels). Thus this research is directed at identifying (mining) VULS, of various forms, that "best" serve both to capture graph information effectively and to allow the generation of effective vertex label predictors (classifiers). To this end, four VULS classifiers are proposed, directed at mining four different kinds of VULS: (i) complete, (ii) minimal, (iii) frequent and (iv) minimal frequent. The thesis describes and discusses each of these in detail including, in each case, the theoretical definition and algorithms with respect to VULS identification and prediction. A full evaluation of each of the VULS categories is also presented. VULS has wide applicability in areas where the domain of interest can be represented in the form of a graph. The evaluation was primarily directed at predicting a form of deformation, known as springback, that occurs in the Asymmetric Incremental Sheet Forming (AISF) manufacturing process. For the evaluation two flat-topped, square-based, pyramid shapes were used. Each pyramid had been manufactured twice using steel and twice using titanium. The utilisation of VULS was also explored by applying the VULS concept to the field of satellite image interpretation.
Satellite data describing two villages located in a rural part of the Ethiopian hinterland were used for this purpose. In each case the ground surface was represented in a similar manner to the way that AISF sheet metal surfaces were represented, with the dimension describing the grey scale value. The idea here was to predict vertex labels describing ground type. As will become apparent from the work presented in this thesis, the VULS concept is well suited to the task of 3D surface classification with respect to AISF and satellite imagery. The thesis demonstrates that the use of frequent VULS (rather than the other forms of VULS considered) produces more efficient results in the AISF sheet metal forming application domain, whilst the use of minimal VULS provided promising results in the context of the satellite image interpretation domain. The reported evaluation also indicates that a sound foundation has been established for future work on more general VULS-based vertex classification.
Tight lower bounds on the number of bicliques in false-twin-free graphs
A \emph{biclique} is a maximal bipartite complete induced subgraph of a graph
$G$. Bicliques have been studied in recent years, motivated by their large
number of applications. In particular, enumeration of the maximal bicliques has
been of interest in data analysis. Associated with this issue, bounds on the
maximum number of bicliques were given. In this paper we study bounds on the
minimum number of bicliques of a graph. Since adding false-twin vertices to $G$
does not change the number of bicliques, we restrict to false-twin-free graphs.
We give a tight lower bound on the minimum number of bicliques for a subclass of
,false-twin-free graphs and for the class of
,false-twin-free graphs. Finally we discuss the problem for general
graphs.
Comment: 16 pages, 4 figures
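The claim that adding a false twin leaves the number of bicliques unchanged can be checked on a small example. The following is a brute-force sketch only, not the method of the paper; the function names `is_complete_bipartite` and `bicliques`, and the dict-of-sets graph representation, are assumptions made for illustration. A false twin of a vertex is taken here to be a non-adjacent vertex with the same neighbourhood.

```python
from itertools import combinations


def is_complete_bipartite(adj, subset):
    """Return True if `subset` induces a complete bipartite graph with two non-empty sides."""
    v = next(iter(subset))
    side_y = adj[v] & subset   # neighbours of v inside the subset
    side_x = subset - side_y   # v and everything on its side
    if not side_y:
        return False

    def independent(side):
        return all(b not in adj[a] for a, b in combinations(side, 2))

    complete = all(y in adj[x] for x in side_x for y in side_y)
    return independent(side_x) and independent(side_y) and complete


def bicliques(adj):
    """Enumerate maximal induced complete bipartite subgraphs (exponential time)."""
    vertices = list(adj)
    candidates = [
        frozenset(c)
        for r in range(2, len(vertices) + 1)
        for c in combinations(vertices, r)
        if is_complete_bipartite(adj, frozenset(c))
    ]
    # Keep only the candidates that are not strictly contained in another one.
    return [s for s in candidates if not any(s < t for t in candidates)]


# Path 0-1-2-3, then the same path with vertex 4 added as a false twin of vertex 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_with_twin = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1}}
print(len(bicliques(path)), len(bicliques(path_with_twin)))
```

Both graphs have two bicliques: the twin only enlarges the star centred at vertex 1. The enumeration is exponential in the number of vertices and is meant for toy inputs.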
Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the fundamental
questions in computer science, if not the fundamental one. Yet, surprisingly
little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17)
Frequent Subgraph Mining in Outerplanar Graphs
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we define the class of so-called tenuous outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for tenuous outerplanar graphs that works in incremental polynomial time, and evaluate the algorithm empirically on the NCI molecular graph dataset.
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting which
allows us to collect -hop information without maintaining an explicit
-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
We test our implementation in several experiments scaling up to cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes.
Comment: To appear in part at KDD'1
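To make the quantity concrete: every induced subgraph on three vertices is determined, up to isomorphism, by its number of edges (empty, single edge, path, triangle), so a 3-profile can be written as four counts. The sketch below computes it exactly by brute force over all vertex triples. It is an illustration of the definition only, not the sparsifier-based or distributed algorithm described in the abstract, and the function name `three_profile` is an assumption.

```python
from itertools import combinations


def three_profile(num_vertices, edges):
    """Exact 3-profile by brute force.

    Returns [n0, n1, n2, n3], where nk is the number of vertex triples whose
    induced subgraph has exactly k edges (n3 is the triangle count).
    Vertices are assumed to be labelled 0 .. num_vertices - 1.
    """
    edge_set = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for triple in combinations(range(num_vertices), 3):
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        profile[k] += 1
    return profile


# A triangle on vertices 0, 1, 2 plus an isolated vertex 3.
print(three_profile(4, [(0, 1), (1, 2), (0, 2)]))  # [0, 3, 0, 1]
```

The loop visits all triples, so it takes cubic time in the number of vertices; that cost is what motivates approximation on large graphs.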