Search CORE

34 research outputs found

A survey of frequent subgraph mining algorithms

Author: Coenen Frans
Jiang Chuntao
Zito Michele
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 20/11/2012

Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
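To make goal (ii) concrete, here is a minimal sketch of support counting, assuming the common setting in which a candidate counts as frequent when it occurs in at least a minimum number of database graphs. The graph representation, the helper names (`make_graph`, `contains`, `support`) and the brute-force matching are assumptions made for illustration. They are not taken from the survey, and practical miners avoid this exhaustive search.

```python
from itertools import permutations

def make_graph(labels, edges):
    """Hypothetical representation: (node -> label dict, set of undirected edges)."""
    return labels, {frozenset(e) for e in edges}

def contains(graph, pattern):
    """True if the pattern occurs in the graph as a label-preserving subgraph.

    Brute force over all injective node mappings; assumes simple graphs
    (no self-loops) and is only practical for very small inputs.
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Vertex labels must match under the mapping.
        if any(p_labels[v] != g_labels[mapping[v]] for v in p_nodes):
            continue
        # Every pattern edge must map onto an edge of the graph.
        if all(frozenset(mapping[v] for v in e) in g_edges for e in p_edges):
            return True
    return False

def support(database, pattern):
    """Number of database graphs that contain the pattern."""
    return sum(contains(g, pattern) for g in database)

# Toy database of three labeled graphs.
db = [
    make_graph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),
    make_graph({0: "C", 1: "O"}, [(0, 1)]),
    make_graph({0: "C", 1: "N"}, [(0, 1)]),
]
c_o = make_graph({0: "C", 1: "O"}, [(0, 1)])
c_o_c = make_graph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])

min_support = 2
is_frequent = support(db, c_o) >= min_support
```

The inner test is the subgraph isomorphism check, which is why it dominates the cost of mining and why avoiding duplicate candidates matters.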

University of Liverpool Repository

Crossref

Is Frequent Pattern Mining useful in building predictive models?

Author: Thashmee Karunaratne
Publication date: 11/04/2020

Recent studies of pattern mining have paid more attention to discovering patterns that are interesting, significant, discriminative, and so forth, rather than simply frequent. Does this imply that frequent patterns are no longer useful? In this paper we survey frequent pattern mining and, using an empirical study, show how far frequent pattern mining is useful in building predictive models.

CiteSeerX

Frequent subgraph mining algorithms on weighted graphs

Author: Jiang Chuntao

This thesis describes research work undertaken in the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that weighted frequent subgraph mining can offer in the context of graph-model-based classification. Weighted subgraphs are graphs in which some of the vertices or edges are considered to be more significant than others. How to discover frequent substructures with different strengths is the main issue to be resolved in this thesis. The main approach to addressing this issue is to integrate weight constraints into the frequent subgraph mining process. It is suggested that weighted frequent subgraph mining generates more discriminative and significant subgraphs, which will have application in, for example, the classification and clustering of graph data.

University of Liverpool Repository

A graph-based knowledge representation and pattern mining supporting the Digital Twin creation of existing manufacturing systems

Author: Braun Dominik
Jazdi Nasser
Müller Timo
Sahlab Nada
Schloegl Wolfgang
Weyrich Michael
Publication date: 21/09/2022

The creation of a Digital Twin for existing manufacturing systems, so-called brownfield systems, is a challenging task because of the expert knowledge needed about the structure of brownfield systems and the effort required to realize the digital models. Several approaches and methods have already been proposed that at least partially digitalize the information about a brownfield manufacturing system. A Digital Twin requires linked information from multiple sources. This paper presents a graph-based approach to merge information from heterogeneous sources. Furthermore, the approach provides a way to automatically identify templates using graph structure analysis to facilitate further work with the resulting Digital Twin and its further enhancement.
Comment: 4 pages, 3 figures. Accepted at IEEE ETFA 202

arXiv.org e-Print Archive

Significant Subgraph Mining with Multiple Testing Correction

Author: Borgwardt Karsten M.
Kasenburg Niklas
López Felipe Llinares
Sugiyama Mahito
Publication date: 01/01/2015

The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. On real-world datasets it was shown to lead to greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.
Comment: 18 pages, 5 figures, accepted to the 2015 SIAM International Conference on Data Mining (SDM15)
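The idea of pruning untestable hypotheses can be sketched as follows. This is an illustration in the style of Tarone's procedure with a one-sided Fisher's exact test. Both choices are assumptions, since the abstract names neither, and the function names `min_p_value` and `tarone_factor` are hypothetical. A pattern that occurs in too few (or too many) graphs can never reach a small p-value, so it need not be counted in the correction factor.

```python
from math import comb

def min_p_value(support, n, n_pos):
    """Smallest one-sided Fisher p-value a pattern with this support can attain.

    `n` is the number of graphs, `n_pos` the number in the positive class.
    The most extreme table puts as many occurrences as possible in that class.
    """
    k = min(support, n_pos)
    return comb(n_pos, k) * comb(n - n_pos, support - k) / comb(n, support)

def tarone_factor(supports, n, n_pos, alpha=0.05):
    """Smallest k such that at most k patterns are testable at level alpha / k."""
    psi = [min_p_value(s, n, n_pos) for s in supports]
    k = 1
    while sum(p <= alpha / k for p in psi) > k:
        k += 1
    return k

# Toy example: six candidate patterns in 20 graphs, 10 of them positive.
supports = [1, 2, 5, 8, 10, 10]
bonferroni = len(supports)                         # corrects for every pattern
tarone = tarone_factor(supports, n=20, n_pos=10)   # corrects for testable ones only
```

In this toy example the correction factor is smaller than the Bonferroni factor, so the per-pattern significance threshold `alpha / tarone` is less strict. That is the source of the gain in statistical power the abstract describes.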

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Effiziente Prozessmodellanalyse mit Algorithmen der Subgraphisomorphie

Author: Becker Jörg
Breuker Dominic
Delfmann Patrick
Dietrich Hanns-Alexander
Steinhorst Matthias
Publication venue: Institut für Wirtschaftsinformatik
Publication date: 01/01/2012

The literature offers a wide variety of approaches for structurally analyzing process models. A subproblem that arises in many of these approaches is the identification of (frequently occurring) subgraphs within the model graphs. Graph-theoretic algorithms can be used to solve this problem. This article demonstrates that such algorithms are able to analyze large sets of process models within (milli)seconds. They can therefore be integrated as a subcomponent into existing analysis approaches to replace (potentially more costly) custom developments. The advantage of these algorithms lies in their broad applicability, which is not restricted to specific modeling languages or analysis purposes.

Digitale Bibliothek Braunschweig

GraphMDL : sélection de motifs de graphes avec le principe MDL

Author: Bariatti Francesco
Cellier Peggy
Ferré Sébastien
Publication venue: HAL CCSD
Publication date: 27/01/2020

Many graph pattern mining algorithms have been designed to identify recurring structures in graphs. The main drawback of these approaches is that they often extract too many patterns for human analysis. Recently, pattern mining methods using the Minimum Description Length (MDL) principle have been proposed to select a characteristic subset of patterns from transactional, sequential and relational data. In this paper, we propose an MDL-based approach for selecting a characteristic subset of patterns on labeled graphs. A key notion in this paper is the introduction of ports to encode connections between pattern occurrences without any loss of information. Experiments show that the number of patterns is drastically reduced, and the selected patterns can have complex shapes.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

FS^3: A Sampling based method for top-k Frequent Subgraph Mining

Author: Hasan Mohammad Al
Saha Tanay Kumar
Publication date: 02/09/2014

Mining labeled subgraphs is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomorphism task, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS^3, which is a sampling-based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS^3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. In addition, FS^3 is equipped with an innovative queue manager. It stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS^3 is efficient, and that it obtains subgraphs that are the most frequent amongst the subgraphs of a given size.

arXiv.org e-Print Archive

CiteSeerX

IUPUIScholarWorks

Frequent Subgraph Mining via Sampling with Rigorous Guarantees


Frequent subgraph mining is a fundamental task in the analysis of collections of graphs that aims at finding all the subgraphs that appear with more than a user-specified frequency in the dataset. While several exact approaches have been proposed to solve the task, it remains computationally challenging on large graph datasets due to the complexity of the subgraph isomorphism problem inherent in the task and the huge number of candidate patterns even for fairly small subgraphs. In this thesis, we study two statistical learning measures of complexity, VC-dimension and Rademacher averages, for subgraphs, and derive efficiently computable bounds for both. We then show how such bounds can be applied to devise efficient sampling-based approaches for rigorously approximating the solutions of the frequent subgraph mining problem, providing sample sizes which are much tighter than what would be obtained by a straightforward application of Chernoff and union bounds. We also show that our bounds can be used for true frequent subgraph mining, which requires identifying subgraphs generated with probability above a given threshold using samples from an unknown generative process. Moreover, we carried out an extensive experimental evaluation of our methods on real datasets, which shows that our bounds lead to efficiently computable and high-quality approximations for both applications.
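For orientation, the baseline that the thesis improves on can be sketched as follows. The sketch assumes the two-sided Hoeffding form of the Chernoff bound combined with a union bound over a finite set of candidate patterns, and the function name and parameters are hypothetical. The thesis's own bounds, based on VC-dimension and Rademacher averages, are not reproduced here.

```python
import math

def baseline_sample_size(num_candidates, eps, delta):
    """Sample size from Hoeffding's inequality plus a union bound.

    Guarantees that, with probability at least 1 - delta, the sampled
    frequency of each of `num_candidates` patterns is within `eps` of its
    true frequency:  n >= ln(2 * num_candidates / delta) / (2 * eps**2).
    """
    return math.ceil(math.log(2 * num_candidates / delta) / (2 * eps ** 2))

# The required sample grows with the logarithm of the number of candidates
# and with the inverse square of the error tolerance.
n_small = baseline_sample_size(num_candidates=1_000, eps=0.05, delta=0.1)
n_large = baseline_sample_size(num_candidates=1_000_000, eps=0.05, delta=0.1)
```

Because the number of candidate subgraphs is huge even for small pattern sizes, the logarithmic term can still be large, which is one reason tighter, complexity-based bounds are attractive.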

Padua Thesis and Dissertation Archive