7 research outputs found

    Fast Multiplex Graph Association Rules for Link Prediction

    Full text link
    Multiplex networks allow us to study a variety of complex systems where nodes connect to each other in multiple ways, for example friend, family, and co-worker relations in social networks. Link prediction is the branch of network analysis allowing us to forecast the future status of a network: which new connections are the most likely to appear in the future? In multiplex link prediction we also ask: of which type? Because this last question is unanswerable with classical link prediction, here we investigate the use of graph association rules to inform multiplex link prediction. We derive such rules by identifying all frequent patterns in a network via multiplex graph mining, and then score each unobserved link's likelihood by finding the occurrences of each rule in the original network. Association rules add new abilities to multiplex link prediction: to predict new node arrivals, to consider higher order structures with four or more nodes, and to be memory efficient. We improve over previous work by creating a framework that is also efficient in terms of runtime, which enables an increase in prediction performance. This increase in efficiency allows us to improve a case study on a signed multiplex network, showing how graph association rules can provide valuable insights to extend social balance theory.Comment: arXiv admin note: substantial text overlap with arXiv:2008.0835

    Graph set data mining

    Get PDF
    Graphs are among the most versatile abstract data types in computer science. With the variety comes great adoption in various application fields, such as chemistry, biology, social analysis, logistics, and computer science itself. With the growing capacities of digital storage, the collection of large amounts of data has become the norm in many application fields. Data mining, i.e., the automated extraction of non-trivial patterns from data, is a key step to extract knowledge from these datasets and generate value. This thesis is dedicated to concurrent scalable data mining algorithms beyond traditional notions of efficiency for large-scale datasets of small labeled graphs; more precisely, structural clustering and representative subgraph pattern mining. It is motivated by, but not limited to, the need to analyze molecular libraries of ever-increasing size in the drug discovery process. Structural clustering makes use of graph theoretical concepts, such as (common) subgraph isomorphisms and frequent subgraphs, to model cluster commonalities directly in the application domain. It is considered computationally demanding for non-restricted graph classes and with very few exceptions prior algorithms are only suitable for very small datasets. This thesis discusses the first truly scalable structural clustering algorithm StruClus with linear worst-case complexity. At the same time, StruClus embraces the inherent values of structural clustering algorithms, i.e., interpretable, consistent, and high-quality results. A novel two-fold sampling strategy with stochastic error bounds for frequent subgraph mining is presented. It enables fast extraction of cluster commonalities in the form of common subgraph representative sets. StruClus is the first structural clustering algorithm with a directed selection of structural cluster-representative patterns regarding homogeneity and separation aspects in the high-dimensional subgraph pattern space. Furthermore, a novel concept of cluster homogeneity balancing using dynamically-sized representatives is discussed. The second part of this thesis discusses the representative subgraph pattern mining problem in more general terms. A novel objective function maximizes the number of represented graphs for a cardinality-constrained representative set. It is shown that the problem is a special case of the maximum coverage problem and is NP-hard. Based on the greedy approximation of Nemhauser, Wolsey, and Fisher for submodular set function maximization a novel sampling approach is presented. It mines candidate sets that contain an optimal greedy solution with a probabilistic maximum error. This leads to a constant-time algorithm to generate the candidate sets given a fixed-size sample of the dataset. In combination with a cheap single-pass streaming evaluation of the candidate sets, this enables scalability to datasets with billions of molecules on a single machine. Ultimately, the sampling approach leads to the first distributed subgraph pattern mining algorithm that distributes the pattern space and the dataset graphs at the same time

    Proceedings. 19. Workshop Computational Intelligence, Dortmund, 2. - 4. Dezember 2009

    Get PDF
    Dieser Tagungsband enthält die Beiträge des 19. Workshops „Computational Intelligence“ des Fachausschusses 5.14 der VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) und der Fachgruppe „Fuzzy-Systeme und Soft-Computing“ der Gesellschaft für Informatik (GI), der vom 2.-4. Dezember 2009 im Haus Bommerholz bei Dortmund stattfindet

    Vertex unique labelled subgraph mining

    Get PDF
    This thesis proposes the novel concept of Vertex Unique Labelled Subgraph (VULS) mining with respect to the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that the concept of VULS can offer in the context of vertex classification. A VULS is a subgraph with a particular structure and edge labelling that has a unique vertex labelling associated with it within a given (set of) host graph(s). VULS can describe highly discriminative and significant local geometries each with a particular associated vertex label pattern. This knowledge can then be used to predict vertex labels in ``unseen" graphs (graphs with edge labels, but without vertex labels). Thus this research is directed at identifying (mining) VULS, of various forms, that ``best" serve to both capture effectively graph information, while at the same time allowing for the generation of effective vertex label predictors (classifiers). To this end, four VULS classifiers are proposed, directed at mining four different kinds of VULS: (i) complete, (ii) minimal, (iii) frequent and (iv) minimal frequent. The thesis describes and discusses each of these in detail including, in each case, the theoretical definition and algorithms with respect to VULS identification and prediction. A full evaluation of each of the VULS categories is also presented. VULS has wide applicability in areas where the domain of interest can be represented in the form of some sort of a graph. The evaluation was primarily directed at predicting a form of deformation, known as springback, that occurs in the Asymmetric Incremental Sheet Forming (AISF) manufacturing process. For the evaluation two flat-topped, square-based, pyramid shapes were used. Each pyramid had been manufactured twice using Steel and twice using Titanium. The utilisation of VULS was also explored by applying the VULS concept to the field of satellite image interpretation. Satellite data describing two villages located in a rural part of the Ethiopian hinterland were used for this purpose. In each case the ground surface was represented in a similar manner to the way that AISF sheet metal surfaces were represented, with the zz dimension describing the grey scale value. The idea here was to predict vertex labels describing ground type. As will become apparent, from the work presented in this thesis, the VULS concept is well suited to the task of 3D surface classification with respect to AISF and satellite imagery. The thesis demonstrates that the use of frequent VULS (rather than the other forms of VULS considered) produces more efficient results in the AISF sheet metal forming application domain, whilst the use of minimal VULS provided promising results in the context of the satellite image interpretation domain. The reported evaluation also indicates that a sound foundation has been established for future work on more general VULS based vertex classification
    corecore