Spatial association discovery process using frequent subgraph mining
Spatial associations are among the most relevant kinds of patterns used in business intelligence on spatial data. Because of the characteristics of this type of information, different approaches have been proposed for spatial association mining. This variety of methods creates the need for a process that integrates the activities of association discovery, one that is easy to implement and flexible enough to be adapted to a particular situation, especially in small and medium-size projects, where it can guide the discovery of useful patterns. Thus, this work proposes an adaptable knowledge discovery process that uses graph theory to model different spatial relationships from multiple scenarios, and frequent subgraph mining to discover spatial associations. A proof of concept is presented using real data
Assessing the Computational Complexity of Multi-Layer Subgraph Detection
Multi-layer graphs consist of several graphs (layers) over the same vertex
set. They are motivated by real-world problems where entities (vertices) are
associated via multiple types of relationships (edges in different layers). We
chart the border of computational (in)tractability for the class of subgraph
detection problems on multi-layer graphs, including fundamental problems such
as maximum matching, finding certain clique relaxations (motivated by community
detection), or path problems. Mostly encountering hardness results, sometimes
even for two or three layers, we can also spot some islands of tractability
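As an illustration of the definition above (several edge sets over one shared vertex set), the following minimal Python sketch stores a multi-layer graph as a mapping from layer name to edge list and reports the edges that occur in at least t layers. The function name, the layer names, and the threshold parameter t are assumptions made for this example; it is not one of the detection problems or algorithms analysed in the paper.

```python
from collections import Counter


def edges_in_at_least(layers, t):
    """Return the undirected edges that appear in at least t layers.

    layers: dict mapping a layer name to a list of (u, v) pairs over a
            shared vertex set (hypothetical representation).
    """
    counts = Counter()
    for edges in layers.values():
        # Normalise each undirected edge so (u, v) and (v, u) are equal,
        # and use a set so an edge counts at most once per layer.
        counts.update({frozenset(edge) for edge in edges})
    return {edge for edge, n in counts.items() if n >= t}


# Three layers over the vertex set {a, b, c, d}; the names are illustrative.
layers = {
    "friend": [("a", "b"), ("b", "c")],
    "family": [("b", "a"), ("c", "d")],
    "coworker": [("a", "b"), ("c", "b"), ("c", "d")],
}

in_all_layers = edges_in_at_least(layers, 3)
in_two_layers = edges_in_at_least(layers, 2)
```

Here only the edge between a and b is present in all three layers, while three edges are present in at least two.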
On Pattern Mining in Graph Data to Support Decision-Making
In recent years, graph data models have become increasingly important in both research and industry. Their core is a generic data structure of things (vertices) and connections among those things (edges). Rich graph models such as the property graph model promise extraordinary analytical power because relationships can be evaluated without knowledge of a domain-specific database schema. This dissertation studies the usage of graph models for data integration and data mining of business data. Although a typical company's business data implicitly describes a graph, it is usually stored in multiple relational databases. Therefore, we propose the first semi-automated approach to transform data from multiple relational databases into a single graph whose vertices represent domain objects and whose edges represent their mutual relationships. This transformation is the base of our conceptual framework BIIIG (Business Intelligence with Integrated Instance Graphs). We further proposed a graph-based approach to data integration. The process is executed after the transformation. In established data mining approaches, interrelated input data is mostly represented by tuples of measure values and dimension values. In the context of graphs, these values must be attached to the graph structure, and aggregated measure values are graph attributes. Since the latter was not supported by any existing model, we proposed the use of collections of property graphs. They act as the data structure of the novel Extended Property Graph Model (EPGM). The model supports vertices and edges that may appear in different graphs as well as graph properties. Further on, we proposed some operators that benefit from this data structure, for example, graph-based aggregation of measure values. A primitive operation of graph pattern mining is frequent subgraph mining (FSM). However, existing algorithms provided no support for directed multigraphs.
We extended the popular gSpan algorithm to overcome this limitation. Some patterns might not be frequent while their generalizations are. Generalized graph patterns can be mined by attaching vertices to taxonomies. We proposed a novel approach to Generalized Multidimensional Frequent Subgraph Mining (GM-FSM), in particular the first solution to generalized FSM that supports not only directed multigraphs but also multiple dimensional taxonomies. In scenarios that compare patterns of different categories, e.g., fraud or not, FSM is not sufficient since pattern frequencies may differ by category. Further on, determining all pattern frequencies without frequency pruning is not an option due to the computational complexity of FSM. Thus, we developed an FSM extension to extract patterns that are characteristic for a specific category according to a user-defined interestingness function called Characteristic Subgraph Mining (CSM). Parts of this work were done in the context of GRADOOP, a framework for distributed graph analytics. To make the primitive operation of frequent subgraph mining available to this framework, we developed Distributed In-Memory gSpan (DIMSpan), a frequent subgraph miner that is tailored to the characteristics of shared-nothing clusters and distributed dataflow systems. Finally, the results of use case evaluations in cooperation with a large scale enterprise will be presented. This includes a report of practical experiences gained in implementation and application of the proposed algorithms
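The notion of pattern frequency with pruning that the abstract relies on can be sketched for the simplest case, single-edge patterns in a collection of directed, labelled multigraphs. This is only an illustrative first level of a frequent subgraph search under assumed data shapes; it is not gSpan, DIMSpan, or any other algorithm from the dissertation, and the function name, the tuple layout, and the labels are hypothetical.

```python
from collections import Counter


def frequent_edge_patterns(graphs, min_support):
    """Count single-edge patterns across a graph collection.

    graphs: list of (vertex_labels, edges) pairs, where vertex_labels maps
            a vertex id to its label and edges is a list of
            (source_id, target_id, edge_label) triples (assumed layout).
    Returns the patterns supported by at least min_support graphs.
    """
    support = Counter()
    for vertex_labels, edges in graphs:
        # Direction is kept in the pattern; using a set means parallel
        # edges of a multigraph count only once per graph.
        patterns = {
            (vertex_labels[src], label, vertex_labels[dst])
            for src, dst, label in edges
        }
        support.update(patterns)
    # Frequency pruning: discard patterns below the support threshold.
    return {p: n for p, n in support.items() if n >= min_support}


# Two small business graphs with made-up labels.
graphs = [
    ({1: "Customer", 2: "Order"},
     [(1, 2, "places"), (1, 2, "places")]),
    ({1: "Customer", 2: "Order", 3: "Invoice"},
     [(1, 2, "places"), (2, 3, "bills")]),
]

frequent = frequent_edge_patterns(graphs, min_support=2)
```

With a minimum support of 2, only the Customer-places-Order pattern survives; a full miner would then grow the surviving patterns edge by edge and repeat the count.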
Fast Multiplex Graph Association Rules for Link Prediction
Multiplex networks allow us to study a variety of complex systems where nodes
connect to each other in multiple ways, for example friend, family, and
co-worker relations in social networks. Link prediction is the branch of
network analysis allowing us to forecast the future status of a network: which
new connections are the most likely to appear in the future? In multiplex link
prediction we also ask: of which type? Because this last question is
unanswerable with classical link prediction, here we investigate the use of
graph association rules to inform multiplex link prediction. We derive such
rules by identifying all frequent patterns in a network via multiplex graph
mining, and then score each unobserved link's likelihood by finding the
occurrences of each rule in the original network. Association rules add new
abilities to multiplex link prediction: to predict new node arrivals, to
consider higher order structures with four or more nodes, and to be memory
efficient. We improve over previous work by creating a framework that is also
efficient in terms of runtime, which enables an increase in prediction
performance. This increase in efficiency allows us to improve a case study on a
signed multiplex network, showing how graph association rules can provide
valuable insights to extend social balance theory.
Comment: arXiv admin note: substantial text overlap with arXiv:2008.0835
Multilayer Networks
In most natural and engineered systems, a set of entities interact with each
other in complicated patterns that can encompass multiple types of
relationships, change in time, and include other types of complications. Such
systems include multiple subsystems and layers of connectivity, and it is
important to take such "multilayer" features into account to try to improve our
understanding of complex systems. Consequently, it is necessary to generalize
"traditional" network theory by developing (and validating) a framework and
associated tools to study multilayer systems in a comprehensive fashion. The
origins of such efforts date back several decades and arose in multiple
disciplines, and now the study of multilayer networks has become one of the
most important directions in network science. In this paper, we discuss the
history of multilayer networks (and related concepts) and review the exploding
body of work on such networks. To unify the disparate terminology in the large
body of recent work, we discuss a general framework for multilayer networks,
construct a dictionary of terminology to relate the numerous existing concepts
to each other, and provide a thorough discussion that compares, contrasts, and
translates between related notions such as multilayer networks, multiplex
networks, interdependent networks, networks of networks, and many others. We
also survey and discuss existing data sets that can be represented as
multilayer networks. We review attempts to generalize single-layer-network
diagnostics to multilayer networks. We also discuss the rapidly expanding
research on multilayer-network models and notions like community structure,
connected components, tensor decompositions, and various types of dynamical
processes on multilayer networks. We conclude with a summary and an outlook.
Comment: Working paper; 59 pages, 8 figure
Modeling Graphs with Vertex Replacement Grammars
One of the principal goals of graph modeling is to capture the building
blocks of network data in order to study various physical and natural
phenomena. Recent work at the intersection of formal language theory and graph
theory has explored the use of graph grammars for graph modeling. However,
existing graph grammar formalisms, like Hyperedge Replacement Grammars, can
only operate on small tree-like graphs. The present work relaxes this
restriction by revising a different graph grammar formalism called Vertex
Replacement Grammars (VRGs). We show that a variant of the VRG called
Clustering-based Node Replacement Grammar (CNRG) can be efficiently extracted
from many hierarchical clusterings of a graph. We show that CNRGs encode a
succinct model of the graph, yet faithfully preserve the structure of the
original graph. In experiments on large real-world datasets, we show that
graphs generated from the CNRG model exhibit a diverse range of properties that
are similar to those found in the original networks.
Comment: Accepted as a regular paper at IEEE ICDM 2019. 15 pages, 9 figure