1,371 research outputs found
Towards an Efficient Discovery of the Topological Representative Subgraphs
With the emergence of graph databases, the task of frequent subgraph
discovery has been extensively addressed. Although the proposed approaches in
the literature have made this task feasible, the number of discovered frequent
subgraphs is still very high to be efficiently used in any further exploration.
Feature selection for graph data is a way to reduce the high number of frequent
subgraphs based on exact or approximate structural similarity. However, current
structural similarity strategies are not efficient enough in many real-world
applications, besides, the combinatorial nature of graphs makes it
computationally very costly. In order to select a smaller yet structurally
irredundant set of subgraphs, we propose a novel approach that mines the top-k
topological representative subgraphs among the frequent ones. Our approach
allows detecting hidden structural similarities that existing approaches are
unable to detect such as the density or the diameter of the subgraph. In
addition, it can be easily extended using any user defined structural or
topological attributes depending on the sought properties. Empirical studies on
real and synthetic graph datasets show that our approach is fast and scalable
Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of mil- lions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much challenging as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude less than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades-off between high density and number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed-up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting
- …