Mining Density Contrast Subgraphs
Dense subgraph discovery is a key primitive in many graph mining
applications, such as detecting communities in social networks and mining gene
correlation from biological data. Most studies on dense subgraph mining only
deal with one graph. However, in many applications, we have more than one graph
describing relations among the same group of entities. In this paper, given two
graphs sharing the same set of vertices, we investigate the problem of
detecting subgraphs that contrast the most with respect to density. We call
such subgraphs Density Contrast Subgraphs, or DCS in short. Two widely used
graph density measures, average degree and graph affinity, are considered. For
both density measures, mining DCS is equivalent to mining the densest subgraph
from a "difference" graph, which may have both positive and negative edge
weights. Due to the existence of negative edge weights, existing dense subgraph
detection algorithms cannot identify the subgraph we need. We prove the
computational hardness of mining DCS under the two graph density measures and
develop efficient algorithms to find DCS. We also conduct extensive experiments
on several real-world datasets to evaluate our algorithms. The experimental
results show that our algorithms are both effective and efficient. Comment: Full version of an ICDE'18 paper
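The reduction described in this abstract (mining the densest subgraph of a "difference" graph) can be illustrated with a small sketch. It is not the authors' algorithm: the function names are hypothetical, density is taken here as total edge weight divided by the number of vertices (a convention that differs from average degree only by a constant factor), and the search is exhaustive, so it only suits very small graphs.

```python
from itertools import combinations


def difference_graph(edges_a, edges_b):
    """Return edge weights of D = B - A over a shared vertex set.

    edges_a and edges_b map (u, v) pairs to weights. Edges are treated as
    undirected, so (u, v) and (v, u) refer to the same edge.
    """
    diff = {}
    for (u, v), w in edges_b.items():
        key = frozenset((u, v))
        diff[key] = diff.get(key, 0.0) + w
    for (u, v), w in edges_a.items():
        key = frozenset((u, v))
        diff[key] = diff.get(key, 0.0) - w  # may become negative
    return diff


def density(diff, subset):
    """Total weight of edges inside `subset`, divided by |subset|."""
    subset = set(subset)
    if not subset:
        return 0.0
    total = sum(w for edge, w in diff.items() if edge <= subset)
    return total / len(subset)


def brute_force_contrast(diff, vertices):
    """Exhaustively find the vertex subset with maximum density in D.

    Exponential in the number of vertices; for illustration only.
    """
    best_set, best_density = None, float("-inf")
    vertices = list(vertices)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            d = density(diff, subset)
            if d > best_density:
                best_set, best_density = set(subset), d
    return best_set, best_density


# Two toy graphs on the vertices 1..4 (made-up data).
graph_a = {(1, 2): 1.0, (2, 3): 1.0}
graph_b = {(1, 2): 1.0, (1, 3): 1.0, (3, 4): 1.0, (1, 4): 1.0}
diff = difference_graph(graph_a, graph_b)
best_set, best_density = brute_force_contrast(diff, [1, 2, 3, 4])
```

In this toy input, the edge (2, 3) exists only in the first graph, so it receives weight -1 in the difference graph, which is the kind of negative weight the abstract says standard dense subgraph algorithms do not handle.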
Explainable subgraphs with surprising densities : a subgroup discovery approach
The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form `the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y', in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph.
We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns. This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.
Development of Computer Science Disciplines - A Social Network Analysis Approach
In contrast to many other scientific disciplines, computer science considers
conference publications. Conferences have the advantage of providing fast
publication of papers and of bringing researchers together to present and
discuss the paper with peers. Previous work on knowledge mapping focused on the
map of all sciences or a particular domain based on ISI published JCR (Journal
Citation Report). Although this data covers most of the important journals, it
lacks computer science conference and workshop proceedings. That results in an
imprecise and incomplete analysis of the computer science knowledge. This paper
presents an analysis of the computer science knowledge network constructed from
all types of publications, aiming at providing a complete view of computer
science research. Based on the combination of two important digital libraries
(DBLP and CiteSeerX), we study the knowledge network created at
journal/conference level using citation linkage, to identify the development of
sub-disciplines. We investigate the collaborative and citation behavior of
journals/conferences by analyzing the properties of their co-authorship and
citation subgraphs. The paper draws several important conclusions. First,
conferences constitute social structures that shape the computer science
knowledge. Second, computer science is becoming more interdisciplinary. Third,
experts are the key success factor for the sustainability of journals/conferences.
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms.
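As a rough illustration of edge sampling combined with graph induction on a static graph, the sketch below samples edges uniformly at random until enough nodes are collected, then keeps every original edge whose endpoints were both sampled. The function name, the `target_nodes` parameter, and the stopping rule are assumptions made for this example; the streaming variants described in the abstract are not covered.

```python
import random


def edge_sample_with_induction(edges, target_nodes, seed=0):
    """Sample nodes via random edges, then induce the subgraph on them.

    edges: list of (u, v) pairs of a static, undirected graph.
    target_nodes: stop sampling once at least this many nodes are collected.
    Returns (sampled_nodes, induced_edges).
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    # Step 1: edge sampling. Each sampled edge contributes its endpoints.
    nodes = set()
    for u, v in shuffled:
        if len(nodes) >= target_nodes:
            break
        nodes.update((u, v))

    # Step 2: graph induction. Keep all edges among the sampled nodes,
    # including edges that were never drawn in step 1.
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, induced


# Toy graph: a 10-node ring with two chords (made-up data).
ring = [(i, (i + 1) % 10) for i in range(10)]
toy_edges = ring + [(0, 5), (2, 7)]
sampled_nodes, induced_edges = edge_sample_with_induction(toy_edges, 5, seed=1)
```

The induction step is what separates this from plain edge sampling: it restores edges between sampled nodes that the random draw missed, which tends to keep the sampled subgraph better connected.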
Towards an Efficient Discovery of the Topological Representative Subgraphs
With the emergence of graph databases, the task of frequent subgraph
discovery has been extensively addressed. Although the proposed approaches in
the literature have made this task feasible, the number of discovered frequent
subgraphs is still too high to be used efficiently in any further exploration.
Feature selection for graph data is a way to reduce the high number of frequent
subgraphs based on exact or approximate structural similarity. However, current
structural similarity strategies are not efficient enough in many real-world
applications; moreover, the combinatorial nature of graphs makes such
comparisons computationally very costly. In order to select a smaller yet
structurally non-redundant set of subgraphs, we propose a novel approach that mines the top-k
topological representative subgraphs among the frequent ones. Our approach
allows detecting hidden structural similarities, such as the density or the
diameter of the subgraph, that existing approaches are unable to detect. In
addition, it can be easily extended using any user defined structural or
topological attributes depending on the sought properties. Empirical studies on
real and synthetic graph datasets show that our approach is fast and scalable.
Robust Densest Subgraph Discovery
Dense subgraph discovery is an important primitive in graph mining, which has
a wide variety of applications in diverse domains. In the densest subgraph
problem, given an undirected graph $G=(V,E)$ with an edge-weight vector
$w=(w_e)_{e\in E}$, we aim to find $S\subseteq V$ that maximizes the density,
i.e., $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the
subgraph induced by $S$. Although the densest subgraph problem is one of the
most well-studied optimization problems for dense subgraph discovery, there is
an implicit strong assumption; it is assumed that the weights of all the edges
are known exactly as input. In real-world applications, there are often cases
where we have only uncertain information of the edge weights. In this study, we
provide a framework for dense subgraph discovery under the uncertainty of edge
weights. Specifically, we address such an uncertainty issue using the theory of
robust optimization. First, we formulate our fundamental problem, the robust
densest subgraph problem, and present a simple algorithm. We then formulate the
robust densest subgraph problem with sampling oracle that models dense subgraph
discovery using an edge-weight sampling oracle, and present an algorithm with a
strong theoretical performance guarantee. Computational experiments using both
synthetic graphs and popular real-world graphs demonstrate the effectiveness of
our proposed algorithms. Comment: 10 pages; Accepted to ICDM 201
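For orientation, the density objective (total weight of induced edges divided by the number of vertices) can be computed and approximately maximized with the classical greedy peeling heuristic, sketched below. This is not the robust algorithm from the abstract: it assumes the edge weights are known exactly and nonnegative, and the function names are hypothetical.

```python
def induced_weight(weights, subset):
    """Sum of the weights of edges with both endpoints in `subset`."""
    subset = set(subset)
    return sum(w for (u, v), w in weights.items() if u in subset and v in subset)


def greedy_peeling(vertices, weights):
    """Repeatedly remove a minimum weighted-degree vertex; keep the best prefix.

    vertices: iterable of vertex ids (must be non-empty).
    weights: dict mapping (u, v) to a nonnegative weight (assumption).
    Returns (best_subset, best_density).
    """
    current = set(vertices)
    if not current:
        raise ValueError("vertex set must be non-empty")

    # Adjacency lists and weighted degrees within the current vertex set.
    adj = {v: {} for v in current}
    for (u, v), w in weights.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    degree = {v: sum(adj[v].values()) for v in current}

    total = sum(weights.values())  # weight of edges inside `current`
    best_set, best_density = set(current), total / len(current)

    while len(current) > 1:
        v = min(current, key=degree.__getitem__)
        current.remove(v)
        total -= degree[v]  # drop every edge between v and the remaining set
        for u, w in adj[v].items():
            if u in current:
                degree[u] -= w
        d = total / len(current)
        if d > best_density:
            best_set, best_density = set(current), d
    return best_set, best_density


# Toy graph: a 4-clique on {1, 2, 3, 4} plus a pendant vertex 5 (made-up data).
toy_weights = {(1, 2): 1.0, (1, 3): 1.0, (1, 4): 1.0,
               (2, 3): 1.0, (2, 4): 1.0, (3, 4): 1.0, (1, 5): 1.0}
peeled_set, peeled_density = greedy_peeling([1, 2, 3, 4, 5], toy_weights)
```

With nonnegative weights, peeling is known to return a subset whose density is at least half the optimum; handling uncertain weights, as the abstract proposes, requires a different formulation.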