Cohesive Subgraph Detection on Large Graphs
University of Technology Sydney. Faculty of Engineering and Information Technology.
Graphs have been widely used to model sophisticated relationships between different entities due to their strong representative properties. Social networks, traffic networks, and biological networks are among the applications that benefit from being expressed as graphs. The cohesive subgraph is an essential structure for understanding the organization of many real-world networks, and cohesive subgraph detection is a crucial problem in network analysis. There are many cohesive subgraph models, such as k-core, strongly connected component, and maximum density subgraph.
Uncertain graph management and analysis have attracted much research attention. Within this area, computing k-cores in uncertain graphs (also known as (k, eta)-cores) is an important problem that has emerged in many applications. However, the existing algorithms for computing (k, eta)-cores depend heavily on the two input parameters k and eta. In addition, computing and updating the eta-degree of each vertex is the costliest component of these algorithms.
To overcome these drawbacks, we propose an index-based solution for computing (k, eta)-cores. The index size is well-bounded by O(m), where m is the number of edges in the graph. Based on the index, queries for any k and eta can be answered in optimal time. We propose an algorithm for index construction with several different optimizations.
We also discuss the (k, eta)-core computation when graphs cannot be entirely stored in memory. We adopt the semi-external setting, which allows O(n) memory usage, where n is the number of vertices in the graph. This assumption is reasonable in practice, and it has been widely adopted in massive graph analysis. We design an index-based solution for I/O efficient (k, eta)-core computation.
Given the frequent updates in many real-world graphs, detecting strongly connected components (SCC) in dynamic graphs is a very complicated problem. In the thesis, we study the fully dynamic depth-first search (DFS) problem in directed graphs, which is a crucial basis of dynamic SCC detection. In the literature, most works focus on the dynamic DFS problem in undirected graphs and directed acyclic graphs. However, their methods cannot easily be applied in the case of general directed graphs. Motivated by this, we propose a framework and corresponding algorithms for both edge insertion and deletion in general directed graphs. We further give several optimizations to speed up the algorithms.
We conduct extensive experiments on several large real-world graphs to evaluate the practical performance of all proposed algorithms.
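The eta-degree mentioned in the abstract above can be illustrated with a short sketch. It assumes the usual definition from the uncertain-graph literature, which the abstract itself does not spell out: edges exist independently, and the eta-degree of a vertex is the largest k such that the probability of the vertex having degree at least k is no less than eta. The function name `eta_degree` and its inputs are hypothetical, and this is not the thesis's index-based method.

```python
def eta_degree(edge_probs, eta):
    """Largest k with Pr[deg >= k] >= eta (assumed definition).

    edge_probs: existence probabilities of the edges incident to one
    vertex, assumed independent.
    """
    # dist[j] = probability that exactly j of the edges seen so far exist
    dist = [1.0]
    for p in edge_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this edge is absent
            nxt[j + 1] += q * p       # this edge is present
        dist = nxt
    # Walk k downward, accumulating the tail probability Pr[deg >= k]
    tail = 0.0
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]
        if tail >= eta:
            return k
    return 0  # only reached if rounding keeps the total just below eta
```

The dynamic program costs time quadratic in the vertex degree, which gives a sense of why recomputing eta-degrees during peeling is described as the expensive step.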
Distance-generalized Core Decomposition
The k-core of a graph is defined as the maximal subgraph in which every
vertex is connected to at least k other vertices within that subgraph. In
this work we introduce a distance-based generalization of the notion of
k-core, which we refer to as the (k, h)-core, i.e., the maximal subgraph in
which every vertex has at least k other vertices at distance at most h within
that subgraph. We study the properties of the (k, h)-core showing that it
preserves many of the nice features of the classic core decomposition (e.g.,
its connection with the notion of distance-generalized chromatic number) and it
preserves its usefulness to speed up or approximate distance-generalized
notions of dense structures, such as h-club.
Computing the distance-generalized core decomposition over large networks is
intrinsically complex. However, by exploiting clever upper and lower bounds we
can partition the computation in a set of totally independent subcomputations,
opening the door to top-down exploration and to multithreading, and thus
achieving an efficient algorithm.
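The core definition in this abstract can be made concrete with a naive peeling sketch: repeatedly delete any vertex that has fewer than k other vertices within distance h in the remaining subgraph. This is only an illustration of the definition, not the bound-based algorithm the abstract describes. The function names and the adjacency-dict input format are assumptions, and the graph is assumed undirected with every neighbor present as a key.

```python
from collections import deque

def h_neighborhood_size(adj, alive, v, h):
    """Count vertices other than v within distance h of v, using only alive vertices."""
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == h:
            continue  # do not expand beyond distance h
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return len(seen) - 1

def kh_core(adj, k, h=1):
    """Vertex set of the distance-generalized core; h=1 gives the classic k-core."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if h_neighborhood_size(adj, alive, v, h) < k:
                alive.discard(v)
                changed = True
    return alive
```

For a triangle with one pendant vertex, `kh_core(adj, 2, 1)` keeps only the triangle, while `kh_core(adj, 3, 2)` keeps all four vertices because each can reach the other three within two hops. Every removal can shrink the distance-h neighborhoods of other vertices, so this naive loop recomputes them from scratch, which hints at why the problem is costly on large networks.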
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much more challenging as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude fewer than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades off between high density and number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting.
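To illustrate why multilayer cores form a lattice rather than a chain, the sketch below computes the core for one given threshold vector. It assumes the common definition in which a vertex stays only if, in every layer, it has at least the required number of surviving neighbors in that layer. The function name `multilayer_core` and the list-of-adjacency-dicts input are hypothetical, and this is plain peeling for a single vector, not one of the three lattice-visiting algorithms from the abstract.

```python
def multilayer_core(layers, ks):
    """Vertex set of the core for threshold vector ks.

    layers: one adjacency dict per layer over a shared vertex set
    ks: required minimum degree for each layer, in the same order
    """
    alive = set().union(*[set(adj) for adj in layers])
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            for adj, k in zip(layers, ks):
                # degree of v in this layer, counting surviving neighbors only
                deg = sum(1 for w in adj.get(v, ()) if w in alive)
                if deg < k:
                    alive.discard(v)
                    changed = True
                    break
    return alive
```

With two layers, the cores for vectors (2, 1) and (1, 2) can both be non-empty while the core for (2, 2) is empty, so neither vector dominates the other. That is the partial-order behaviour the abstract refers to.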
Efficient Node Proximity and Node Significance Computations in Graphs
Node proximity measures are commonly used to quantify how close or otherwise related two or more nodes in a graph are. Node significance measures are mainly used to determine how important nodes are in a graph. Both kinds of measures have been highly effective in many prediction tasks and applications. Despite their effectiveness, however, they have several shortcomings. One is scalability: their computation cost is high on large graphs. Another is low accuracy when the significance of a node is unrelated to its degree in the graph. A third is reduced effectiveness when information about the graph is uncertain: for an uncertain graph, computing ranking scores over all possible worlds requires exponential cost.
In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR), an approximate personalized PageRank method that computes node rankings from the locality information of the seeds without processing the entire graph, and that reuses precomputed locality information for different locality combinations. To identify locality information, I present Impact Neighborhood Indexing (INI), which finds impact neighborhoods by propagating nodes' fingerprints over the network. To address the accuracy challenge, I introduce the Degree Decoupled PageRank (D2PR) technique, which improves the effectiveness of PageRank-based knowledge discovery by considering the significance of a node's neighbors and the node's degree. To tackle the uncertainty challenge, I introduce Uncertain Personalized PageRank (UPPR), which approximately computes personalized PageRank values under uncertainty in edge existence, as well as Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M), which compute ranking scores when edge weights are uncertain and given as interval values.
Dissertation/Thesis, Doctoral Dissertation, Computer Science 201
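As background for the personalized PageRank variants named in this abstract, here is a minimal sketch of plain personalized PageRank by power iteration over the whole graph. It is the exact baseline that approximate methods try to avoid, not LR-PPR, D2PR, or UPPR. The function name, the damping value 0.85, the iteration count, and the choice to return dangling-node mass to the seeds are all assumptions made for illustration.

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    """Power iteration with teleportation restricted to the seed nodes.

    adj: dict mapping each node to a list of out-neighbors (all nodes are keys)
    seeds: non-empty collection of distinct nodes that receive the teleport mass
    """
    nodes = list(adj)
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        # every node starts the round with its teleport share
        nxt = {v: (1.0 - alpha) * teleport[v] for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                share = alpha * rank[u] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # dangling node: hand its mass back to the seeds (assumed policy)
                for s in seeds:
                    nxt[s] += alpha * rank[u] / len(seeds)
        rank = nxt
    return rank
```

On a directed three-node cycle with a single seed, the scores sum to one and decrease with distance from the seed. Each iteration touches every edge, which is the scalability cost the abstract points to for large graphs.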
Efficient Subgraph Matching on Billion Node Graphs
The ability to handle large scale graph data is crucial to an increasing
number of applications. Much work has been dedicated to supporting basic graph
operations such as subgraph matching, reachability, regular expression
matching, etc. In many cases, graph indices are employed to speed up query
processing. Typically, most indices require either super-linear indexing time
or super-linear indexing space. Unfortunately, for very large graphs,
super-linear approaches are almost always infeasible. In this paper, we study
the problem of subgraph matching on billion-node graphs. We present a novel
algorithm that supports efficient subgraph matching for graphs deployed on a
distributed memory store. Instead of relying on super-linear indices, we use
efficient graph exploration and massive parallel computing for query
processing. Our experimental results demonstrate the feasibility of performing
subgraph matching on web-scale graph data. Comment: VLDB201