4,665 research outputs found

    Cohesive Subgraph Detection on Large Graphs

    Full text link
    University of Technology Sydney. Faculty of Engineering and Information Technology.Graphs have been widely used to model sophisticated relationships between different entities due to their strong representative properties. Social networks, traffic networks, and biological networks are among the applications that benefit from being expressed as graphs. The cohesive subgraph is an essential structure for understanding the organization of many real-world networks, and cohesive subgraph detection is a crucial problem in network analysis. There are many cohesive subgraph models, such as k-core, strongly connected component, and maximum density subgraph. Uncertain graph management and analysis have attracted much research attention. Among them, computing k-cores in uncertain graphs (aka, (k, eta)-cores) is an important problem and has emerged in many applications. However, the existing algorithms for computing (k, eta)-cores heavily depend on the two input parameters k and eta. In addition, computing and updating the eta-degree for each vertex is the costliest component in the algorithm, and the cost is high. To overcome these drawbacks, we propose an index-based solution for computing (k, eta)-cores. The index size is well-bounded by O(m), where m is the number of edges in the graph. Based on the index, queries for any k and eta can be answered in optimal time. We propose an algorithm for index construction with several different optimizations. We also discuss the (k, eta)-core computation when graphs cannot be entirely stored in memory. We adopt the semi-external setting, which allows O(n) memory usage, where n is the number of vertices in the graph. This assumption is reasonable in practice, and it has been widely adopted in massive graph analysis. We design an index-based solution for I/O efficient (k, eta)-core computation. Given the frequent updates in many real-world graphs, detecting strongly connected components (SCC) in dynamic graphs is a very complicated problem. In the thesis, we study the fully dynamic depth-first search (DFS) problem in directed graphs, which is a crucial basis of dynamic SCC detection. In the literature, most works focus on the dynamic DFS problem in undirected graphs and directed acyclic graphs. However, their methods cannot easily be applied in the case of general directed graphs. Motivated by this, we propose a framework and corresponding algorithms for both edge insertion and deletion in general directed graphs. We further give several optimizations to speed up the algorithms. We conduct extensive experiments on several large real-world graphs to practically evaluate the performance of all proposed algorithms

    Distance-generalized Core Decomposition

    Full text link
    The kk-core of a graph is defined as the maximal subgraph in which every vertex is connected to at least kk other vertices within that subgraph. In this work we introduce a distance-based generalization of the notion of kk-core, which we refer to as the (k,h)(k,h)-core, i.e., the maximal subgraph in which every vertex has at least kk other vertices at distance ≤h\leq h within that subgraph. We study the properties of the (k,h)(k,h)-core showing that it preserves many of the nice features of the classic core decomposition (e.g., its connection with the notion of distance-generalized chromatic number) and it preserves its usefulness to speed-up or approximate distance-generalized notions of dense structures, such as hh-club. Computing the distance-generalized core decomposition over large networks is intrinsically complex. However, by exploiting clever upper and lower bounds we can partition the computation in a set of totally independent subcomputations, opening the door to top-down exploration and to multithreading, and thus achieving an efficient algorithm

    Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications

    Get PDF
    Multilayer networks are a powerful paradigm to model complex systems, where multiple relations occur between the same entities. Despite the keen interest in a variety of tasks, algorithms, and analyses in this type of network, the problem of extracting dense subgraphs has remained largely unexplored so far. In this work we study the problem of core decomposition of a multilayer network. The multilayer context is much challenging as no total order exists among multilayer cores; rather, they form a lattice whose size is exponential in the number of layers. In this setting we devise three algorithms which differ in the way they visit the core lattice and in their pruning techniques. We then move a step forward and study the problem of extracting the inner-most (also known as maximal) cores, i.e., the cores that are not dominated by any other core in terms of their core index in all the layers. Inner-most cores are typically orders of magnitude less than all the cores. Motivated by this, we devise an algorithm that effectively exploits the maximality property and extracts inner-most cores directly, without first computing a complete decomposition. Finally, we showcase the multilayer core-decomposition tool in a variety of scenarios and problems. We start by considering the problem of densest-subgraph extraction in multilayer networks. We introduce a definition of multilayer densest subgraph that trades-off between high density and number of layers in which the high density holds, and exploit multilayer core decomposition to approximate this problem with quality guarantees. As further applications, we show how to utilize multilayer core decomposition to speed-up the extraction of frequent cross-graph quasi-cliques and to generalize the community-search problem to the multilayer setting

    Efficient Node Proximity and Node Significance Computations in Graphs

    Get PDF
    abstract: Node proximity measures are commonly used for quantifying how nearby or otherwise related to two or more nodes in a graph are. Node significance measures are mainly used to find how much nodes are important in a graph. The measures of node proximity/significance have been highly effective in many predictions and applications. Despite their effectiveness, however, there are various shortcomings. One such shortcoming is a scalability problem due to their high computation costs on large size graphs and another problem on the measures is low accuracy when the significance of node and its degree in the graph are not related. The other problem is that their effectiveness is less when information for a graph is uncertain. For an uncertain graph, they require exponential computation costs to calculate ranking scores with considering all possible worlds. In this thesis, I first introduce Locality-sensitive, Re-use promoting, approximate Personalized PageRank (LR-PPR) which is an approximate personalized PageRank calculating node rankings for the locality information for seeds without calculating the entire graph and reusing the precomputed locality information for different locality combinations. For the identification of locality information, I present Impact Neighborhood Indexing (INI) to find impact neighborhoods with nodes' fingerprints propagation on the network. For the accuracy challenge, I introduce Degree Decoupled PageRank (D2PR) technique to improve the effectiveness of PageRank based knowledge discovery, especially considering the significance of neighbors and degree of a given node. To tackle the uncertain challenge, I introduce Uncertain Personalized PageRank (UPPR) to approximately compute personalized PageRank values on uncertainties of edge existence and Interval Personalized PageRank with Integration (IPPR-I) and Interval Personalized PageRank with Mean (IPPR-M) to compute ranking scores for the case when uncertainty exists on edge weights as interval values.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Efficient Subgraph Matching on Billion Node Graphs

    Full text link
    The ability to handle large scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices require either super-linear indexing time or super-linear indexing space. Unfortunately, for very large graphs, super-linear approaches are almost always infeasible. In this paper, we study the problem of subgraph matching on billion-node graphs. We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.Comment: VLDB201
    • …
    corecore