230 research outputs found

    Cohesive Subgraph Detection in Massive Networks

    Due to the strong expressive power of the graph model, many real-world applications model data and the relationships among the data as a graph, and significant research efforts have been devoted to efficiently and effectively managing and analyzing graph data. Among these efforts, mining and querying cohesive subgraph structures in massive networks is of great importance for a deeper understanding and better management of such networks. However, the massive graph volume and rapid evolution present huge challenges, which call for highly efficient solutions. In this thesis, we study three important problems in mining cohesive subgraph structures in massive networks, and design efficient and scalable solutions. Firstly, we study the problem of structural graph clustering. We develop a new two-step paradigm for scalable structural graph clustering based on three new observations. We then present the pSCAN approach, and propose optimization techniques to speed up checking whether two vertices are structure-similar. Moreover, we propose efficient techniques for updating the clusters when the input graph changes dynamically. Secondly, we formulate and investigate the problem of diversified top-k community detection over labeled graphs. We introduce a model, called special-interest-group, to enforce both structural cohesiveness and focused interests of a community. We prove that computing the top-1 community is NP-hard. Nevertheless, we propose effective pruning techniques to efficiently enumerate all communities in a graph, based on which we then select diversified top-k communities in a greedy manner. We prove that our algorithm computes the top-k communities approximately but with a guaranteed approximation ratio. Finally, we study the problem of efficiently computing a maximum independent set of a large graph G (equivalently, a maximum clique in the complement graph of G). 
We first develop a Reducing-Peeling framework, which iteratively reduces the graph size by applying reduction rules to vertices with very low degrees (Reducing) and temporarily removing the vertex with the highest degree (Peeling) when the reduction rules cannot be applied. Then, based on this framework, we design two baseline algorithms, a linear-time algorithm and a near-linear-time algorithm, by designing new reduction rules and developing techniques for efficiently and incrementally applying reduction rules.
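
The Reducing-Peeling idea described in this abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes only the two simplest reduction rules (a vertex of degree 0 or 1 can always be placed in some maximum independent set), and the final step that adds peeled vertices back is one assumed reading of "temporarily removing". The function name and the adjacency-dict input format are hypothetical.

```python
def reducing_peeling_independent_set(adj):
    """Heuristic independent set on an undirected graph.

    adj maps each vertex to the set of its neighbours (assumed symmetric).
    Sketch only: degree-0/1 reductions plus max-degree peeling.
    """
    # Work on a mutable copy so the caller's graph is left untouched.
    g = {v: set(ns) for v, ns in adj.items()}
    chosen, peeled = set(), []

    def remove(v):
        # Delete v and detach it from its remaining neighbours.
        for u in g.pop(v):
            g[u].discard(v)

    while g:
        # Reducing: look for a vertex with degree 0 or 1.
        low = next((v for v in g if len(g[v]) <= 1), None)
        if low is not None:
            chosen.add(low)
            for u in list(g[low]):  # at most one neighbour
                remove(u)
            remove(low)
        else:
            # Peeling: no rule applies, so set aside a highest-degree vertex.
            high = max(g, key=lambda v: len(g[v]))
            peeled.append(high)
            remove(high)

    # Assumed final step: re-add peeled vertices that conflict with nothing.
    for v in peeled:
        if not (set(adj[v]) & chosen):
            chosen.add(v)
    return chosen
```

On a path or a tree the reductions alone suffice, so the result is optimal there; once peeling is needed the result is only a heuristic solution.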

    Big Graph Analyses: From Queries to Dependencies and Association Rules


    Exploring Communities in Large Profiled Graphs

    Given a graph G and a vertex q ∈ G, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. This is a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.
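
As a rough illustration of plain community search (not of PCS itself), the sketch below returns the connected k-core containing a query vertex q. Using minimum degree k as the cohesiveness measure is an assumption made for this example; the abstract does not state which measure the paper uses, and the profile (label hierarchy) part is not modelled at all. The function name and input format are hypothetical.

```python
from collections import deque


def kcore_community(adj, q, k):
    """Return the connected k-core containing q, or an empty set.

    adj maps each vertex to the set of its neighbours (assumed symmetric).
    """
    g = {v: set(ns) for v, ns in adj.items()}
    # Start by queueing every vertex whose degree is already below k.
    queue = deque(v for v in g if len(g[v]) < k)
    removed = set(queue)
    while queue:
        v = queue.popleft()
        for u in g[v]:
            if u in removed:
                continue
            g[u].discard(v)  # u loses a neighbour
            if len(g[u]) < k:
                removed.add(u)
                queue.append(u)
    if q in removed:
        return set()
    # Breadth-first search from q over the surviving vertices only.
    community, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in removed and u not in community:
                community.add(u)
                frontier.append(u)
    return community
```

For example, on a triangle a, b, c with a pendant vertex d attached to c, querying q = "a" with k = 2 returns the triangle, while querying q = "d" returns an empty set.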

    Querying Web-Scale Information Networks Through Bounding Matching Scores


    Graph pattern matching on social network analysis

    Graph pattern matching is fundamental to social network analysis. Its effectiveness for identifying social communities and social positions, making recommendations and so on has been repeatedly demonstrated. However, social network analysis raises new challenges for graph pattern matching. As real-life social graphs are typically large, it is often prohibitively expensive to conduct graph pattern matching over such graphs, e.g., NP-complete for subgraph isomorphism, cubic time for bounded simulation, and quadratic time for simulation. These costs hinder the applicability of graph pattern matching to social network analysis. In response to these challenges, the thesis presents a series of effective techniques for querying large, dynamic, and distributed social networks. First of all, we propose a notion of query preserving graph compression, to compress large social graphs relative to a class Q of queries. We then develop both batch and incremental compression strategies for two commonly used pattern queries. Via both theoretical analysis and experimental studies, we show that (1) using compressed graphs Gr benefits graph pattern matching dramatically; and (2) the computation of Gr as well as its maintenance can be processed efficiently. Secondly, we investigate the distributed graph pattern matching problem, and explore parallel computation for graph pattern matching. We show that our techniques provide the following performance guarantees: (1) each site is visited only once; (2) the total network traffic is independent of the size of G; and (3) the response time is decided by the size of the largest fragment of G rather than the size of the entire G. Furthermore, we show how these distributed algorithms can be implemented in the MapReduce framework. Thirdly, we study the problem of answering graph pattern matching using views, since view-based techniques have proven effective for speeding up query evaluation. 
We propose a notion of pattern containment to characterise graph pattern matching using views, and introduce efficient algorithms to answer graph pattern matching using views. Moreover, we identify three problems related to graph pattern containment, and provide efficient algorithms for containment checking (approximation when the problem is intractable). Fourthly, we revise graph pattern matching by supporting a designated output node, which we treat as the "query focus". We then introduce algorithms for computing the top-k relevant matches w.r.t. the output node for both acyclic and cyclic pattern graphs, with the early termination property. Furthermore, we investigate the diversified top-k matching problem, and develop an approximation algorithm with a performance guarantee and a heuristic algorithm with the early termination property. Finally, we introduce an expert search system, called ExpFinder, for large and dynamic social networks. ExpFinder identifies top-k experts in social networks by graph pattern matching, and copes with the sheer size of real-life social networks by integrating incremental graph pattern matching, query preserving compression and top-k matching computation. In particular, we also introduce bounded (resp. unbounded) incremental algorithms to maintain the weighted landmark vectors, which are used for incremental maintenance of cached results.
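
To make the simulation semantics mentioned in this abstract concrete, here is a minimal sketch of plain graph simulation computed by a naive fixpoint. It is not the thesis's algorithm and is not tuned to the quadratic bound quoted above; the function name, argument names, and dict-based graph encoding are assumptions chosen for illustration.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_succ):
    """Compute the maximum simulation relation of a pattern in a data graph.

    p_labels: pattern node -> label
    p_edges:  list of directed pattern edges (u, u2)
    g_labels: data node -> label
    g_succ:   data node -> set of successor data nodes
    Returns pattern node -> set of matching data nodes (all empty if no match).
    """
    # Start from all label-compatible candidates.
    sim = {
        u: {v for v, lab in g_labels.items() if lab == p_labels[u]}
        for u in p_labels
    }
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v can match u only if some successor of v matches u2.
            keep = {v for v in sim[u] if g_succ.get(v, set()) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    # The pattern matches only if every pattern node has a candidate left.
    if any(not s for s in sim.values()):
        return {u: set() for u in p_labels}
    return sim
```

Unlike subgraph isomorphism, one pattern node may map to many data nodes here, which is why the relation can be found by repeatedly pruning candidates rather than by searching over one-to-one mappings.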

    Considering User Intention in Differential Graph Queries

    Empty answers are a major problem when processing pattern matching queries in graph databases. In particular, there can be multiple reasons why a query fails. To support users in such situations, differential queries can be used that deliver the missing parts of a graph query. Multiple heuristics have been proposed for differential queries, which reduce the search space. Although they are successful in increasing performance, they can discard query subgraphs relevant to a user. To address this issue, the authors extend the concept of differential queries and introduce top-k differential queries that calculate the ranking based on users' preferences and significantly support the users' understanding of query database management systems. A user assigns relevance weights to elements of a graph query that steer the search and are used for the ranking. In this paper, the authors propose different strategies for the selection of relevance weights and their propagation. As a result, the search is modelled along the most relevant paths. The authors evaluate their solution and both strategies on the DBpedia data graph.