50 research outputs found

    Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions

    Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.
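The abstract above notes that the nucleus decomposition generalizes the classic k-core decomposition. As a rough illustration of that special case only (not the authors' nucleus algorithm), the following Python sketch computes core numbers by repeatedly peeling a vertex of minimum remaining degree. The function name `core_numbers` and the edge-list input format are assumptions made for this example, and the simple quadratic-time peeling is chosen for readability rather than efficiency.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    Hypothetical helper for illustration: `edges` is an iterable of
    (u, v) pairs. Self-loops are ignored, duplicate edges collapse.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core
```

For a triangle on vertices 1, 2, 3 with a pendant vertex 4 attached to 3, the triangle vertices receive core number 2 and the pendant vertex receives core number 1.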

    Exploring Communities in Large Profiled Graphs

    Given a graph G and a vertex q ∈ G, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. This is a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.

    Efficient Data Modelling, Indexing and Processing in Large Datasets

    Many devices and applications in social networks and on-line services are producing, storing, and using the description, location, and occurrence time of objects. There are various systems to study, model, index, and process a huge amount of data. In this thesis, we study graphs and publish/subscribe systems. Firstly, we study the problem of continuously updating the top-k messages with the highest ranks, each of which contains all the requested keywords, where the rank of a message is calculated based on its freshness and its distance to the query's location. Since new messages arrive all the time and the scores of existing top-k results decrease over time, providing the most recent information requires continuously computing and maintaining the best results. We propose an efficient indexing and matching method using keywords, location, and the most recent top-k results of queries. Secondly, we study the problem of (k,s)-core decomposition. As both the user engagement of nodes and the strength of relationships are important, the (k,s)-core model is proposed in the literature to discover strong communities. Nevertheless, a decomposition algorithm for the (k,s)-core has not yet been investigated. We propose (k,s)-core algorithms to decompose a graph into its hierarchical structures considering both user engagement and tie strength. We first present the basic (k,s)-core decomposition methods. Then, we propose the advanced algorithms DES and DEK, which index the support of edges to enable higher-level cost-sharing in the peeling process. In addition, effective pruning strategies are applied to DES/DEK to further enhance performance. Moreover, we build a novel index based on the decomposition result and investigate an efficient (k,s)-core query algorithm based on our index. Finally, we develop an efficient algorithm for maintaining the (k,s)-core index of a dynamic graph in which vertices and edges are inserted and deleted. The algorithm uses pruning strategies that exploit the lower and upper bounds of the core number. We define a new Smax core and develop an efficient method for updating the (k,s) numbers of nodes.

    Graph Data Processing and Analysis: From Algorithms to System Development

    There are many real-world application domains where data can be naturally modelled as graphs, such as social networks and computer networks. The amount of data generated and published is rapidly increasing with the explosion of information. Effective storage and querying of graph data has become a significant challenge; hence the graph database is emerging to address this challenge. Graph databases have unique advantages when handling graph data: modelling and querying complex relationships, capturing and navigating complex data relationships, and recursive path querying. In this thesis, we enhance graph databases from both system and algorithm perspectives. Firstly, we propose two systems, SQL2Cypher and FSPS, to improve the usability and efficiency of graph databases. SQL2Cypher automatically migrates data from a relational database to a graph database. This system also supports translating SQL queries into Cypher queries. FSPS is the first FPGA-based system for accelerating graph queries on massive graphs. FSPS has the following features: 1) a CPU-FPGA co-designed framework, 2) a fully pipelined FPGA execution, and 3) reduced data transfer from the FPGA's external memory. FSPS supports the two most fundamental types of graph queries, namely subgraph and path queries. Performance evaluation shows that FSPS outperforms the most popular graph database, Neo4j, by up to three orders of magnitude. All the draft demo videos can be found at https://www.youtube.com/watch?v=oSpHtJ8iVio and https://www.youtube.com/watch?v=eGaeBrVTJws. Secondly, graph databases (e.g., Neo4j and PatMat) do not widely support cohesive subgraph models. Many real-world relationships can be naturally represented as bipartite graphs, such as customer-product, user-item, and author-paper. Therefore, we investigate the bipartite hierarchy model using efficient construction algorithms. The bipartite hierarchy is the first model to discover the hierarchical structure of bipartite graphs based on the concept of (alpha, beta)-core and graph connectivity. These algorithms can effectively identify the affected regions to limit computation scope and avoid re-building the bipartite hierarchy from scratch. Extensive experiments on 10 real-world graphs demonstrate the effectiveness of the proposed bipartite hierarchy and validate the efficiency of our hierarchy construction algorithms.
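The bipartite hierarchy above builds on the (alpha, beta)-core. As a minimal sketch of that building block only (not the thesis's hierarchy construction or maintenance algorithms), the Python code below uses the standard definition: the maximal subgraph of a bipartite graph in which every upper-layer vertex has degree at least alpha and every lower-layer vertex has degree at least beta. The function name `alpha_beta_core` and the convention that each edge is an (upper, lower) pair are assumptions made for this example.

```python
from collections import defaultdict


def alpha_beta_core(edges, alpha, beta):
    """Return (upper, lower) vertex sets of the (alpha, beta)-core.

    Hypothetical helper for illustration: `edges` is an iterable of
    (u, v) pairs with u in the upper layer and v in the lower layer.
    """
    up = defaultdict(set)   # upper vertex -> lower neighbours
    low = defaultdict(set)  # lower vertex -> upper neighbours
    for u, v in edges:
        up[u].add(v)
        low[v].add(u)

    # Start with every vertex that already violates its threshold.
    stack = [("U", u) for u in up if len(up[u]) < alpha]
    stack += [("L", v) for v in low if len(low[v]) < beta]
    removed = set(stack)

    while stack:
        side, x = stack.pop()
        if side == "U":
            own, other, other_side, need = up, low, "L", beta
        else:
            own, other, other_side, need = low, up, "U", alpha
        # Delete x and cascade to neighbours that fall below threshold.
        for y in own.pop(x):
            other[y].discard(x)
            if len(other[y]) < need and (other_side, y) not in removed:
                removed.add((other_side, y))
                stack.append((other_side, y))

    return set(up), set(low)
```

Removing a vertex can push its neighbours below their threshold, so the peeling cascades until every surviving vertex satisfies its degree constraint or nothing is left.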