Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Finding dense substructures in a graph is a fundamental graph mining
operation, with applications in bioinformatics, social networks, and
visualization to name a few. Yet most standard formulations of this problem
(like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the
goal is rarely to find the "true optimum", but to identify many (if not all)
dense substructures, understand their distribution in the graph, and ideally
determine relationships among them. Current dense subgraph finding algorithms
usually optimize some objective, and only find a few such subgraphs without
providing any structural relations. We define the nucleus decomposition of a
graph, which represents the graph as a forest of nuclei. Each nucleus is a
subgraph where smaller cliques are present in many larger cliques. The forest
of nuclei is a hierarchy by containment, where the edge density increases as we
proceed towards leaf nuclei. Sibling nuclei can have limited intersections,
which enables discovering overlapping dense subgraphs. With the right
parameters, the nucleus decomposition generalizes the classic notions of
k-cores and k-truss decompositions. We give provably efficient algorithms for
nucleus decompositions, and empirically evaluate their behavior in a variety of
real graphs. The tree of nuclei consistently gives a global, hierarchical
snapshot of dense substructures, and outputs dense subgraphs of higher quality
than other state-of-the-art solutions. Our algorithm can process graphs with
tens of millions of edges in less than an hour.
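The abstract above notes that nucleus decompositions generalize k-core decompositions. As a minimal sketch of that special case only (not the authors' nucleus algorithm), the following computes core numbers by repeatedly peeling a vertex of minimum remaining degree. The function name `core_numbers` and the edge-list input format are assumptions made for illustration, and the minimum lookup is kept deliberately simple rather than efficient.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    Illustrative peeling sketch; `edges` is an iterable of (u, v) pairs.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
example = core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)])
```

In this example the triangle vertices end up in the 2-core and the pendant vertex only in the 1-core. A bucket-based queue would bring the running time down to linear in the number of edges.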
Exploring Communities in Large Profiled Graphs
Given a graph G and a vertex q in G, the community search (CS) problem
aims to efficiently find a subgraph of G whose vertices are closely related
to q. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index, which facilitates efficient and
online solutions for PCS.
Efficient Data Modelling, Indexing and Processing in Large Datasets
Many devices and applications in social networks and on-line services are producing, storing, and using description, location, and occurrence time of objects. There are various systems to study, model, index, and process a huge amount of data. In this thesis, we study graphs and publish/subscribe systems.
Firstly, we study the problem of continuously updating the top-k highest-ranked messages, each of which contains all the requested keywords, where the rank of a message is calculated from its freshness and its distance to the query’s location.
Since new messages arrive all the time and the scores of existing top-k results decrease over time, providing the most recent information requires continuously computing and maintaining the best results. We propose an efficient indexing and matching method using keywords, location, and the most recent top-k results of queries.
Secondly, we study the problem of (k,s)-core decomposition. As both the user engagement of nodes and the strength of relationships are important, the (k,s)-core model has been proposed in the literature to discover strong communities. Nevertheless, decomposition algorithms for the (k,s)-core have not yet been investigated. We propose (k,s)-core algorithms to decompose a graph into its hierarchical structures considering both user engagement and tie strength. We first present the basic (k,s)-core decomposition methods. Then, we propose the advanced algorithms DES and DEK, which index the support of edges to enable higher-level cost-sharing in the peeling process. In addition, effective pruning strategies are applied to DES/DEK to further enhance performance. Moreover, we build a novel index based on the decomposition result and investigate an efficient (k,s)-core query algorithm based on our index.
Finally, we develop an efficient algorithm for maintaining the (k,s)-core index of a dynamic graph in which vertices and edges are inserted and deleted. The algorithm uses pruning strategies that exploit the lower and upper bounds of the core number. We define a new Smax core and develop an efficient method for updating the (k,s) numbers of nodes.
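The abstract above mentions indexing the support of edges. Assuming the standard meaning of support (the number of triangles that contain an edge), a minimal sketch of computing it follows. The function name `edge_support` is hypothetical, vertex labels are assumed to be mutually comparable (for example integers), and this does not reproduce the DES or DEK algorithms, whose details the abstract does not give.

```python
from collections import defaultdict


def edge_support(edges):
    """Return {(u, v): number of triangles containing edge (u, v)}, with u < v.

    Illustrative sketch; `edges` is an iterable of (u, v) pairs of an
    undirected simple graph with comparable vertex labels.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    support = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                # Every common neighbour of u and v closes one triangle.
                support[(u, v)] = len(adj[u] & adj[v])
    return support


# A triangle (1, 2, 3) with a pendant edge (3, 4).
example = edge_support([(1, 2), (2, 3), (1, 3), (3, 4)])
```

Here each triangle edge has support 1 and the pendant edge has support 0.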
Graph Data Processing and Analysis: From Algorithms to System Development
There are many real-world application domains where data can be naturally modelled as graphs, such as social networks and computer networks. The amount of data generated and published is rapidly increasing with the explosion of information. Effective storage of graph data and querying has become a significant challenge; hence the graph database is emerging to address this challenge. Graph databases have the unique advantages of modelling and querying complex relationships, capturing and navigating complex data relationships and recursive path querying when handling graph data. In this thesis, we enhance graph databases from both system and algorithm perspectives.
Firstly, we propose two systems, SQL2Cypher and FSPS, to improve the usability and efficiency of graph databases. SQL2Cypher automatically migrates data from a relational database to a graph database. This system also supports translating SQL queries into Cypher queries. FSPS is the first FPGA-based system for accelerating graph queries on massive graphs. FSPS has the following features: 1) a CPU-FPGA co-designed framework, 2) fully pipelined FPGA execution, and 3) reduced data transfer from the FPGA’s external memory. FSPS supports the two most fundamental types of graph queries, namely subgraph and path queries. Performance evaluation shows that FSPS outperforms the most popular graph database, Neo4j, by up to three orders of magnitude. Draft demo videos can be found at https://www.youtube.com/watch?v=oSpHtJ8iVio and https://www.youtube.com/watch?v=eGaeBrVTJws.
Secondly, graph databases (e.g., Neo4j and PatMat) do not widely support cohesive subgraph models. Many real-world relationships can be naturally represented as bipartite graphs, such as customer-product, user-item, and author-paper. Therefore, we investigate the bipartite hierarchy model and develop efficient algorithms to construct it. The bipartite hierarchy is the first model to discover the hierarchical structure of bipartite graphs based on the concept of (alpha, beta)-core and graph connectivity. These algorithms can effectively identify the affected regions to limit computation scope and avoid re-building the bipartite hierarchy from scratch. Extensive experiments on 10 real-world graphs demonstrate the effectiveness of the proposed bipartite hierarchy and validate the efficiency of our hierarchy construction algorithms.
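The bipartite hierarchy above builds on the (alpha, beta)-core. Assuming the usual definition (the maximal subgraph in which every upper-layer vertex has degree at least alpha and every lower-layer vertex has degree at least beta), a minimal peeling sketch follows. The function name `alpha_beta_core` and the edge-list input are assumptions for illustration; this is not the thesis's hierarchy construction algorithm.

```python
def alpha_beta_core(edges, alpha, beta):
    """Return (upper vertices, lower vertices) of the (alpha, beta)-core.

    Illustrative sketch; `edges` is an iterable of (u, v) pairs with u in
    the upper layer and v in the lower layer of a bipartite graph.
    """
    upper, lower = {}, {}
    for u, v in edges:
        upper.setdefault(u, set()).add(v)
        lower.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        # Remove upper vertices whose degree has fallen below alpha.
        for u in [u for u, nbrs in upper.items() if len(nbrs) < alpha]:
            for v in upper.pop(u):
                lower[v].discard(u)
            changed = True
        # Remove lower vertices whose degree has fallen below beta.
        for v in [v for v, nbrs in lower.items() if len(nbrs) < beta]:
            for u in lower.pop(v):
                upper[u].discard(v)
            changed = True
    return set(upper), set(lower)


# Upper layer {a, b, c}, lower layer {x, y, z}.
example_edges = [("a", "x"), ("a", "y"), ("b", "x"),
                 ("b", "y"), ("c", "y"), ("c", "z")]
core_upper, core_lower = alpha_beta_core(example_edges, 2, 2)
```

In this example, vertex z is removed first because its degree is 1, which then drops c below the threshold, leaving the (2, 2)-core on {a, b} and {x, y}.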