2,342 research outputs found

    The Minimum Wiener Connector

    Full text link
    The Wiener index of a graph is the sum of all pairwise shortest-path distances between its vertices. In this paper we study the novel problem of finding a minimum Wiener connector: given a connected graph G=(V,E)G=(V,E) and a set Q⊆VQ\subseteq V of query vertices, find a subgraph of GG that connects all query vertices and has minimum Wiener index. We show that The Minimum Wiener Connector admits a polynomial-time (albeit impractical) exact algorithm for the special case where the number of query vertices is bounded. We show that in general the problem is NP-hard, and has no PTAS unless P=NP\mathbf{P} = \mathbf{NP}. Our main contribution is a constant-factor approximation algorithm running in time O~(∣Q∣∣E∣)\widetilde{O}(|Q||E|). A thorough experimentation on a large variety of real-world graphs confirms that our method returns smaller and denser solutions than other methods, and does so by adding to the query set QQ a small number of important vertices (i.e., vertices with high centrality).Comment: Published in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Dat

    Approximate Closest Community Search in Networks

    Get PDF
    Recently, there has been significant interest in the study of the community search problem in social and information networks: given one or more query nodes, find densely connected communities containing the query nodes. However, most existing studies do not address the "free rider" issue, that is, nodes far away from query nodes and irrelevant to them are included in the detected community. Some state-of-the-art models have attempted to address this issue, but not only are their formulated problems NP-hard, they do not admit any approximations without restrictive assumptions, which may not always hold in practice. In this paper, given an undirected graph G and a set of query nodes Q, we study community search using the k-truss based community model. We formulate our problem of finding a closest truss community (CTC), as finding a connected k-truss subgraph with the largest k that contains Q, and has the minimum diameter among such subgraphs. We prove this problem is NP-hard. Furthermore, it is NP-hard to approximate the problem within a factor (2−ε)(2-\varepsilon), for any ε>0\varepsilon >0 . However, we develop a greedy algorithmic framework, which first finds a CTC containing Q, and then iteratively removes the furthest nodes from Q, from the graph. The method achieves 2-approximation to the optimal solution. To further improve the efficiency, we make use of a compact truss index and develop efficient algorithms for k-truss identification and maintenance as nodes get eliminated. In addition, using bulk deletion optimization and local exploration strategies, we propose two more efficient algorithms. One of them trades some approximation quality for efficiency while the other is a very efficient heuristic. Extensive experiments on 6 real-world networks show the effectiveness and efficiency of our community model and search algorithms

    Probabilistic Random Walk Models for Comparative Network Analysis

    Get PDF
    Graph-based systems and data analysis methods have become critical tools in many fields as they can provide an intuitive way of representing and analyzing interactions between variables. Due to the advances in measurement techniques, a massive amount of labeled data that can be represented as nodes on a graph (or network) have been archived in databases. Additionally, novel data without label information have been gradually generated and archived. Labeling and identifying characteristics of novel data is an important first step in utilizing the valuable data in an effective and meaningful way. Comparative network analysis is an effective computational means to identify and predict the properties of the unlabeled data by comparing the similarities and differences between well-studied and less-studied networks. Comparative network analysis aims to identify the matching nodes and conserved subnetworks across multiple networks to enable a prediction of the properties of the nodes in the less-studied networks based on the properties of the matching nodes in the well-studied networks (i.e., transferring knowledge between networks). One of the fundamental and important questions in comparative network analysis is how to accurately estimate node-to-node correspondence as it can be a critical clue in analyzing the similarities and differences between networks. Node correspondence is a comprehensive similarity that integrates various types of similarity measurements in a balanced manner. However, there are several challenges in accurately estimating the node correspondence for large-scale networks. First, the scale of the networks is a critical issue. As networks generally include a large number of nodes, we have to examine an extremely large space and it can pose a computational challenge due to the combinatorial nature of the problem. Furthermore, although there are matching nodes and conserved subnetworks in different networks, structural variations such as node insertions and deletions make it difficult to integrate a topological similarity. In this dissertation, novel probabilistic random walk models are proposed to accurately estimate node-to-node correspondence between networks. First, we propose a context-sensitive random walk (CSRW) model. In the CSRW model, the random walker analyzes the context of the current position of the random walker and it can switch the random movement to either a simultaneous walk on both networks or an individual walk on one of the networks. The context-sensitive nature of the random walker enables the method to effectively integrate different types of similarities by dealing with structural variations. Second, we propose the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct an integrated network by inserting pseudo edges between potential matching nodes in different networks. Then, we design the random walk protocol to transit more frequently between potential matching nodes as their node similarity increases and they have more matching neighboring nodes. We apply the proposed random walk models to comparative network analysis problems: global network alignment and network querying. Through extensive performance evaluations, we demonstrate that the proposed random walk models can accurately estimate node correspondence and these can lead to improved and reliable network comparison results

    Probabilistic Random Walk Models for Comparative Network Analysis

    Get PDF
    Graph-based systems and data analysis methods have become critical tools in many fields as they can provide an intuitive way of representing and analyzing interactions between variables. Due to the advances in measurement techniques, a massive amount of labeled data that can be represented as nodes on a graph (or network) have been archived in databases. Additionally, novel data without label information have been gradually generated and archived. Labeling and identifying characteristics of novel data is an important first step in utilizing the valuable data in an effective and meaningful way. Comparative network analysis is an effective computational means to identify and predict the properties of the unlabeled data by comparing the similarities and differences between well-studied and less-studied networks. Comparative network analysis aims to identify the matching nodes and conserved subnetworks across multiple networks to enable a prediction of the properties of the nodes in the less-studied networks based on the properties of the matching nodes in the well-studied networks (i.e., transferring knowledge between networks). One of the fundamental and important questions in comparative network analysis is how to accurately estimate node-to-node correspondence as it can be a critical clue in analyzing the similarities and differences between networks. Node correspondence is a comprehensive similarity that integrates various types of similarity measurements in a balanced manner. However, there are several challenges in accurately estimating the node correspondence for large-scale networks. First, the scale of the networks is a critical issue. As networks generally include a large number of nodes, we have to examine an extremely large space and it can pose a computational challenge due to the combinatorial nature of the problem. Furthermore, although there are matching nodes and conserved subnetworks in different networks, structural variations such as node insertions and deletions make it difficult to integrate a topological similarity. In this dissertation, novel probabilistic random walk models are proposed to accurately estimate node-to-node correspondence between networks. First, we propose a context-sensitive random walk (CSRW) model. In the CSRW model, the random walker analyzes the context of the current position of the random walker and it can switch the random movement to either a simultaneous walk on both networks or an individual walk on one of the networks. The context-sensitive nature of the random walker enables the method to effectively integrate different types of similarities by dealing with structural variations. Second, we propose the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct an integrated network by inserting pseudo edges between potential matching nodes in different networks. Then, we design the random walk protocol to transit more frequently between potential matching nodes as their node similarity increases and they have more matching neighboring nodes. We apply the proposed random walk models to comparative network analysis problems: global network alignment and network querying. Through extensive performance evaluations, we demonstrate that the proposed random walk models can accurately estimate node correspondence and these can lead to improved and reliable network comparison results

    Mining Marked Nodes in Large Graphs

    Get PDF
    abstract: With the rise of the Big Data Era, an exponential amount of network data is being generated at an unprecedented rate across a wide-range of high impact micro and macro areas of research---from protein interaction to social networks. The critical challenge is translating this large scale network data into actionable information. A key task in the data translation is the analysis of network connectivity via marked nodes---the primary focus of our research. We have developed a framework for analyzing network connectivity via marked nodes in large scale graphs, utilizing novel algorithms in three interrelated areas: (1) analysis of a single seed node via it’s ego-centric network (AttriPart algorithm); (2) pathway identification between two seed nodes (K-Simple Shortest Paths Multithreaded and Search Reduced (KSSPR) algorithm); and (3) tree detection, defining the interaction between three or more seed nodes (Shortest Path MST algorithm). In an effort to address both fundamental and applied research issues, we have developed the LocalForcasting algorithm to explore how network connectivity analysis can be applied to local community evolution and recommender systems. The goal is to apply the LocalForecasting algorithm to various domains---e.g., friend suggestions in social networks or future collaboration in co-authorship networks. This algorithm utilizes link prediction in combination with the AttriPart algorithm to predict future connections in local graph partitions. Results show that our proposed AttriPart algorithm finds up to 1.6x denser local partitions, while running approximately 43x faster than traditional local partitioning techniques (PageRank-Nibble). In addition, our LocalForecasting algorithm demonstrates a significant improvement in the number of nodes and edges correctly predicted over baseline methods. Furthermore, results for the KSSPR algorithm demonstrate a speed-up of up to 2.5x the standard k-simple shortest paths algorithm.Dissertation/ThesisMasters Thesis Computer Science 201
    • …
    corecore