11 research outputs found

    Detecting and Characterizing Small Dense Bipartite-like Subgraphs by the Bipartiteness Ratio Measure

    We study the problem of finding and characterizing subgraphs with small bipartiteness ratio. We give a bicriteria approximation algorithm SwpDB such that if there exists a subset $S$ of volume at most $k$ and bipartiteness ratio $\theta$, then for any $0<\epsilon<1/2$, it finds a set $S'$ of volume at most $2k^{1+\epsilon}$ and bipartiteness ratio at most $4\sqrt{\theta/\epsilon}$. By combining a truncation operation, we give a local algorithm LocDB, which has asymptotically the same approximation guarantee as the algorithm SwpDB on both the volume and bipartiteness ratio of the output set, and runs in time $O(\epsilon^2\theta^{-2}k^{1+\epsilon}\ln^3 k)$, independent of the size of the graph. Finally, we give a spectral characterization of the small dense bipartite-like subgraphs by using the $k$th largest eigenvalue of the Laplacian of the graph. Comment: 17 pages; ISAAC 201
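
The abstract above does not define the bipartiteness ratio. A minimal sketch, assuming the definition commonly attributed to Trevisan, $\beta(L,R) = (2e(L) + 2e(R) + e(S, V \setminus S)) / \mathrm{vol}(S)$ with $S = L \cup R$, could look as follows. The function name and the adjacency-dictionary representation are illustrative choices, not taken from the paper, and this is neither SwpDB nor LocDB.

```python
def bipartiteness_ratio(adj, left, right):
    """Bipartiteness ratio of the disjoint pair (left, right).

    adj maps each vertex to the set of its neighbours (undirected, simple
    graph). Assumed definition (not stated in the abstract):
        (2*e(L) + 2*e(R) + e(S, V \\ S)) / vol(S),  with S = L | R.
    """
    left, right = set(left), set(right)
    if left & right:
        raise ValueError("left and right must be disjoint")
    s = left | right
    volume = sum(len(adj[v]) for v in s)
    if volume == 0:
        raise ValueError("S must have positive volume")
    # Every edge inside L (or inside R) is seen once from each endpoint,
    # so these sums already equal 2*e(L) and 2*e(R).
    twice_inside = sum(1 for v in left for u in adj[v] if u in left)
    twice_inside += sum(1 for v in right for u in adj[v] if u in right)
    # Edges with exactly one endpoint in S.
    leaving = sum(1 for v in s for u in adj[v] if u not in s)
    return (twice_inside + leaving) / volume


# A 4-cycle 0-1-2-3-0 is bipartite with sides {0, 2} and {1, 3}.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
# A path 0-1-2-3, looking only at the pair ({0}, {1}).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

Under this assumed definition, a ratio of 0 means that S induces a bipartite graph with sides L and R and has no edges leaving it.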

    A Divide-and-Conquer Algorithm for Betweenness Centrality

    The problem of efficiently computing the betweenness centrality of nodes has been researched extensively. To date, the best known exact and centralized algorithm for this task is an algorithm proposed in 2001 by Brandes. The contribution of our paper is Brandes++, an algorithm for exact and efficient computation of betweenness centrality. The crux of our algorithm is that we create a sketch of the graph, which we call the skeleton, by replacing subgraphs with simpler graph structures. Depending on the underlying graph structure, by using this skeleton and keeping appropriate summaries, Brandes++ can achieve significantly lower running times. Extensive experimental evaluation on real-life datasets demonstrates the efficacy of our algorithm for different types of graphs. We release our code for the benefit of the research community. Comment: Shorter version of this paper appeared in SIAM Data Mining 201
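
For orientation, the 2001 baseline that the abstract refers to can be sketched for unweighted, undirected graphs as below. This is a textbook-style rendering of Brandes' algorithm, not Brandes++ and not the authors' released code; the function name and the adjacency-dictionary input are illustrative assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness for an unweighted, undirected graph.

    adj maps each vertex to an iterable of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and
        # recording shortest-path predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is counted from both ends.
    return {v: c / 2 for v, c in bc.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

On the three-vertex path only the middle vertex lies on a shortest path between two other vertices, and on the star only the centre does.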

    On the Probe Complexity of Local Computation Algorithms

    In the Local Computation Algorithms (LCA) model, the algorithm is asked to compute a part of the output by reading as little as possible from the input. For example, an LCA for coloring a graph is given a vertex name (as a "query"), and it should output the color assigned to that vertex after inquiring about some part of the graph topology using "probes"; all outputs must be consistent with the same coloring. LCAs are useful when the input is huge, and the output as a whole is not needed simultaneously. Most previous work on LCAs was limited to bounded-degree graphs, which seems inevitable because probes are of the form "what vertex is at the other end of edge $i$ of vertex $v$?". In this work we study LCAs for unbounded-degree graphs. In particular, such LCAs are expected to probe the graph a number of times that is significantly smaller than the maximum, average, or even minimum degree. We show that there are problems that have very efficient LCAs on any graph. Specifically, we show that there is an LCA for the weak coloring problem (where a coloring is legal if every vertex has a neighbor with a different color) that uses $\log^* n+O(1)$ probes to reply to any query. As another way of dealing with large degrees, we propose a more powerful type of probe which we call a strong probe: given a vertex name, it returns a list of its neighbors. Lower bounds for strong probes are stronger than ones in the edge probe model (which we call weak probes). Our main result in this model is that roughly $\Omega(\sqrt{n})$ strong probes are required to compute a maximal matching. Our findings include interesting separations between closely related problems. For weak probes, we show that while weak 3-coloring can be done with probe complexity $\log^* n+O(1)$, weak 2-coloring has probe complexity $\Omega(\log n/\log\log n)$. 
For strong probes, our negative result for maximal matching is complemented by an LCA for $(1-\epsilon)$-approximate maximum matching on regular graphs that uses $O(1)$ strong probes, for any constant $\epsilon>0$.
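
The weak coloring condition quoted in the abstract (every vertex has a neighbor with a different color) can be made concrete with a small global checker. This is only an illustrative sketch: it reads the whole graph, so it is not an LCA, and the names `is_weak_coloring`, `adj` and `color` are assumptions introduced here.

```python
def is_weak_coloring(adj, color):
    """True if every vertex has at least one neighbour of another colour.

    adj maps each vertex to an iterable of neighbours; color maps each
    vertex to its colour. An isolated vertex can never satisfy the
    condition, so any graph containing one is rejected.
    """
    return all(
        any(color[u] != color[v] for u in adj[v])
        for v in adj
    )


# Path 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
legal_but_not_proper = {0: "red", 1: "blue", 2: "blue", 3: "red"}
illegal = {0: "red", 1: "red", 2: "blue", 3: "blue"}
```

The first coloring is weakly legal although vertices 1 and 2 share a color, which shows that weak coloring is a relaxation of proper coloring. The second fails because the only neighbor of vertex 0 has the same color.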

    Extracting large quasi-bicliques using a skeleton-based heuristic

    One important computational problem is that of mining quasi-bicliques from bipartite graphs. It is important because it has an almost endless number of applications and, in most real-world problems, is more appropriate than the mining of bicliques. In my thesis I examine the following: the motivation for quasi-bicliques, the existing literature on quasi-bicliques, my implementation of a web application that allows the user to compute exact quasi-biclique solutions using the biclique formulation and the exact solution algorithm provided by Chang et al. [1], and finally a polynomial heuristic algorithm for finding large quasi-bicliques in the special case where we have all the biclique subgraphs of a bipartite graph available.

    Centrality measures and analyzing dot-product graphs

    In this thesis we investigate two topics in data mining on graphs: in the first part we investigate the notion of centrality in graphs, and in the second part we look at reconstructing graphs from aggregate information. In many graph-related problems the goal is to rank nodes based on an importance score. This score is in general referred to as node centrality. In Part I, we start by giving a novel and more efficient algorithm for computing betweenness centrality. In many applications not an individual node but rather a set of nodes is chosen to perform some task. We generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem: find a group of k nodes with largest centrality. We give an algorithm for solving this optimization problem for a general notion of centrality that subsumes various instantiations of centrality that find paths in the graph. We prove that this problem is NP-hard for specific centrality definitions and we provide a universal algorithm for this problem that can be modified to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications: one for minimizing redundancy in information propagation networks and one for optimizing the expected number of interceptions of a group in a random navigational network. In the second part of the thesis we investigate what we can infer about a bipartite graph if only some aggregate information -- the number of common neighbors among each pair of nodes -- is given. First, we observe that the given data is equivalent to the dot-product of the adjacency vectors of each pair of nodes. 
Based on this knowledge we develop an SVD-based algorithm that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in which the dot-products of nodes with themselves, i.e. the node degrees, are either known or hidden.
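
The observation that common-neighbor counts are dot-products of adjacency vectors can be illustrated on a toy bipartite graph. The sketch below only computes that aggregate information and does not attempt the SVD-based reconstruction described in the thesis; the function name and the 0/1 row representation are assumptions made for illustration.

```python
def common_neighbor_counts(rows):
    """Pairwise dot-products of 0/1 adjacency vectors.

    rows[i][c] is 1 if left node i is adjacent to right node c, else 0.
    Entry [i][j] of the result is the number of common neighbours of
    left nodes i and j; the diagonal entry [i][i] is the degree of i.
    """
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]


# Three left nodes, three right nodes.
rows = [
    [1, 1, 0],  # left node 0 is adjacent to right nodes 0 and 1
    [0, 1, 1],  # left node 1 is adjacent to right nodes 1 and 2
    [1, 1, 1],  # left node 2 is adjacent to all right nodes
]
counts = common_neighbor_counts(rows)
```

Here `counts[0][1]` is 1 because left nodes 0 and 1 share only right node 1, and the diagonal `[2, 2, 3]` lists the degrees, which is the quantity that is either known or hidden in the two problem versions.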

    Graph Processing on GPU

    Ph.D, DOCTOR OF PHILOSOPHY

    On learning the structure of clusters in graphs

    Graph clustering is a fundamental problem in unsupervised learning, with numerous applications in computer science and in analysing real-world data. In many real-world applications, we find that the clusters have a significant high-level structure. This is often overlooked in the design and analysis of graph clustering algorithms which make strong simplifying assumptions about the structure of the graph. This thesis addresses the natural question of whether the structure of clusters can be learned efficiently and describes four new algorithmic results for learning such structure in graphs and hypergraphs. The first part of the thesis studies the classical spectral clustering algorithm, and presents a tighter analysis on its performance. This result explains why it works under a much weaker and more natural condition than the ones studied in the literature, and helps to close the gap between the theoretical guarantees of the spectral clustering algorithm and its excellent empirical performance. The second part of the thesis builds on the theoretical guarantees of the previous part and shows that, when the clusters of the underlying graph have certain structures, spectral clustering with fewer than k eigenvectors is able to produce better output than classical spectral clustering in which k eigenvectors are employed, where k is the number of clusters. This presents the first work that discusses and analyses the performance of spectral clustering with fewer than k eigenvectors, and shows that general structures of clusters can be learned with spectral methods. The third part of the thesis considers efficient learning of the structure of clusters with local algorithms, whose runtime depends only on the size of the target clusters and is independent of the underlying input graph. 
While the objective of classical local clustering algorithms is to find a cluster which is sparsely connected to the rest of the graph, this part of the thesis presents a local algorithm that finds a pair of clusters which are densely connected to each other. This result demonstrates that certain structures of clusters can be learned efficiently in the local setting, even in the massive graphs which are ubiquitous in real-world applications. The final part of the thesis studies the problem of learning densely connected clusters in hypergraphs. The developed algorithm is based on a new heat diffusion process, whose analysis extends a sequence of recent work on the spectral theory of hypergraphs. It allows the structure of clusters to be learned in datasets modelling higher-order relations of objects and can be applied to efficiently analyse many complex datasets occurring in practice. All of the presented theoretical results are further extensively evaluated on both synthetic and real-world datasets of different domains, including image classification and segmentation, migration networks, co-authorship networks, and natural language processing. These experimental results demonstrate that the newly developed algorithms are practical, effective, and immediately applicable for learning the structure of clusters in real-world data.

    Incremental and parallel algorithms for dense subgraph mining

    The task of maintaining densely connected subgraphs from a continuously evolving graph is important because it solves many practical problems that require constant monitoring over the continuous stream of linked data often represented as a graph. For example, continuous maintenance of a certain group of closely connected nodes can reveal unusual activity over a transaction network, or the identification and evolution of active groups in a social network. On the other hand, mining these structures from graph data is often expensive because of the complexity of the computation and the volume of the structures (the number of densely connected structures can be exponential in the number of vertices in the graph). One way to deal with the expensive computations is to consider parallel computation. In this thesis, we advance the state of the art by developing provably efficient algorithms for mining maximal cliques and maximal bicliques, two fundamental dense structures. First, we consider the design of efficient algorithms for the maintenance of maximal cliques and maximal bicliques in an evolving network. We observe that it is important to locate the region of the graph affected by an update so that we can maintain the structures by computing the changes exactly where they occur. Following this observation, we design efficient techniques that find appropriate subgraphs for identifying the changes in the structures. We prove that our algorithms can maintain dense structures efficiently. More specifically, we show that our algorithms can quickly compute the changes when they are small, irrespective of the size of the graph. We empirically evaluate our algorithms and show that they significantly outperform the state-of-the-art algorithms. 
Next, we consider parallel computation for efficient utilization of the multiple cores in a multi-core computing system, so that the expensive mining tasks can be sped up relative to their efficient sequential counterparts. We design shared-memory parallel algorithms for the mining of maximal cliques and maximal bicliques, and we prove the efficiency of the parallel algorithms by showing that the total work performed by the parallel algorithm is equivalent to the time complexity of the best sequential algorithm for the same task. Our experimental study shows that we achieve good speedup over the prior state-of-the-art parallel algorithms and significant speedup over the state-of-the-art sequential algorithms. We also show that our parallel algorithms scale almost linearly with the number of processor cores.

    A local algorithm for finding dense subgraphs
