    Cohesive subgraph identification in large graphs

    Graph data is ubiquitous in real-world applications, as the relationships among entities can be naturally captured by the graph model. Finding cohesive subgraphs is a fundamental problem in graph mining with diverse applications. Given the important roles of cohesive subgraphs, this thesis focuses on cohesive subgraph identification in large graphs. Firstly, we study the size-bounded community search problem, which aims to find a subgraph with the largest min-degree among all connected subgraphs that contain the query vertex q and have at least l and at most h vertices, where q, l, and h are specified by the query. As the problem is NP-hard, we propose a branch-reduce-and-bound algorithm SC-BRB built on nontrivial reducing techniques, upper-bounding techniques, and branching techniques. Secondly, we formulate the notion of a similar-biclique in bipartite graphs, a special kind of biclique in which all vertices on a designated side are similar to each other, and aim to enumerate all maximal similar-bicliques. We propose a backtracking algorithm MSBE to directly enumerate maximal similar-bicliques, and power it by vertex reduction and optimization techniques. In addition, we design a novel index structure to speed up a time-critical operation of MSBE, as well as to speed up vertex reduction; efficient index construction algorithms are developed. Thirdly, we consider balanced cliques in signed graphs (a clique is balanced if its vertex set can be partitioned into CL and CR such that all negative edges are between CL and CR) and study the problem of maximum balanced clique computation. We propose techniques to transform the maximum balanced clique problem over G into a series of maximum dichromatic clique problems over small subgraphs of G. The transformation not only removes edge signs but also sparsifies the edge set.
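
    The balance condition in the third problem reduces to a parity check: the negative edges inside the clique must form a bipartite graph. Below is a minimal Python sketch of that definitional check only, with illustrative names; it is not the thesis's SC-BRB, MSBE, or maximum balanced clique algorithm.

```python
def is_balanced_clique(vertices, sign):
    """Check the balance condition from the abstract: a clique is balanced
    if its vertex set splits into CL and CR with every negative edge going
    between the two sides. Equivalently, the negative edges inside the
    clique form a bipartite graph, which we test by 2-colouring.
    `sign[frozenset((u, v))]` is +1 or -1 for every pair in the clique.
    Definitional check only, not the paper's algorithm."""
    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in vertices:
                if v == u or sign[frozenset((u, v))] > 0:
                    continue  # only negative edges constrain the split
                if v not in side:
                    side[v] = side[u] ^ 1
                    stack.append(v)
                elif side[v] == side[u]:
                    return None  # negative edge inside one side: not balanced
    cl = {v for v in vertices if side[v] == 0}
    return cl, set(vertices) - cl

# Triangle with a single negative edge a-b: balanced, with a and b landing
# on opposite sides of the split.
sign = {frozenset(("a", "b")): -1, frozenset(("a", "c")): 1, frozenset(("b", "c")): 1}
print(is_balanced_clique({"a", "b", "c"}, sign))
```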

    A Unifying Model of Genome Evolution Under Parsimony

    We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Comment: 52 pages, 24 figures.
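
    For orientation, in the classical duplication-free setting the DCJ distance between two genomes A and B over the same N genes has a closed form (Bergeron, Mixtacki, and Stoye, 2006), computed from the adjacency graph whose components are cycles and paths; the bounds above generalize this kind of quantity to histories with duplications. A standard statement, included here for context rather than taken from this paper:

        d_DCJ(A, B) = N - (C + I/2)

    where C is the number of cycles and I the number of odd-length paths in the adjacency graph of A and B.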

    Quantum Algorithm for Maximum Biclique Problem

    Identifying a biclique with the maximum number of edges has considerable implications for numerous fields of application, such as detecting anomalies in e-commerce transactions, discerning protein-protein interactions in biology, and improving the efficacy of social network recommendation algorithms. However, the inherent NP-hardness of the problem significantly complicates the matter: the prohibitive time complexity of existing algorithms is the primary bottleneck constraining application scenarios. Aiming to address this challenge, we present an unprecedented exploration of a quantum computing approach. Efficient quantum algorithms, a crucial future direction for handling NP-hard problems, are presently under intensive investigation, and their potential has already been proven in practical arenas such as cybersecurity. However, in the field of quantum algorithms for graph databases, little work has been done due to the challenges presented by the quantum representation of complex graph topologies. In this study, we delve into the intricacies of encoding a bipartite graph on a quantum computer. Given a bipartite graph with n vertices, we propose a ground-breaking algorithm qMBS with time complexity O*(2^(n/2)), a quadratic speed-up over the state-of-the-art. Furthermore, we detail two variants tailored to the maximum vertex biclique problem and the maximum balanced biclique problem. To corroborate the practical performance and efficacy of our proposed algorithms, we conducted proof-of-principle experiments on IBM quantum simulators, whose results provide substantial validation of our approach to the extent currently possible.
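
    To make the search space concrete, here is a classical exhaustive baseline for the maximum edge biclique: enumerate subsets of one side and intersect neighbourhoods, for O*(2^n) time overall. This is an illustrative sketch with names of our own choosing, not the paper's quantum algorithm qMBS, which reports O*(2^(n/2)).

```python
from itertools import combinations

def max_edge_biclique(left, adj):
    """Exhaustive baseline: try every non-empty subset of the left side;
    the best right side for a fixed left subset is the common neighbourhood.
    `adj` maps each left vertex to the set of right vertices it touches."""
    best, best_edges = (set(), set()), 0
    for r in range(1, len(left) + 1):
        for subset in combinations(left, r):
            # Right side of the candidate biclique: common neighbours.
            common = set.intersection(*(adj[v] for v in subset))
            edges = len(subset) * len(common)
            if common and edges > best_edges:
                best, best_edges = (set(subset), common), edges
    return best, best_edges

# Small example: the optimum is the 2-by-2 biclique {a, c} x {x, y}.
adj = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"x", "y", "z"}}
print(max_edge_biclique(["a", "b", "c"], adj))
```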

    Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal

    The Odd Cycle Transversal problem (OCT) asks whether a given graph can be made bipartite by deleting at most k of its vertices. In a breakthrough result Reed, Smith, and Vetta (Operations Research Letters, 2004) gave an O(4^k kmn) time algorithm for it, the first algorithm with polynomial runtime of uniform degree for every fixed k. It is known that this implies a polynomial-time compression algorithm that turns OCT instances into equivalent instances of size at most O(4^k), a so-called kernelization. Since then the existence of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in k, has turned into one of the main open questions in the study of kernelization. This work provides the first (randomized) polynomial kernelization for OCT. We introduce a novel kernelization approach based on matroid theory, where we encode all relevant information about a problem instance into a matroid with a representation of size polynomial in k. For OCT, the matroid is built to allow us to simulate the computation of the iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only one round) to an approximate odd cycle transversal which it is aiming to shrink to size k. The process is randomized with one-sided error exponentially small in k, where the result can contain false positives but no false negatives, and the size guarantee is cubic in the size of the approximate solution. Combined with an O(sqrt(log n))-approximation (Agarwal et al., STOC 2005), we get a reduction of the instance to size O(k^4.5), implying a randomized polynomial kernelization. Comment: Minor changes to agree with SODA 2012 version of the paper.
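
    As a definitional anchor: a vertex set X is an odd cycle transversal exactly when G - X is bipartite, which can be verified in linear time by 2-colouring. A minimal Python sketch of that verification (illustrative only, not the matroid-based kernelization itself):

```python
from collections import deque

def is_odd_cycle_transversal(graph, x):
    """Return True iff deleting vertex set `x` leaves `graph` bipartite,
    i.e. `x` is an odd cycle transversal. `graph` is an adjacency dict."""
    colour = {}
    for start in graph:
        if start in x or start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v in x:
                    continue  # deleted vertices are ignored
                if v not in colour:
                    colour[v] = colour[u] ^ 1
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False  # an odd cycle survives the deletion
    return True

# A triangle needs one vertex removed to become bipartite.
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(is_odd_cycle_transversal(triangle, set()))  # False
print(is_odd_cycle_transversal(triangle, {3}))    # True
```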

    Casting Light on the Hidden Bilevel Combinatorial Structure of the Capacitated Vertex Separator Problem

    Given an undirected graph, we study the capacitated vertex separator problem that asks to find a subset of vertices of minimum cardinality, the removal of which induces a graph having a bounded number of pairwise disconnected shores (subsets of vertices) of limited cardinality. The problem is of great importance in the analysis and protection of communication or social networks against possible viral attacks and for matrix decomposition algorithms. In this article, we provide a new bilevel interpretation of the problem and model it as a two-player Stackelberg game in which the leader interdicts the vertices (i.e., decides on the subset of vertices to remove), and the follower solves a combinatorial optimization problem on the resulting graph. This approach allows us to develop a computational framework based on an integer programming formulation in the natural space of the variables. Thanks to this bilevel interpretation, we derive three different families of strengthening inequalities and show that they can be separated in polynomial time. We also show how to extend these results to a min-max version of the problem. Our extensive computational study conducted on available benchmark instances from the literature reveals that our new exact method is competitive against the state-of-the-art algorithms for the capacitated vertex separator problem and is able to improve the best-known results for several difficult classes of instances. The ideas exploited in our framework can also be extended to other vertex/edge deletion/insertion problems or graph partitioning problems by modeling them as two-player Stackelberg games and solving them through bilevel optimization.
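
    The feasibility condition in this bilevel view is simple to state: after the leader's removal, every surviving connected component (shore) must respect the cardinality limits. A small Python sketch of such a check, with assumed parameter names; the article's exact model may differ.

```python
def is_feasible_separator(graph, removed, max_shores, max_shore_size):
    """Verify that removing `removed` from `graph` (an adjacency dict)
    leaves at most `max_shores` connected components, each containing at
    most `max_shore_size` vertices. Parameter names are illustrative."""
    seen, shores = set(), 0
    for start in graph:
        if start in removed or start in seen:
            continue
        shores += 1
        stack, size = [start], 0  # DFS collects one shore
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if size > max_shore_size or shores > max_shores:
            return False
    return True

# Path a-b-c-d-e: removing c leaves two shores of size 2 each.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(is_feasible_separator(path, {"c"}, max_shores=2, max_shore_size=2))  # True
```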

    Enabling Scalability: Graph Hierarchies and Fault Tolerance

    In this dissertation, we explore approaches to two techniques for building scalable algorithms. First, we look at different graph problems and show how to exploit the input graph's inherent hierarchy to obtain scalable graph algorithms. The second technique takes a step back from concrete algorithmic problems: we consider the case of node failures in large distributed systems and present techniques to quickly recover from these.

    In the first part of the dissertation, we investigate how hierarchies in graphs can be used to scale algorithms to large inputs. We develop algorithms for three graph problems based on two approaches to building hierarchies. The first approach reduces instance sizes for NP-hard problems by applying so-called reduction rules. These rules can be applied in polynomial time. They either find parts of the input that can be solved in polynomial time, or they identify structures that can be contracted (reduced) into smaller structures without loss of information for the specific problem. After solving the reduced instance using an exponential-time algorithm, these previously contracted structures can be uncontracted to obtain an exact solution for the original input. Beyond simple preprocessing, reduction rules can also be used in branch-and-reduce algorithms, where they are successively applied after each branching step to build a hierarchy of problem kernels of increasing computational hardness. We develop reduction-based algorithms for the classical NP-hard problems Maximum Independent Set and Maximum Cut; a sketch of the reduction-rule idea follows this abstract. The second approach is used for route planning in road networks, where we build a hierarchy of road segments based on their importance for long-distance shortest paths. By only considering important road segments when we are far away from the source and destination, we can substantially speed up shortest-path queries.

    In the second part of this dissertation, we take a step back from concrete graph problems and look at more general problems in high-performance computing (HPC). Due to the ever-increasing size and complexity of HPC clusters, we expect hardware and software failures to become more common in massively parallel computations. We present two techniques that allow applications to recover from failures and resume computation. Both techniques are based on in-memory storage of redundant information and a data distribution that enables fast recovery. The first technique can be used for general-purpose distributed processing frameworks: we identify data that is redundantly available on multiple machines and only introduce additional work for the remaining data that is available on one machine only. The second technique is a checkpointing library engineered for fast recovery, using a data distribution method that achieves balanced communication loads. Both techniques work in settings where computation after a failure is continued with fewer machines than before. This is in contrast to many previous approaches that, in particular for checkpointing, focus on systems that keep spare resources available to replace failed machines.

    Overall, we present different techniques that enable scalable algorithms. While some of these techniques are specific to graph problems, we also present tools for fault-tolerant algorithms and applications in a distributed setting. To show that these can be helpful in many different domains, we evaluate them for graph problems and other applications such as phylogenetic tree inference.
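
    As referenced above, a minimal example of a reduction rule is the classical pendant-vertex (degree-one) rule for Maximum Independent Set: a degree-one vertex can always be placed in some maximum independent set, so it and its neighbour can be deleted without loss of optimality. A Python sketch of just this rule, not the dissertation's full rule set:

```python
def reduce_pendant_vertices(graph):
    """Repeatedly apply the degree-one reduction rule for Maximum
    Independent Set: take each pendant vertex into the solution and
    delete it together with its unique neighbour. Returns the partial
    solution plus the reduced instance, which would then be handed to an
    exact (e.g. branch-and-reduce) solver."""
    graph = {u: set(vs) for u, vs in graph.items()}  # defensive copy
    taken = set()
    changed = True
    while changed:
        changed = False
        for u in list(graph):
            if u in graph and len(graph[u]) == 1:
                (v,) = graph[u]      # the pendant vertex's only neighbour
                taken.add(u)
                for w in (u, v):     # delete u and v from the instance
                    for x in graph.pop(w, set()):
                        graph.get(x, set()).discard(w)
                changed = True
    return taken, graph

# Path 1-2-3-4: the rule takes vertex 1 and deletes vertex 2; vertex 3
# then becomes pendant and is taken too, yielding the optimum {1, 3}.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(reduce_pendant_vertices(path))
```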