
    Robust Densest Subgraph Discovery

    Dense subgraph discovery is an important primitive in graph mining, with a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph $G=(V,E)$ with an edge-weight vector $w=(w_e)_{e\in E}$, we aim to find $S\subseteq V$ that maximizes the density $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the subgraph induced by $S$. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, it rests on a strong implicit assumption: the weights of all edges are known exactly as input. In real-world applications, we often have only uncertain information about the edge weights. In this study, we provide a framework for dense subgraph discovery under uncertainty in the edge weights. Specifically, we address this uncertainty using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle, which models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments on both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms. Comment: 10 pages; Accepted to ICDM 201
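
    The abstract does not spell out the robust algorithms, so the sketch below only illustrates the underlying (non-robust) objective $w(S)/|S|$ with a swapped-in classical technique: Charikar-style greedy peeling, which repeatedly deletes a vertex of minimum weighted degree and keeps the densest intermediate set. It is not the authors' method. The function names `density` and `greedy_peeling` are hypothetical, and the sketch assumes nonnegative weights with no self-loops or parallel edges.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def density(S: Set[Hashable], w: Dict[Edge, float]) -> float:
    """Density w(S)/|S|: total weight of edges with both ends in S, over |S|."""
    if not S:
        return 0.0
    return sum(wt for (u, v), wt in w.items() if u in S and v in S) / len(S)


def greedy_peeling(w: Dict[Edge, float]) -> Tuple[Set[Hashable], float]:
    """Classical greedy peeling (Charikar): repeatedly delete a vertex of
    minimum weighted degree and return the densest vertex set seen.
    Assumes nonnegative weights, no self-loops, no parallel edges."""
    adj: Dict[Hashable, Dict[Hashable, float]] = {}
    for (u, v), wt in w.items():
        adj.setdefault(u, {})[v] = wt
        adj.setdefault(v, {})[u] = wt
    S = set(adj)
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    total = sum(w.values())  # current value of w(S)
    best_S, best_rho = set(S), (total / len(S) if S else 0.0)
    while len(S) > 1:
        u = min(S, key=lambda x: deg[x])  # O(|S|) scan; a heap would be faster
        S.remove(u)
        total -= deg[u]  # deg[u] only counts edges to vertices still in S
        for v, wt in adj[u].items():
            if v in S:
                deg[v] -= wt
        rho = total / len(S)
        if rho > best_rho:
            best_S, best_rho = set(S), rho
    return best_S, best_rho
```

    For example, on a unit-weight $K_4$ on vertices 0 to 3 with a pendant edge (3, 4), peeling first removes vertex 4 and returns the $K_4$ with density 6/4 = 1.5, compared with 7/5 = 1.4 for the whole graph. With nonnegative weights this peeling is known to be a 1/2-approximation for the exact-weight problem; it says nothing about the robust variants above.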

    Linear optimization over homogeneous matrix cones

    A convex cone is homogeneous if its automorphism group acts transitively on the interior of the cone, i.e., for every pair of points in the interior of the cone, there exists a cone automorphism that maps one point to the other. Cones that are homogeneous and self-dual are called symmetric. The symmetric cones include the positive semidefinite matrix cone and the second-order cone as important practical examples. In this paper, we consider the less well-studied conic optimization problems over cones that are homogeneous but not necessarily self-dual. We start with cones of positive semidefinite symmetric matrices with a given sparsity pattern. Homogeneous cones in this class are characterized by nested block-arrow sparsity patterns, a subset of the chordal sparsity patterns. We describe transitive subsets of the automorphism groups of the cones and their duals, and important properties of the composition of log-det barrier functions with the automorphisms in this set. Next, we consider extensions to linear slices of the positive semidefinite cone, i.e., intersections of the positive semidefinite cone with a linear subspace, and review conditions that make the cone homogeneous. In the third part of the paper, we give a high-level overview of the classical algebraic theory of homogeneous cones due to Vinberg and Rothaus. A fundamental consequence of this theory is that every homogeneous cone admits a spectrahedral (linear matrix inequality) representation. We conclude by discussing the role of homogeneous cone structure in primal-dual symmetric interior-point methods. Comment: 59 pages, 10 figures, to appear in Acta Numerica
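
    To make the definition of homogeneity concrete, the sketch below uses only the dense positive semidefinite cone, not the sparse block-arrow cones studied in the paper. Its interior is the set of positive definite matrices, and every congruence $X \mapsto A X A^T$ with $A$ invertible is a cone automorphism. Given positive definite $X$ and $Y$ with Cholesky factors $L_X$ and $L_Y$, the choice $A = L_Y L_X^{-1}$ gives $A X A^T = L_Y L_Y^T = Y$, which is exactly the transitivity in the definition. All function names are hypothetical, and the pure-Python linear algebra is for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def cholesky(X: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = X; raises if X is not positive definite."""
    n = len(X)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = X[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def lower_inverse(L: Matrix) -> Matrix:
    """Inverse of a nonsingular lower-triangular matrix by forward substitution."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(L[i][k] * inv[k][col] for k in range(i))
            inv[i][col] = rhs / L[i][i]
    return inv


def mapping_automorphism(X: Matrix, Y: Matrix) -> Matrix:
    """A = L_Y L_X^{-1}, so that A X A^T = L_Y L_Y^T = Y."""
    return matmul(cholesky(Y), lower_inverse(cholesky(X)))


def congruence(A: Matrix, X: Matrix) -> Matrix:
    """The cone automorphism X -> A X A^T (A invertible)."""
    return matmul(matmul(A, X), transpose(A))
```

    For instance, with `X = [[4.0, 2.0], [2.0, 3.0]]` and `Y = [[1.0, 0.5], [0.5, 2.0]]`, `congruence(mapping_automorphism(X, Y), X)` reproduces `Y` up to rounding. Note that $A$ here is lower triangular; restricting to such triangular maps is one way to obtain a transitive subset of the automorphism group.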

    Correlation Clustering with Low-Rank Matrices

    Correlation clustering is a technique for aggregating data based on qualitative information about which pairs of objects are labeled 'similar' or 'dissimilar'. Because the optimization problem is NP-hard, much of the previous literature focuses on finding approximation algorithms. In this paper we explore how to optimize the correlation clustering objective exactly when the data to be clustered can be represented by a low-rank matrix. We prove in particular that correlation clustering can be solved in polynomial time when the underlying matrix is positive semidefinite with small constant rank, but that the task remains NP-hard in the presence of even one negative eigenvalue. Based on our theoretical results, we develop an algorithm for efficiently "solving" low-rank positive semidefinite correlation clustering by employing a procedure for zonotope vertex enumeration. We demonstrate the effectiveness and speed of our algorithm by using it to solve several clustering problems on both synthetic and real-world data.
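
    The sketch below shows what "solving the objective exactly" means on a tiny instance, using brute-force enumeration of all partitions rather than the paper's zonotope vertex enumeration. It assumes one common form of the objective: given a symmetric matrix $A$ (positive entries for similar pairs, negative for dissimilar), maximize the sum of $A_{ij}$ over pairs placed in the same cluster. With $\pm 1$ entries this is equivalent to minimizing disagreements, since disagreements equal the number of positive pairs minus this sum. Function names are hypothetical; the search is exponential (Bell-number many partitions).

```python
from typing import Iterator, List, Sequence, Tuple


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or `first` joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def agreement_score(A: Sequence[Sequence[float]], clustering: List[List[int]]) -> float:
    """Sum of A[i][j] over unordered pairs i < j placed in the same cluster."""
    return sum(A[block[a]][block[b]]
               for block in clustering
               for a in range(len(block))
               for b in range(a + 1, len(block)))


def exact_correlation_clustering(A: Sequence[Sequence[float]]) -> Tuple[float, List[List[int]]]:
    """Exhaustive search over all partitions; only for tiny examples."""
    best_score, best = float("-inf"), []
    for clustering in set_partitions(list(range(len(A)))):
        score = agreement_score(A, clustering)
        if score > best_score:
            best_score, best = score, clustering
    return best_score, sorted(sorted(block) for block in best)
```

    For a rank-one positive semidefinite example, take $v = (1, 1, -1, -1)$ and $A = v v^T$: the search returns score 2 with clusters `[[0, 1], [2, 3]]`, splitting the points by the sign of their coordinate in $v$.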

    On prisms, Möbius ladders and the cycle space of dense graphs

    For a graph $X$, let $f_0(X)$ denote its number of vertices, $d(X)$ its minimum degree, and $Z_1(X;\mathbb{Z}/2)$ its cycle space in the standard graph-theoretical sense (i.e., the 1-dimensional cycle group in the sense of simplicial homology theory with $\mathbb{Z}/2$-coefficients). Call a graph Hamilton-generated if and only if the set of all Hamilton circuits is a $\mathbb{Z}/2$-generating system for $Z_1(X;\mathbb{Z}/2)$. The main purpose of this paper is to prove the following: for every $s > 0$ there exists $n_0$ such that for every graph $X$ with $f_0(X) \ge n_0$ vertices, (1) if $d(X) \ge (1/2 + s) f_0(X)$ and $f_0(X)$ is odd, then $X$ is Hamilton-generated; (2) if $d(X) \ge (1/2 + s) f_0(X)$ and $f_0(X)$ is even, then the set of all Hamilton circuits of $X$ generates a codimension-one subspace of $Z_1(X;\mathbb{Z}/2)$, and the set of all circuits of $X$ having length either $f_0(X)-1$ or $f_0(X)$ generates all of $Z_1(X;\mathbb{Z}/2)$; (3) if $d(X) \ge (1/4 + s) f_0(X)$ and $X$ is square bipartite, then $X$ is Hamilton-generated. All these degree conditions are essentially best possible. The implications in (1) and (2) give an asymptotic affirmative answer to a special case of an open conjecture which, according to [European J. Combin. 4 (1983), no. 3, p. 246], originates with A. Bondy. Comment: 33 pages; 5 figures
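
    The definitions can be checked mechanically on very small graphs. In the sketch below, each Hamilton circuit is encoded as a bitmask over the edge list, i.e., as a vector in $Z_1(X;\mathbb{Z}/2)$; the circuits' rank over $\mathbb{Z}/2$ is then compared with $\dim Z_1 = |E| - |V| + (\text{number of components})$. This is a brute-force illustration (exponential in the number of vertices), not the paper's asymptotic argument, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def edge_key(u: int, v: int) -> Edge:
    return (u, v) if u < v else (v, u)


def hamilton_cycle_masks(n: int, edges: List[Edge]) -> Set[int]:
    """Each Hamilton circuit on vertices 0..n-1 as a bitmask over `edges`,
    i.e., as a cycle-space vector over Z/2 (assumes n >= 3)."""
    index = {edge_key(u, v): i for i, (u, v) in enumerate(edges)}
    masks: Set[int] = set()
    for perm in permutations(range(1, n)):  # fixing vertex 0 removes rotations
        order = (0,) + perm
        mask = 0
        for a, c in zip(order, order[1:] + (0,)):
            k = edge_key(a, c)
            if k not in index:
                break
            mask |= 1 << index[k]
        else:
            masks.add(mask)  # the set also removes reversed duplicates
    return masks


def gf2_rank(vectors: Iterable[int]) -> int:
    """Rank over Z/2 of bitmask vectors (Gaussian elimination by XOR)."""
    basis = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def cycle_space_dim(n: int, edges: List[Edge]) -> int:
    """dim Z_1(X; Z/2) = |E| - |V| + (number of connected components)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - n + components


def is_hamilton_generated(n: int, edges: List[Edge]) -> bool:
    return gf2_rank(hamilton_cycle_masks(n, edges)) == cycle_space_dim(n, edges)
```

    On $K_5$ (odd order) the 12 Hamilton circuits span the whole 6-dimensional cycle space, while on $K_4$ (even order) the 3 Hamilton circuits span only a 2-dimensional subspace of the 3-dimensional cycle space. The even case reflects a parity obstruction: the size of an edge set mod 2 is linear over $\mathbb{Z}/2$, so circuits of even length can only generate even-size elements and never an odd circuit such as a triangle. These tiny cases lie outside the asymptotic range of the theorem and only illustrate the definitions.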

    Knowledge Graph Embedding: An Overview

    Many mathematical models have been leveraged to design embeddings that represent Knowledge Graph (KG) entities and relations for link prediction and many downstream tasks. These mathematically inspired models are not only highly scalable for inference in large KGs but also offer interpretable advantages in modeling different relation patterns, which can be validated through both formal proofs and empirical results. In this paper, we provide a comprehensive overview of the current state of research in KG completion. In particular, we focus on two main branches of KG embedding (KGE) design: 1) distance-based methods and 2) semantic matching-based methods. We identify connections between recently proposed models and present an underlying trend that might help researchers invent novel and more effective models. Next, we delve into CompoundE and CompoundE3D, which draw inspiration from 2D and 3D affine operations, respectively. They encompass a broad spectrum of techniques, including distance-based and semantic-based methods. We also discuss an emerging approach for KG completion that leverages pre-trained language models (PLMs) together with textual descriptions of entities and relations, and offer insights into integrating KGE methods with PLMs for KG completion.
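
    The two branches named above can be contrasted with one textbook scoring function each. The choice of TransE (distance-based, score $-\lVert h + r - t \rVert_p$) and DistMult (semantic matching, trilinear product $\sum_i h_i r_i t_i$) as representatives is an assumption for illustration; neither is CompoundE or CompoundE3D, and the function names are hypothetical. Embeddings are plain lists here rather than learned parameters.

```python
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float],
                 p: int = 2) -> float:
    """Distance-based (TransE-style): -||h + r - t||_p; higher means more plausible."""
    dist = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1.0 / p)
    return -dist


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """Semantic matching (DistMult-style): trilinear product sum_i h_i r_i t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))
```

    A triple whose tail equals head plus relation vector gets the maximal TransE score of 0. The DistMult score is symmetric in $h$ and $t$, so it cannot distinguish a relation from its inverse; this is one example of the relation-pattern trade-offs that such overviews compare.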

    Approximating Nash Equilibria and Dense Bipartite Subgraphs via an Approximate Version of Carathéodory's Theorem

    We present algorithmic applications of an approximate version of Carathéodory's theorem. The theorem states that, given a set of vectors $X$ in $\mathbb{R}^d$, for every vector in the convex hull of $X$ there exists an $\varepsilon$-close vector (under the $p$-norm distance, for $2 \le p < \infty$) that can be expressed as a convex combination of at most $b$ vectors of $X$, where the bound $b$ depends on $\varepsilon$ and the norm $p$ and is independent of the dimension $d$. This theorem can be derived by instantiating Maurey's lemma, early references to which can be found in the work of Pisier (1981) and Carl (1985). However, in this paper we present a self-contained proof of this result. Using this theorem, we establish that in a bimatrix game with $n \times n$ payoff matrices $A, B$, if the number of nonzero entries in any column of $A+B$ is at most $s$, then an $\varepsilon$-Nash equilibrium of the game can be computed in time $n^{O(\log s/\varepsilon^2)}$. This, in particular, gives us a polynomial-time approximation scheme for Nash equilibrium in games with fixed column sparsity $s$. Moreover, for arbitrary bimatrix games (since $s$ can be at most $n$), the running time of our algorithm matches the best-known upper bound, which was obtained by Lipton, Markakis, and Mehta (2003). The approximate Carathéodory theorem also leads to an additive approximation algorithm for the densest $k$-bipartite subgraph problem. Given a graph with $n$ vertices and maximum degree $d$, the developed algorithm determines a $k \times k$ bipartite subgraph whose density is within $\varepsilon$ (in the additive sense) of the optimal density in time $n^{O(\log d/\varepsilon^2)}$.
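
    The dimension-free bound can be illustrated with the standard sampling (Maurey-style empirical) argument, shown here only for $p = 2$; this is a sketch of the classical idea, not necessarily the paper's self-contained proof. If $\mu = \sum_i \lambda_i x_i$ with all $\lVert x_i \rVert_2 \le 1$, drawing $b$ indices i.i.d. with probabilities $\lambda_i$ and averaging gives $\mathbb{E}\,\lVert \bar{x} - \mu \rVert_2^2 \le 1/b$, so $b = \lceil 1/\varepsilon^2 \rceil$ guarantees that some sample lies within $\varepsilon$ of $\mu$, regardless of $d$. The function names and the fixed `seed` parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sampled_convex_combination(X: Sequence[Sequence[float]], weights: Sequence[float],
                               b: int, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Draw b indices i.i.d. with probabilities `weights` and return the uniform
    average of the sampled vectors together with the sampled indices. The result
    is a convex combination of at most b distinct vectors of X."""
    rng = random.Random(seed)
    idx = rng.choices(range(len(X)), weights=weights, k=b)
    d = len(X[0])
    point = [sum(X[i][j] for i in idx) / b for j in range(d)]
    return point, idx


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)))
```

    For example, with the 50 standard basis vectors of $\mathbb{R}^{50}$, uniform weights and $b = 400$ (that is, $\varepsilon = 0.05$), the averaged point uses at most 400 of the vectors and typically lands at $\ell_2$ distance of about 0.05 from the uniform target. The same sampling idea, applied to mixed strategies, underlies the Lipton, Markakis, and Mehta bound mentioned above.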