    Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

    Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal Fisher information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data that maximize the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding multigraphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that adding a small number of well-chosen pairwise comparisons can significantly increase the Fisher information of the ranking. As another application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games that are significantly more informative. Using spectral clustering methods to identify highly connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings simply by scheduling more out-of-conference games. Comment: 31 pages, 10 figures, 3 tables
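
    The ranking step described above can be sketched in NumPy. This is a minimal illustration, not the authors' code: it assumes each comparison is a triple (i, j, y) meaning alternative j scored y higher than alternative i, fits the potential by least squares, and reports the algebraic connectivity (second-smallest Laplacian eigenvalue) of the comparison multigraph. Under i.i.d. Gaussian noise the Fisher information of this estimator is proportional to the graph Laplacian B^T B, which is why its spectrum governs informativeness. All function names are hypothetical.

```python
import numpy as np


def incidence(n, pairs):
    """Edge-vertex incidence matrix B: row k is e_j - e_i for pair k = (i, j)."""
    B = np.zeros((len(pairs), n))
    for k, (i, j) in enumerate(pairs):
        B[k, i] -= 1.0
        B[k, j] += 1.0
    return B


def least_squares_ranking(n, comparisons):
    """Potential s minimizing the sum over (i, j, y) of (s[j] - s[i] - y)^2.

    lstsq returns the minimum-norm minimizer, which for a connected
    comparison graph is the unique solution with mean zero.
    """
    B = incidence(n, [(i, j) for i, j, _ in comparisons])
    y = np.array([yk for _, _, yk in comparisons], dtype=float)
    return np.linalg.lstsq(B, y, rcond=None)[0]


def algebraic_connectivity(n, pairs):
    """Second-smallest eigenvalue of the Laplacian L = B^T B (parallel edges allowed)."""
    B = incidence(n, pairs)
    return np.linalg.eigvalsh(B.T @ B)[1]


# Three consistent comparisons: 0 < 1 < 2 with unit gaps.
print(least_squares_ranking(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]))
# Same budget of three comparisons on four alternatives: path vs. star.
print(algebraic_connectivity(4, [(0, 1), (1, 2), (2, 3)]))  # about 0.586 (= 2 - sqrt(2))
print(algebraic_connectivity(4, [(0, 1), (0, 2), (0, 3)]))  # about 1.0
```

    With the same number of comparisons, the star yields a larger algebraic connectivity than the path, which is the kind of choice the outer data-collection problem is meant to make.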

    Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation

    The graph-theoretical task of determining the most likely inter-community edges from the intra-community connectivity of disconnected subgraphs is proposed. An algorithm is developed for this edge augmentation task, based on elevating the zero eigenvalues of the graph's spectrum. Upper bounds for the eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and validated by simulation on random graphs. The algorithm works consistently across synthetic and real networks and performs well at connecting graph components. Edge augmentation reverse-engineers the graph partitions produced by different community detection methods (Girvan-Newman method, greedy modularity maximization, label propagation, Louvain method, and fluid communities), in most cases producing inter-community edges at >50% frequency. Comment: 6 pages, 3 figures
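
    The role of the zero eigenvalues can be illustrated with a short NumPy sketch. The multiplicity of the Laplacian eigenvalue 0 equals the number of connected components, so each edge that joins two components elevates one zero eigenvalue. The greedy rule below, which picks the inter-community pair whose addition maximizes the second-smallest eigenvalue, is a hypothetical stand-in chosen for illustration and is not the paper's algorithm; `labels` is an assumed community assignment.

```python
import itertools

import numpy as np


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L


def zero_eigenvalue_count(n, edges, tol=1e-9):
    """Number of (numerically) zero Laplacian eigenvalues = number of components."""
    return int(np.sum(np.linalg.eigvalsh(laplacian(n, edges)) < tol))


def greedy_elevating_edge(n, edges, labels):
    """Hypothetical heuristic: among pairs in different communities, return the
    pair whose addition gives the largest second-smallest Laplacian eigenvalue."""
    best_pair, best_value = None, -np.inf
    for u, v in itertools.combinations(range(n), 2):
        if labels[u] == labels[v]:
            continue
        value = np.linalg.eigvalsh(laplacian(n, list(edges) + [(u, v)]))[1]
        if value > best_value:
            best_pair, best_value = (u, v), value
    return best_pair, best_value


# Two disjoint triangles: two components, hence two zero eigenvalues.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
labels = [0, 0, 0, 1, 1, 1]
print(zero_eigenvalue_count(6, triangles))  # 2
pair, value = greedy_elevating_edge(6, triangles, labels)
print(pair, zero_eigenvalue_count(6, triangles + [pair]))  # one component remains
```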

    Polynomial Kernels for Weighted Problems

    Kernelization is a formalization of efficient preprocessing for NP-hard problems using the framework of parameterized complexity. Among open problems in kernelization, it has been asked many times whether there are deterministic polynomial kernelizations for Subset Sum and Knapsack when parameterized by the number $n$ of items. We answer both questions affirmatively by using an algorithm for compressing numbers due to Frank and Tardos (Combinatorica 1987). This result was first used by Marx and Végh (ICALP 2013) in the context of kernelization. We further illustrate its applicability by giving polynomial kernels for weighted versions of several well-studied parameterized problems. Furthermore, when parameterized by the number of different item sizes, we obtain a polynomial kernelization for Subset Sum and an exponential kernelization for Knapsack. Finally, we also obtain kernelization results for polynomial integer programs.

    Attainable bounds for algebraic connectivity and maximally-connected regular graphs

    We derive attainable upper bounds on the algebraic connectivity (spectral gap) of a regular graph in terms of its diameter and girth. The diameter bound agrees with the well-known Alon-Boppana-Friedman bound for graphs of even diameter but improves on it for graphs of odd diameter. For the girth bound, we show that only Moore graphs can attain it, and these exist for only very few girths. For the diameter bound, we use a combination of stochastic algorithms and exhaustive search to find graphs that attain it. For 3-regular graphs, we find graphs attaining it for all diameters $D$ up to and including $D=9$ (the case $D=10$ is open). These graphs are extremely rare and also have high girth; for example, we found exactly 45 distinct cubic graphs on 44 vertices attaining the upper bound when $D=7$, all of girth 8 (out of a total of about $10^{20}$ cubic graphs on 44 vertices, including 266362 of girth 8). We also exhibit families of $d$-regular graphs attaining the upper bounds with $D=3$ and $D=4$, and with $g=6$. Several conjectures are proposed.
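
    The three quantities in these bounds (algebraic connectivity, diameter, girth) are straightforward to compute for a given graph. The following sketch, written for this summary rather than taken from the paper, checks them on the Petersen graph, a 3-regular Moore graph of diameter 2 and girth 5. It assumes a simple, connected, undirected graph; all function names are hypothetical. For a d-regular graph the algebraic connectivity equals d minus the second-largest adjacency eigenvalue, so the same value could be read off the adjacency spectrum.

```python
from collections import deque

import numpy as np


def petersen_edges():
    """Outer 5-cycle, five spokes, and inner pentagram (10 vertices, 15 edges)."""
    outer = [(i, (i + 1) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    return outer + spokes + inner


def adjacency_lists(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs(adj, source):
    """Breadth-first search returning hop distances and BFS-tree parents."""
    dist, parent = [-1] * len(adj), [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w], parent[w] = dist[u] + 1, u
                queue.append(w)
    return dist, parent


def diameter(adj):
    return max(max(bfs(adj, s)[0]) for s in range(len(adj)))


def girth(adj, edges):
    """Shortest cycle length: min over sources s and non-tree edges (u, w)
    of dist[u] + dist[w] + 1 (valid for simple graphs)."""
    best = float("inf")
    for s in range(len(adj)):
        dist, parent = bfs(adj, s)
        for u, w in edges:
            if parent[u] != w and parent[w] != u:
                best = min(best, dist[u] + dist[w] + 1)
    return best


def algebraic_connectivity(n, edges):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(L)[1]


edges = petersen_edges()
adj = adjacency_lists(10, edges)
# Expected: diameter 2, girth 5, algebraic connectivity 2 (up to rounding).
print(diameter(adj), girth(adj, edges), algebraic_connectivity(10, edges))
```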

    Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem

    In this paper, we study linear programming based approaches to the maximum matching problem in the semi-streaming model. The semi-streaming model has gained attention as a model for processing massive graphs as the importance of such graphs has increased. In this model, edges are streamed in an adversarial order and the algorithm may use space proportional to the number of vertices of the graph (up to polylogarithmic factors). In recent years, there have been several new results in the semi-streaming model. However, broad techniques such as linear programming have not been adapted to this model. We present several techniques to adapt and optimize linear programming based approaches in the semi-streaming model, with an application to the maximum matching problem. As a consequence, we improve (almost) all previous results on this problem and also prove new results on interesting variants.
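
    For orientation, the classic one-pass baseline in the semi-streaming model is greedy maximal matching: keep an edge whenever neither endpoint is already matched. It stores O(n) vertices and edges and returns a maximal matching, hence one with at least half as many edges as a maximum matching. The sketch below shows this baseline only; it is not the linear-programming technique of the paper, and the function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_streaming_matching(stream: Iterable[Edge]) -> List[Edge]:
    """One pass over an edge stream; memory is O(number of vertices)."""
    matched: Set[int] = set()
    matching: List[Edge] = []
    for u, v in stream:
        # Keep the edge only if it is not a self-loop and both endpoints are free.
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path 0-1-2-3: a favorable order finds the maximum matching ...
print(greedy_streaming_matching(iter([(0, 1), (1, 2), (2, 3)])))  # [(0, 1), (2, 3)]
# ... while an adversarial order loses half of it.
print(greedy_streaming_matching(iter([(1, 2), (0, 1), (2, 3)])))  # [(1, 2)]
```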

    An Empirical Analysis of Approximation Algorithms for the Unweighted Tree Augmentation Problem

    In this thesis, we perform an experimental study of approximation algorithms for the tree augmentation problem (TAP). TAP is a fundamental problem in network design. The goal of TAP is to add the minimum number of edges from a given edge set to a tree so that it becomes 2-edge-connected. Formally, given a tree T = (V, E), where V denotes the set of vertices and E the set of edges of the tree, and a set of edges (or links) L ⊆ V × V disjoint from E, the objective is to find a minimum-size set of links F ⊆ L to add to the tree such that the augmented graph (V, E ∪ F) is 2-edge-connected. Our goal is to establish a baseline performance for each approximation algorithm on actual instances rather than worst-case instances. In particular, we are interested in whether the ranking of the algorithms on practical instances is consistent with the ranking given by their worst-case guarantees. We are also interested in whether preprocessing times, implementation difficulty, and running times justify the use of an algorithm in practice. We profiled and analyzed five approximation algorithms, namely the Frederickson algorithm, the Nagamochi algorithm, the Even algorithm, the Adjiashvili algorithm, and the Grandoni algorithm. Additionally, we used an integer program and a simple randomized algorithm as benchmarks. The performance of each algorithm was measured using space, time, and solution-quality metrics. We found that the simple randomized algorithm is competitive with the approximation algorithms and that the algorithms rank according to their theoretical guarantees. The randomized algorithm is simpler to implement and understand. Furthermore, it runs faster and uses less space than any of the more sophisticated approximation algorithms.
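
    Feasibility in TAP reduces to a coverage condition: a link (u, v) covers exactly the tree edges on the tree path between u and v, and (V, E ∪ F) is 2-edge-connected exactly when every tree edge is covered by some link in F. The sketch below checks that condition and adds a simple random-order heuristic. The heuristic is a hypothetical illustration, since this abstract does not describe the thesis's randomized algorithm, so it should not be read as that algorithm. It assumes T spans all of V.

```python
import random
from collections import deque


def root_tree(n, tree_edges, root=0):
    """BFS from root; returns parent and depth arrays of the rooted tree."""
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, depth, seen = [-1] * n, [0] * n, [False] * n
    seen[root] = True
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                parent[w], depth[w] = u, depth[u] + 1
                queue.append(w)
    return parent, depth


def covered_edges(link, parent, depth):
    """Tree edges on the u-v tree path, each named by its child endpoint."""
    u, v = link
    path = set()
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u  # always step up from the deeper endpoint
        path.add(u)
        u = parent[u]
    return path


def is_feasible(n, tree_edges, links):
    """True iff every tree edge is covered, i.e. the augmented graph is 2-edge-connected."""
    parent, depth = root_tree(n, tree_edges)
    covered = set()
    for link in links:
        covered |= covered_edges(link, parent, depth)
    return len(covered) == n - 1


def random_order_tap(n, tree_edges, links, seed=0):
    """Hypothetical heuristic: scan links in random order and keep a link if it
    covers at least one uncovered tree edge. Returns None if L is insufficient."""
    parent, depth = root_tree(n, tree_edges)
    order = list(links)
    random.Random(seed).shuffle(order)
    covered, chosen = set(), []
    for link in order:
        path = covered_edges(link, parent, depth)
        if not path <= covered:
            chosen.append(link)
            covered |= path
    return chosen if len(covered) == n - 1 else None


# Star tree centred at 0 with leaves 1, 2, 3.
tree = [(0, 1), (0, 2), (0, 3)]
links = [(1, 2), (2, 3), (1, 3)]
print(is_feasible(4, tree, [(1, 2)]), is_feasible(4, tree, [(1, 2), (2, 3)]))  # False True
print(random_order_tap(4, tree, links))
```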

    Community detection and stochastic block models: recent developments

    The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and generally provides fertile ground to study the statistical and computational tradeoffs that arise in network and data sciences. This note surveys recent developments that establish the fundamental limits for community detection in the SBM, with respect to both information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a. detection). The main results discussed are the phase transition for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters, and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest to achieve these limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, and classical and nonbacktracking spectral methods. A few open problems are also discussed.
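
    A classical spectral method, one of the algorithm families the survey covers, can be sketched in a few lines: sample a two-community SBM and split the vertices by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). The parameters are illustrative assumptions chosen far above any recovery threshold, so this shows the mechanics rather than the threshold phenomena. Labels are recovered only up to a swap, which the overlap score accounts for.

```python
import numpy as np


def sample_sbm(sizes, p_in, p_out, rng):
    """Symmetric adjacency matrix of an SBM with edge probability p_in inside
    communities and p_out across them (no self-loops)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels


def fiedler_split(A):
    """Two-way spectral partition from the sign of the Fiedler vector."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return (vecs[:, 1] > 0).astype(int)


def overlap(pred, labels):
    """Fraction of correctly labelled vertices, maximized over the label swap."""
    agree = np.mean(pred == labels)
    return max(agree, 1.0 - agree)


rng = np.random.default_rng(0)
A, labels = sample_sbm([100, 100], p_in=0.5, p_out=0.05, rng=rng)
print(overlap(fiedler_split(A), labels))
```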

    Robust capacitated trees and networks with uniform demands

    We are interested in the design of robust (or resilient) capacitated rooted Steiner networks in the case of terminals with uniform demands. Formally, we are given a graph, capacity and cost functions on the edges, a root, a subset of nodes called terminals, and a bound k on the number of edge failures. We first study the problem where k = 1 and the network to be designed must be a tree covering the root and the terminals: we give complexity results and propose models to optimize both the cost of the tree and the number of terminals disconnected from the root in the worst case of an edge failure, while respecting the capacity constraints on the edges. Second, we consider the problem of computing a minimum-cost survivable network, i.e., a network that covers the root and terminals even after the removal of any k edges, while still respecting the capacity constraints on the edges. We also consider the possibility of protecting a given number of edges. We propose three different formulations: a cut-set-based formulation, a flow-based one, and a bilevel one (with an attacker and a defender). We propose algorithms to solve each formulation and compare their efficiency.
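
    For the k = 1 tree case, both quantities in the abstract can be evaluated with one bottom-up pass over a rooted tree: the terminals disconnected when tree edge (v, parent(v)) fails are exactly the terminals in the subtree of v, and with uniform demand the flow on that edge is the demand times the same count. The sketch below is an evaluation routine for a given tree, not one of the paper's optimization models; the capacity dictionary keyed by frozenset edges and the unit demand are assumptions.

```python
from collections import deque


def subtree_terminal_counts(n, tree_edges, root, terminals):
    """Parent array and, for each vertex, the number of terminals in its subtree."""
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, seen = [-1] * n, [False] * n
    seen[root] = True
    order, queue = [], deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                parent[w] = u
                queue.append(w)
    count = [1 if v in terminals else 0 for v in range(n)]
    for v in reversed(order):  # children are processed before their parents
        if parent[v] >= 0:
            count[parent[v]] += count[v]
    return parent, count


def worst_single_failure(n, tree_edges, root, terminals):
    """Max number of terminals cut off from the root by one edge failure."""
    parent, count = subtree_terminal_counts(n, tree_edges, root, terminals)
    return max((count[v] for v in range(n) if parent[v] >= 0), default=0)


def respects_capacities(n, tree_edges, root, terminals, capacity, demand=1):
    """Check demand * (terminals below the edge) <= capacity on every tree edge."""
    parent, count = subtree_terminal_counts(n, tree_edges, root, terminals)
    return all(
        demand * count[v] <= capacity[frozenset((v, parent[v]))]
        for v in range(n)
        if parent[v] >= 0
    )


tree = [(0, 1), (1, 2), (1, 3), (0, 4)]
terminals = {2, 3, 4}
caps = {frozenset(e): 2 for e in tree}
print(worst_single_failure(5, tree, 0, terminals))  # 2: losing edge (0, 1) cuts off 2 and 3
print(respects_capacities(5, tree, 0, terminals, caps))  # True
```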