Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs
Given a graph where vertices represent alternatives and arcs represent
pairwise comparison data, the statistical ranking problem is to find a
potential function, defined on the vertices, such that the gradient of the
potential function agrees with the pairwise comparisons. Our goal in this paper
is to develop a method for collecting data for which the least squares
estimator for the ranking problem has maximal Fisher information. Our approach,
based on experimental design, is to view data collection as a bi-level
optimization problem where the inner problem is the ranking problem and the
outer problem is to identify data which maximizes the informativeness of the
ranking. Under certain assumptions, the data collection problem decouples,
reducing to a problem of finding multigraphs with large algebraic connectivity.
This reduction of the data collection problem to graph-theoretic questions is
one of the primary contributions of this work. As an application, we study the
Yahoo! Movie user rating dataset and demonstrate that the addition of a small
number of well-chosen pairwise comparisons can significantly increase the
Fisher informativeness of the ranking. As another application, we study the
2011-12 NCAA football schedule and propose schedules with the same number of
games which are significantly more informative. Using spectral clustering
methods to identify highly-connected communities within the division, we argue
that the NCAA could improve its notoriously poor rankings by simply scheduling
more out-of-conference games.Comment: 31 pages, 10 figures, 3 table
Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation
The graph-theoretical task of determining the most likely inter-community
edges based on disconnected subgraphs' intra-community connectivity is
proposed. An algorithm is developed for this edge augmentation task, based on
elevating the zero eigenvalues of the graph's spectrum. Upper bounds for the
eigenvalue elevation amplitude and for the corresponding augmented edge
density are derived and validated by simulation on random graphs. The algorithm works
consistently across synthetic and real networks, yielding desirable performance
at connecting graph components. Edge augmentation reverse-engineers graph
partition under different community detection methods (Girvan-Newman method,
greedy modularity maximization, label propagation, Louvain method, and fluid
community), in most cases producing inter-community edges at >50% frequency.
Comment: 6 pages, 3 figures
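The spectral fact behind "eigenvalue elevation" is that the multiplicity of the Laplacian's zero eigenvalue equals the number of connected components, so each augmenting inter-community edge can lift one zero. A minimal sketch of that mechanism (the example graph is made up; this is not the paper's algorithm):

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A for an undirected simple graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

def num_zero_eigs(L, tol=1e-9):
    """Multiplicity of the zero eigenvalue = number of connected components."""
    return int(np.sum(np.linalg.eigvalsh(L) < tol))

# Two disconnected triangles on vertices {0,1,2} and {3,4,5}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
before = num_zero_eigs(laplacian(6, edges))        # two components, two zeros

# Augment with one inter-community edge: one zero eigenvalue is elevated.
after = num_zero_eigs(laplacian(6, edges + [(2, 3)]))
```

The upper bounds in the abstract constrain how far such an elevation can go and how many augmented edges it implies.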
Polynomial Kernels for Weighted Problems
Kernelization is a formalization of efficient preprocessing for NP-hard
problems using the framework of parameterized complexity. Among open problems
in kernelization it has been asked many times whether there are deterministic
polynomial kernelizations for Subset Sum and Knapsack when parameterized by the
number of items.
We answer both questions affirmatively by using an algorithm for compressing
numbers due to Frank and Tardos (Combinatorica 1987). This result was first
used by Marx and Végh (ICALP 2013) in the context of kernelization. We
further illustrate its applicability by giving polynomial kernels also for
weighted versions of several well-studied parameterized problems. Furthermore,
when parameterized by the different item sizes we obtain a polynomial
kernelization for Subset Sum and an exponential kernelization for Knapsack.
Finally, we also obtain kernelization results for polynomial integer programs.
Attainable bounds for algebraic connectivity and maximally-connected regular graphs
We derive attainable upper bounds on the algebraic connectivity (spectral
gap) of a regular graph in terms of its diameter and girth. This bound agrees
with the well-known Alon-Boppana-Friedman bound for graphs of even diameter,
but is an improvement for graphs of odd diameter. For the girth bound, we show
that only Moore graphs can attain it, and these only exist for very few
possible girths. For the diameter bound, we use a combination of stochastic
algorithms and exhaustive search to find graphs which attain it. For 3-regular
graphs, we find attainable graphs for all diameters up to and including
(the case of is open). These graphs are extremely rare and also
have high girth; for example, we found exactly 45 distinct cubic graphs on 44
vertices attaining the upper bound when ; all have girth 8 (out of a total
of about cubic graphs on 44 vertices, including 266362 having girth
8). We also exhibit families of -regular graphs attaining upper bounds with
and , and with . Several conjectures are proposed.
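Computing the algebraic connectivity (spectral gap) that these bounds constrain is straightforward: it is the second-smallest Laplacian eigenvalue. A sketch on the Petersen graph, a standard 3-regular example with diameter 2 and girth 5 whose algebraic connectivity is known to be 2 (this is an illustration of the quantity, not of the paper's bound-attaining graphs):

```python
import numpy as np

# Petersen graph: outer 5-cycle, inner pentagram, plus spokes (3-regular).
edges = []
for i in range(5):
    edges.append((i, (i + 1) % 5))          # outer cycle
    edges.append((5 + i, 5 + (i + 2) % 5))  # inner pentagram
    edges.append((i, 5 + i))                # spokes

A = np.zeros((10, 10))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A              # Laplacian of the 3-regular graph

lam = np.sort(np.linalg.eigvalsh(L))
algebraic_connectivity = lam[1]             # Fiedler value (spectral gap) = 2
```

The stochastic and exhaustive searches mentioned above would evaluate exactly this quantity for each candidate regular graph and compare it against the derived diameter and girth bounds.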
Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem
In this paper, we study linear programming based approaches to the maximum
matching problem in the semi-streaming model. The semi-streaming model has
gained attention as a model for processing massive graphs as the importance of
such graphs has increased. In this model, edges arrive in a stream in an
adversarial order and we are allowed space proportional to the number of
vertices in the graph.
In recent years, there have been several new results in this semi-streaming
model. However, broad techniques such as linear programming have not been
adapted to this model. We present several techniques to adapt and optimize
linear programming based approaches in the semi-streaming model with an
application to the maximum matching problem. As a consequence, we improve
(almost) all previous results on this problem, and also prove new results on
interesting variants.
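For context, the classical semi-streaming baseline that LP-based approaches compete with is one-pass greedy maximal matching, which stores only O(n) state and is a 1/2-approximation. A minimal sketch (this is the standard baseline, not the paper's LP-based algorithms):

```python
def greedy_matching(edge_stream):
    """One-pass greedy maximal matching in the semi-streaming model:
    state is just the set of matched vertices (O(n) space), and the
    result is a 1/2-approximation to the maximum matching."""
    matched = set()
    matching = []
    for u, v in edge_stream:    # edges arrive one at a time, adversarial order
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Path 0-1-2-3: greedy takes (0,1), must skip (1,2), then takes (2,3).
m = greedy_matching([(0, 1), (1, 2), (2, 3)])
```

Any LP-based semi-streaming algorithm must similarly keep its per-pass state within O(n polylog n) space while edges stream past.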
An Empirical Analysis of Approximation Algorithms for the Unweighted Tree Augmentation Problem
In this thesis, we perform an experimental study of approximation algorithms for the tree augmentation problem (TAP), a fundamental problem in network design. The goal of TAP is to add the minimum number of edges from a given edge set to a tree so that it becomes 2-edge-connected. Formally, given a tree T = (V, E), where V denotes the set of vertices and E denotes the set of edges in the tree, and a set of edges (or links) L ⊆ V × V disjoint from E, the objective is to find a set of edges F ⊆ L to add to the tree such that the augmented tree (V, E ∪ F) is 2-edge-connected.
Our goal is to establish a baseline performance for each approximation algorithm on actual instances rather than worst-case instances. In particular, we are interested in whether the algorithms' ranking on practical instances is consistent with the ranking of their worst-case guarantees. We are also interested in whether preprocessing times, implementation difficulty, and running times justify the use of an algorithm in practice.
We profiled and analyzed five approximation algorithms, viz., the Frederickson algorithm, the Nagamochi algorithm, the Even algorithm, the Adjiashvili algorithm, and the Grandoni algorithm. Additionally, we used an integer program and a simple randomized algorithm as benchmarks. The performance of each algorithm was measured using space, time, and solution-quality metrics. We found that the simple randomized algorithm is competitive with the approximation algorithms and that the algorithms rank according to their theoretical guarantees. The randomized algorithm is simpler to implement and understand; furthermore, it runs faster and uses less space than any of the more sophisticated approximation algorithms.
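The TAP feasibility condition can be made concrete with a brute-force checker and a toy randomized baseline. The thesis does not specify its randomized benchmark, so the `random_tap_baseline` below is a hypothetical stand-in (random link subsets, keep the smallest feasible one), and the tree and link sets are made up:

```python
import random

def connected(n, edges):
    """BFS/DFS connectivity check on vertices 0..n-1."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b); adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == n

def two_edge_connected(n, edges):
    """Naive check: graph stays connected after removing any single edge."""
    return all(connected(n, [e for k, e in enumerate(edges) if k != i])
               for i in range(len(edges)))

def random_tap_baseline(n, tree, links, rng, tries=200):
    """Hypothetical randomized TAP baseline (an assumption, not the thesis'
    benchmark): sample random link subsets, keep the smallest F ⊆ L whose
    addition makes the tree 2-edge-connected."""
    best = None
    for _ in range(tries):
        subset = [l for l in links if rng.random() < 0.5]
        if two_edge_connected(n, tree + subset):
            if best is None or len(subset) < len(best):
                best = subset
    return best

tree = [(0, 1), (1, 2), (2, 3)]        # path tree on 4 vertices
links = [(0, 2), (1, 3), (0, 3)]       # candidate links L, disjoint from E
best = random_tap_baseline(4, tree, links, random.Random(0))
```

On this instance the single link (0, 3) already closes the path into a cycle, so an optimal augmentation has size 1; the approximation algorithms in the thesis aim to come provably close to that optimum on every instance.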
Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial, and weak recovery (a.k.a. detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to achieve
these limits, in particular, two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
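The classical spectral method mentioned above can be sketched on a two-block SBM: sample the planted model, then read the communities off the sign pattern of the eigenvector for the second-largest adjacency eigenvalue. The parameters below are made up and chosen well inside the exact-recovery regime; this is an illustration of the model, not of the surveyed threshold-achieving algorithms:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_in, p_out = 200, 0.30, 0.02        # two planted blocks of 100 vertices
labels = np.array([0] * (n // 2) + [1] * (n // 2))

# Sample a symmetric SBM adjacency matrix: edge prob p_in within a block,
# p_out across blocks.
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T

# Spectral method: top eigenvector tracks degrees; the second one tracks
# the planted bipartition, so its signs recover the communities.
vals, vecs = np.linalg.eigh(A)
guess = (vecs[:, -2] > 0).astype(int)

# Agreement up to the unavoidable global label swap.
acc = max(np.mean(guess == labels), np.mean(guess != labels))
```

Near the Kesten-Stigum threshold this vanilla spectral method degrades, which is precisely why the survey discusses nonbacktracking variants and belief propagation.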
Robust capacitated trees and networks with uniform demands
We are interested in the design of robust (or resilient) capacitated rooted
Steiner networks in the case of terminals with uniform demands. Formally, we are
given a graph, capacity and cost functions on the edges, a root, a subset of
nodes called terminals, and a bound k on the number of edge failures. We first
study the problem where k = 1 and the network that we want to design must be a
tree covering the root and the terminals: we give complexity results and
propose models to optimize both the cost of the tree and the number of
terminals disconnected from the root in the worst case of an edge failure,
while respecting the capacity constraints on the edges. Second, we consider the
problem of computing a minimum-cost survivable network, i.e., a network that
covers the root and terminals even after the removal of any k edges, while
still respecting the capacity constraints on the edges. We also consider the
possibility of protecting a given number of edges. We propose three different
formulations: a cut-set-based formulation, a flow-based one, and a bilevel one
(with an attacker and a defender). We propose algorithms to solve each
formulation and compare their efficiency.
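The survivability requirement above (the root still reaches every terminal after any k edge failures) has a direct brute-force verifier, useful for sanity-checking any of the three formulations on small instances. A sketch that ignores capacities and uses a made-up example network:

```python
import itertools

def reaches(root, targets, edges):
    """True if every target is reachable from root via undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, stack = {root}, [root]
    while stack:
        for w in adj.get(stack.pop(), []):
            if w not in seen:
                seen.add(w); stack.append(w)
    return set(targets) <= seen

def survivable(root, terminals, edges, k):
    """Brute-force covering check: root reaches all terminals after the
    removal of any k edges (capacity constraints omitted in this sketch)."""
    return all(
        reaches(root, terminals,
                [e for i, e in enumerate(edges) if i not in set(drop)])
        for drop in itertools.combinations(range(len(edges)), k)
    )

# A 4-cycle survives any single edge failure; a path does not (every edge
# of a tree is a bridge, matching the k = 1 tree case studied first).
cycle = [("r", "a"), ("a", "b"), ("b", "c"), ("c", "r")]
path = [("r", "a"), ("a", "b"), ("b", "c")]
ok_cycle = survivable("r", ["a", "b", "c"], cycle, 1)
ok_path = survivable("r", ["a", "b", "c"], path, 1)
```

The cut-set, flow, and bilevel formulations each encode this same condition implicitly; the brute-force check enumerates all C(|E|, k) failure scenarios, so it only scales to small instances.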