The quadratic assignment problem is easy for Robinsonian matrices with Toeplitz structure
We present a new polynomially solvable case of the Quadratic Assignment
Problem in Koopmans-Beckmann form QAP(A, B), by showing that the identity
permutation is optimal when A and B are respectively a Robinson similarity
and dissimilarity matrix and one of A or B is a Toeplitz matrix. A Robinson
(dis)similarity matrix is a symmetric matrix whose entries (increase) decrease
monotonically along rows and columns when moving away from the diagonal, and
such matrices arise in the classical seriation problem. Comment: 15 pages, 2 figures
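As an informal check of the statement above, the following sketch brute-forces one tiny instance. It writes A for the similarity matrix and B for the dissimilarity matrix, and it assumes the Koopmans-Beckmann objective is the sum over i, j of A[p(i)][p(j)] * B[i][j] for a permutation p. The two matrices and all helper names (`is_robinson_similarity`, `is_robinson_dissimilarity`, `is_toeplitz`, `qap_cost`) are illustrative choices, not taken from the paper.

```python
from itertools import permutations


def is_robinson_similarity(m):
    """Symmetric, and entries do not increase when moving away from the diagonal."""
    n = len(m)
    symmetric = all(m[i][j] == m[j][i] for i in range(n) for j in range(n))
    # For i < j < k the far entry m[i][k] must not exceed m[i][j] or m[j][k].
    monotone = all(
        m[i][k] <= min(m[i][j], m[j][k])
        for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)
    )
    return symmetric and monotone


def is_robinson_dissimilarity(m):
    """Same test with the inequalities reversed (negate and reuse)."""
    return is_robinson_similarity([[-x for x in row] for row in m])


def is_toeplitz(m):
    """Constant along every diagonal."""
    n = len(m)
    return all(m[i][j] == m[i + 1][j + 1] for i in range(n - 1) for j in range(n - 1))


def qap_cost(a, b, perm):
    """Assumed Koopmans-Beckmann objective for the permutation `perm`."""
    n = len(a)
    return sum(a[perm[i]][perm[j]] * b[i][j] for i in range(n) for j in range(n))


n = 5
A = [[n - abs(i - j) for j in range(n)] for i in range(n)]  # Toeplitz Robinson similarity
B = [[abs(i - j) for j in range(n)] for i in range(n)]      # Toeplitz Robinson dissimilarity

# Enumerate all n! permutations; only feasible for very small n.
costs = {perm: qap_cost(A, B, perm) for perm in permutations(range(n))}
identity = tuple(range(n))
print("identity is optimal:", costs[identity] == min(costs.values()))
```

Exhaustive enumeration only confirms the claim on this one toy instance; it is not a substitute for the proof.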
Convex Relaxations for Permutation Problems
Seriation seeks to reconstruct a linear order between variables using
unsorted, pairwise similarity information. It has direct applications in
archeology and shotgun gene sequencing, for example. We write seriation as an
optimization problem by proving the equivalence between the seriation and
combinatorial 2-SUM problems on similarity matrices (2-SUM is a quadratic
minimization problem over permutations). The seriation problem can be solved
exactly by a spectral algorithm in the noiseless case and we derive several
convex relaxations for 2-SUM to improve the robustness of seriation solutions
in noisy settings. These convex relaxations also allow us to impose structural
constraints on the solution, hence solve semi-supervised seriation problems. We
derive new approximation bounds for some of these relaxations and present
numerical experiments on archeological data, Markov chains and DNA assembly
from shotgun gene sequencing data. Comment: Final journal version, a few typos and references fixed
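The 2-SUM formulation mentioned above can be illustrated on a toy instance. The sketch below assumes the objective is the sum over i, j of A[i][j] * (pos[i] - pos[j])^2, where pos[i] is the position assigned to item i, and it minimizes that objective by brute force rather than with the spectral algorithm or the convex relaxations of the paper. The similarity matrix, the hidden order, and the function name `two_sum` are hypothetical.

```python
from itertools import permutations


def two_sum(sim, pos):
    """Assumed 2-SUM objective: similar items are penalized for sitting far apart."""
    n = len(sim)
    return sum(sim[i][j] * (pos[i] - pos[j]) ** 2 for i in range(n) for j in range(n))


n = 6
hidden_order = [3, 0, 5, 1, 4, 2]  # hidden_order[k] is the item at position k

# Position of each item in the hidden order.
true_pos = [0] * n
for k, item in enumerate(hidden_order):
    true_pos[item] = k

# Noiseless similarity: decays with distance in the hidden order.
sim = [[n - abs(true_pos[i] - true_pos[j]) for j in range(n)] for i in range(n)]

# Brute-force minimization over all position assignments (small n only).
best_pos = min(permutations(range(n)), key=lambda pos: two_sum(sim, pos))

# Convert positions back into an ordering of the items.
recovered = sorted(range(n), key=lambda item: best_pos[item])
print(recovered)
```

A linear order can only be recovered up to reversal, so `recovered` should be compared against both `hidden_order` and its reverse.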
Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and more generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
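To make the model concrete, here is a minimal sketch of sampling a graph with planted clusters. It assumes the simplest symmetric variant, in which each pair of vertices is connected independently with probability `p_in` inside a community and `p_out` across communities. The function name `sample_sbm` and its parameters are illustrative, not notation from the survey.

```python
import random


def sample_sbm(labels, p_in, p_out, seed=0):
    """Return a symmetric 0/1 adjacency matrix for the given community labels."""
    rng = random.Random(seed)  # local generator so results are reproducible
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on whether the labels match.
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


labels = [0] * 10 + [1] * 10  # two planted communities of equal size
adj = sample_sbm(labels, p_in=0.8, p_out=0.05)

pairs = [(i, j) for i in range(len(labels)) for j in range(i + 1, len(labels))]
within = sum(adj[i][j] for i, j in pairs if labels[i] == labels[j])
across = sum(adj[i][j] for i, j in pairs if labels[i] != labels[j])
print("edges within communities:", within, "across:", across)
```

Community detection is the inverse task: given only `adj`, recover `labels` exactly, partially, or better than chance, depending on the recovery requirement.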
A New Measure for Analyzing and Fusing Sequences of Objects
This work is related to the combinatorial data analysis problem of seriation used for data visualization and exploratory analysis. Seriation re-sequences the data, so that more similar samples or objects appear closer together, whereas dissimilar ones are further apart. Despite the large number of current algorithms to realize such re-sequencing, there has not been a systematic way for analyzing the resulting sequences, comparing them, or fusing them to obtain a single unifying one. We propose a new positional proximity measure that evaluates the similarity of two arbitrary sequences based on their agreement on pairwise positional information of the sequenced objects. Furthermore, we present various statistical properties of this measure as well as its normalized version modeled as an instance of the generalized correlation coefficient. Based on this measure, we define a new procedure for consensus seriation that fuses multiple arbitrary sequences based on a quadratic assignment problem formulation and an efficient way of approximating its solution. We also derive theoretical links with other permutation distance functions and present their associated combinatorial optimization forms for consensus tasks. The utility of the proposed contributions is demonstrated through the comparison and fusion of multiple seriation algorithms we have implemented, using many real-world datasets from different application domains.
Finding community structure using the ordered random graph model
Visualization of the adjacency matrix enables us to capture macroscopic
features of a network when the matrix elements are aligned properly. Community
structure, a network consisting of several densely connected components, is a
particularly important feature, and the structure can be identified through the
adjacency matrix when it is close to a block-diagonal form. However, classical
ordering algorithms for matrices fail to align matrix elements such that the
community structure is visible. In this study, we propose an ordering algorithm
based on the maximum-likelihood estimate of the ordered random graph model. We
show that the proposed method allows us to more clearly identify community
structures than the existing ordering algorithms. Comment: 14 pages, 12 figures