Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data,
particularly when proximal and imbalanced clusters are present. We show that
the ratio cut (RCut) and normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.
Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
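The claim that RCut and NCut favor balanced part sizes over low cut weight can be checked on a toy graph. The sketch below uses the standard textbook definitions, RCut = cut/|A| + cut/|B| and NCut = cut/vol(A) + cut/vol(B); the graph, its weights, and all function names are illustrative assumptions and are not taken from the paper.

```python
# Toy comparison of cut weight, RCut and NCut (illustrative, not the paper's code).

def cut_value(weights, part_a):
    """Total weight of edges with exactly one endpoint in part_a."""
    return sum(w for (u, v), w in weights.items()
               if (u in part_a) != (v in part_a))

def volume(weights, part):
    """Sum of weighted degrees of the nodes in part."""
    return sum(w * ((u in part) + (v in part))
               for (u, v), w in weights.items())

def rcut(weights, nodes, part_a):
    """Ratio cut: cut weight divided by each part's node count."""
    part_b = nodes - part_a
    c = cut_value(weights, part_a)
    return c / len(part_a) + c / len(part_b)

def ncut(weights, nodes, part_a):
    """Normalized cut: cut weight divided by each part's volume."""
    part_b = nodes - part_a
    c = cut_value(weights, part_a)
    return c / volume(weights, part_a) + c / volume(weights, part_b)

# Assumed toy graph: a 2-node cluster {0, 1} weakly attached (weight 0.9)
# to a 6-node path {2, ..., 7} whose edges all have weight 1.0.
weights = {(0, 1): 1.0, (1, 2): 0.9, (2, 3): 1.0, (3, 4): 1.0,
           (4, 5): 1.0, (5, 6): 1.0, (6, 7): 1.0}
nodes = set(range(8))
imbalanced = {0, 1}        # separates the small cluster, cuts edge (1, 2)
balanced = {0, 1, 2, 3}    # splits the path in the middle, cuts edge (3, 4)

for name, part in [("imbalanced", imbalanced), ("balanced", balanced)]:
    print(name, cut_value(weights, part),
          rcut(weights, nodes, part), ncut(weights, nodes, part))
```

On this assumed graph the imbalanced split has the smaller cut weight, yet both RCut and NCut score the balanced split lower, which is the behavior the abstract describes.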
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods, which are frequently used in clustering and
community detection applications, are sensitive to the specific graph
construction, particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.
Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP
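The partitioning problem stated in the abstract, a minimum cut subject to a minimum size on each part, can be made concrete with an exhaustive search on a small graph. This is only a brute-force illustration of the problem statement, not the authors' method (which optimizes over a parameterized family of graphs); the toy graph and the function names are assumptions.

```python
# Brute-force minimum cut under a minimum part size (illustrative only;
# exponential in the number of nodes, so suitable for toy graphs).
from itertools import combinations

def cut_value(weights, part_a):
    """Total weight of edges with exactly one endpoint in part_a."""
    return sum(w for (u, v), w in weights.items()
               if (u in part_a) != (v in part_a))

def min_cut_with_min_size(weights, nodes, min_size):
    """Return (cut weight, part A) minimizing the cut over all two-way
    partitions whose parts both have at least min_size nodes (min_size >= 1).
    Returns None when no such partition exists."""
    ordered = sorted(nodes)
    anchor, rest = ordered[0], ordered[1:]   # fix one node in A to skip mirror images
    best = None
    for k in range(min_size - 1, len(ordered) - min_size):
        for extra in combinations(rest, k):
            part_a = {anchor, *extra}
            value = cut_value(weights, part_a)
            if best is None or value < best[0]:
                best = (value, part_a)
    return best

# Assumed toy graph: a 2-node cluster attached by a 0.9-weight edge
# to a 6-node path with unit weights.
weights = {(0, 1): 1.0, (1, 2): 0.9, (2, 3): 1.0, (3, 4): 1.0,
           (4, 5): 1.0, (5, 6): 1.0, (6, 7): 1.0}
nodes = set(range(8))

print(min_cut_with_min_size(weights, nodes, 2))
print(min_cut_with_min_size(weights, nodes, 4))
```

With a minimum size of 2 the search isolates the small cluster; raising the minimum size to 4 forces the split into the middle of the path at a higher cut weight, which shows how the size constraint trades cut weight against balance.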
Parameterized Compilation Lower Bounds for Restricted CNF-formulas
We show unconditional parameterized lower bounds in the area of knowledge
compilation, more specifically on the size of circuits in decomposable negation
normal form (DNNF) that encode CNF-formulas restricted by several graph width
measures. In particular, we show that
- there are CNF formulas of size and modular incidence treewidth
whose smallest DNNF-encoding has size , and
- there are CNF formulas of size and incidence neighborhood diversity
whose smallest DNNF-encoding has size .
These results complement recent upper bounds for compiling CNF into DNNF and
strengthen, quantitatively and qualitatively, known conditional lower
bounds for cliquewidth. Moreover, they show that, unlike for many graph
problems, the parameters considered here behave significantly differently from
treewidth.
Hyperbolic intersection graphs and (quasi)-polynomial time
We study unit ball graphs (and, more generally, so-called noisy uniform ball
graphs) in -dimensional hyperbolic space, which we denote by .
Using a new separator theorem, we show that unit ball graphs in
enjoy properties similar to those of their Euclidean counterparts, but in one dimension
lower: many standard graph problems, such as Independent Set, Dominating Set,
Steiner Tree, and Hamiltonian Cycle can be solved in
time for any fixed , while the same problems need
time in . We also show that these algorithms in
are optimal up to constant factors in the exponent under ETH.
This drop in dimension has the largest impact in , where we
introduce a new technique to bound the treewidth of noisy uniform disk graphs.
The bounds yield quasi-polynomial () algorithms for all of the
studied problems, while in the case of Hamiltonian Cycle and -Coloring we
even get polynomial time algorithms. Furthermore, if the underlying noisy disks
in have constant maximum degree, then all studied problems can
be solved in polynomial time. This contrasts with the fact that these problems
require time under ETH in constant maximum degree
Euclidean unit disk graphs.
Finally, we complement our quasi-polynomial algorithm for Independent Set in
noisy uniform disk graphs with a matching lower bound
under ETH. This shows that the hyperbolic plane is a potential source of
NP-intermediate problems.
Comment: Short version appears in SODA 202
Network-Based Vertex Dissolution
We introduce a graph-theoretic vertex dissolution model that applies to a
number of redistribution scenarios such as gerrymandering in political
districting or work balancing in an online situation. The central aspect of our
model is the deletion of certain vertices and the redistribution of their load
to neighboring vertices in a completely balanced way.
We investigate how the underlying graph structure, the knowledge of which
vertices should be deleted, and the relation between old and new vertex loads
influence the computational complexity of the underlying graph problems. Our
results establish a clear borderline between tractable and intractable cases.
Comment: Version accepted at SIAM Journal on Discrete Mathematics
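The dissolution step described above (delete some vertices, hand their load to neighbors, end up completely balanced) can be sketched as a validity check for a proposed redistribution. This is one plausible reading of the abstract, assuming "completely balanced" means every remaining vertex ends with the same load; the function name, the transfer-plan representation, and the example are hypothetical and not taken from the paper.

```python
# Checker for a proposed vertex dissolution (hypothetical formalization).

def is_balanced_dissolution(edges, load, dissolved, transfer):
    """Return True if `transfer` moves the full load of every dissolved vertex
    to remaining neighbors and leaves all remaining vertices with equal load.

    edges:     iterable of undirected edges (u, v)
    load:      dict vertex -> old load
    dissolved: set of vertices to delete
    transfer:  dict (dissolved vertex, remaining neighbor) -> amount moved
    """
    adjacent = {frozenset(e) for e in edges}
    remaining = set(load) - set(dissolved)
    new_load = {v: load[v] for v in remaining}
    sent = {v: 0 for v in dissolved}
    for (src, dst), amount in transfer.items():
        if src not in dissolved or dst not in remaining:
            return False              # load may only flow from deleted to kept vertices
        if frozenset((src, dst)) not in adjacent or amount < 0:
            return False              # load may only move along existing edges
        sent[src] += amount
        new_load[dst] += amount
    if any(sent[v] != load[v] for v in dissolved):
        return False                  # every dissolved vertex must give away all its load
    return len(set(new_load.values())) <= 1

# Assumed example: path a - b - c, each vertex with load 2; dissolve b.
edges = [("a", "b"), ("b", "c")]
load = {"a": 2, "b": 2, "c": 2}
even_split = {("b", "a"): 1, ("b", "c"): 1}   # a and c both end with load 3
one_sided = {("b", "a"): 2}                   # a ends with 4, c stays at 2

print(is_balanced_dissolution(edges, load, {"b"}, even_split))
print(is_balanced_dissolution(edges, load, {"b"}, one_sided))
```

The check uses integer loads so that equality comparisons are exact; finding which vertices to dissolve and how to route their load is the hard part studied in the paper and is not attempted here.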