82,129 research outputs found
Local Guarantees in Graph Cuts and Clustering
Correlation Clustering is an elegant model that captures fundamental graph
cut problems such as Min Cut, Multiway Cut, and Multicut, extensively
studied in combinatorial optimization. Here, we are given a graph with edges
labeled + or - and the goal is to produce a clustering that agrees with the
labels as much as possible: + edges within clusters and - edges across
clusters. The classical approach towards Correlation Clustering (and other
graph cut problems) is to optimize a global objective. We depart from this and
study local objectives: minimizing the maximum number of disagreements for
edges incident on a single node, and the analogous max min agreements
objective. This naturally gives rise to a family of basic min-max graph cut
problems. A prototypical representative is Min Max s-t Cut: find an s-t cut
minimizing the largest number of cut edges incident on any node. We present the
following results: an -approximation for the problem of
minimizing the maximum total weight of disagreement edges incident on any node
(thus providing the first known approximation for the above family of min-max
graph cut problems), a remarkably simple -approximation for minimizing
local disagreements in complete graphs (improving upon the previous best known
approximation of ), and a -approximation for
maximizing the minimum total weight of agreement edges incident on any node,
hence improving upon the -approximation that follows from
the study of approximate pure Nash equilibria in cut and party affiliation
games.
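The local objective described above can be made concrete with a small sketch. It only evaluates a given clustering (it is not any of the approximation algorithms from the abstract), and the function names, the `(u, v, sign)` edge format and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def local_disagreements(edges, cluster_of):
    """Count, for every node, the disagreeing edges incident on it.

    edges: iterable of (u, v, sign) with sign '+' or '-' (assumed format).
    cluster_of: dict mapping each node to a cluster id.
    """
    count = defaultdict(int)
    for u, v, sign in edges:
        count[u] += 0  # make sure both endpoints appear in the result
        count[v] += 0
        same = cluster_of[u] == cluster_of[v]
        # A '+' edge disagrees when it is cut; a '-' edge disagrees when
        # both endpoints land in the same cluster.
        if (sign == "+" and not same) or (sign == "-" and same):
            count[u] += 1
            count[v] += 1
    return dict(count)


def max_local_disagreement(edges, cluster_of):
    """The min-max objective value: worst disagreement count over all nodes."""
    return max(local_disagreements(edges, cluster_of).values(), default=0)


# Toy instance (hypothetical): a '+' path a-b-c-d with one '-' edge a-c.
edges = [("a", "b", "+"), ("b", "c", "+"), ("a", "c", "-"), ("c", "d", "+")]
clustering = {"a": 0, "b": 0, "c": 1, "d": 1}
print(local_disagreements(edges, clustering))
print(max_local_disagreement(edges, clustering))
```

A global objective would sum the per-node counts (and halve the result); the local objective instead takes their maximum, so a clustering is judged by its worst-off node.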
A Study of NK Landscapes' Basins and Local Optima Networks
We propose a network characterization of combinatorial fitness landscapes by
adapting the notion of inherent networks proposed for energy surfaces (Doye,
2002). We use the well-known family of NK landscapes as an example. In our
case the inherent network is the graph where the vertices are all the local
maxima and edges mean basin adjacency between two maxima. We exhaustively
extract such networks on representative small NK landscape instances, and show
that they are 'small-worlds'. However, the maxima graphs are not random, since
their clustering coefficients are much larger than those of corresponding
random graphs. Furthermore, the degree distributions are close to exponential
instead of Poissonian. We also describe the nature of the basins of attraction
and their relationship with the local maxima network. Comment: best paper nominatio
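As a rough illustration of the construction described above, the following sketch exhaustively extracts a maxima network from a small NK landscape. The adjacent-neighbourhood NK model, best-improvement hill climbing as the basin definition, and all function names are assumptions; the abstract does not specify these details.

```python
import itertools
import random


def make_nk(n, k, seed=0):
    """Random NK fitness function on bit tuples of length n.

    Assumes the adjacent-neighbourhood model: bit i interacts with the
    k bits that follow it (cyclically).
    """
    rng = random.Random(seed)
    tables = [
        {key: rng.random() for key in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(x):
        return sum(
            tables[i][tuple(x[(i + j) % n] for j in range(k + 1))]
            for i in range(n)
        ) / n

    return fitness


def neighbours(x):
    """All bit strings at Hamming distance one from x."""
    for i in range(len(x)):
        yield x[:i] + (1 - x[i],) + x[i + 1:]


def hill_climb(x, fitness):
    """Best-improvement hill climbing; returns the local maximum reached."""
    while True:
        best = max(neighbours(x), key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best


def local_optima_network(n, fitness):
    """Vertices are local maxima; an edge joins two maxima with adjacent basins."""
    basin = {x: hill_climb(x, fitness)
             for x in itertools.product((0, 1), repeat=n)}
    maxima = set(basin.values())
    edges = set()
    for x, m in basin.items():
        for y in neighbours(x):
            if basin[y] != m:
                edges.add(frozenset((m, basin[y])))
    return maxima, edges, basin


fitness = make_nk(n=8, k=2, seed=1)
maxima, edges, basin = local_optima_network(8, fitness)
print(len(maxima), "local maxima,", len(edges), "basin-adjacency edges")
```

Exhaustive enumeration costs on the order of 2^n hill climbs, which is why this kind of extraction is limited to small instances.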
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measures correspond to building a dendrogram with the single
linkage criterion, together with specific forms of a level function and a
distance function defined over it. We therefore extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features, in the spirit of multi-layered architectures, to
obtain the final features. Finally, we demonstrate the effectiveness of our
approach via several numerical studies.
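The Minimax distance mentioned above (the smallest achievable largest hop over all paths between two objects, which equals the single-linkage merge level) can be sketched as follows. This uses a Floyd-Warshall style recursion chosen for brevity, not the dendrogram-based procedure of the paper; the function name and the toy data are assumptions.

```python
def minimax_distances(d):
    """Pairwise Minimax distances from a symmetric distance matrix d.

    The Minimax distance between i and j is the minimum, over all paths
    from i to j, of the largest single hop on the path.
    """
    n = len(d)
    m = [row[:] for row in d]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(m[i][k], m[k][j])  # bottleneck of going through k
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


# Four points on a line (hypothetical data): three close together, one far away.
xs = [0.0, 1.0, 2.0, 10.0]
d = [[abs(a - b) for b in xs] for a in xs]
mm = minimax_distances(d)
print(mm[0][2])  # largest hop on the path 0 -> 1 -> 2
print(mm[0][3])  # the gap between the group and the far point
```

The cubic running time is acceptable for a small illustration; computing the same quantities from a minimum spanning tree or a single-linkage dendrogram scales better.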
Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article - Copyright @ 2010 Sage Publications. Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structural detail in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
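The geodesic-distance step with the traditional K-neighbour pruning mentioned above can be sketched as follows. This covers only the distance estimation, not the embedding or the minimum-cost pruning criterion; the function name, the symmetrised neighbour graph and the arc-shaped toy data are assumptions.

```python
import math


def knn_geodesics(points, K):
    """Estimate geodesic distances as shortest paths in a K-neighbour graph."""
    n = len(points)
    geo = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:K]
        for j in nearest:
            # Keep the edge in both directions (symmetrised graph, an assumption).
            geo[i][j] = geo[j][i] = math.dist(points[i], points[j])
    # Floyd-Warshall: all-pairs shortest path lengths over the pruned graph.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][k] + geo[k][j] < geo[i][j]:
                    geo[i][j] = geo[i][k] + geo[k][j]
    return geo


# Ten points along three quarters of a unit circle (hypothetical data).
arc = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
       for a in range(0, 271, 30)]
geo = knn_geodesics(arc, K=2)
print(math.dist(arc[0], arc[-1]))  # straight-line distance between the ends
print(geo[0][-1])                  # distance measured along the curve
```

On this curved data the straight-line distance between the two end points badly understates how far apart they are along the arc, which is the non-linearity that the geodesic transformation is meant to handle.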
Complex-network analysis of combinatorial spaces: The NK landscape case
We propose a network characterization of combinatorial fitness landscapes by
adapting the notion of inherent networks proposed for energy surfaces. We use
the well-known family of NK landscapes as an example. In our case the inherent
network is the graph whose vertices represent the local maxima in the
landscape, and the edges account for the transition probabilities between their
corresponding basins of attraction. We exhaustively extracted such networks on
representative NK landscape instances, and performed a statistical
characterization of their properties. We found that most of these network
properties are related to the search difficulty on the underlying NK landscapes
with varying values of K. Comment: arXiv admin note: substantial text overlap with arXiv:0810.3492,
arXiv:0810.348
Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost can be
prohibitively high when the data size and the number of clusters are large. It
is well known that the processing bottleneck of k-means lies in the operation
of seeking the closest centroid in each iteration. In this paper, a novel
solution to the scalability issue of k-means is presented. In the proposal,
k-means is supported by an approximate k-nearest neighbors graph. In the
k-means iteration, each data sample is only compared to the clusters in which
its nearest neighbors reside. Since the number of nearest neighbors we consider
is much less than k, the processing cost in this step becomes minor and
independent of k. The processing bottleneck is therefore overcome. The most
interesting thing is that the k-nearest neighbor graph is constructed by
iteratively calling the fast k-means itself. Compared with existing fast
k-means variants, the proposed algorithm achieves hundreds to thousands of
times speed-up while maintaining high clustering quality. When tested on 10
million 512-dimensional data points, it takes only 5.2 hours to produce 1
million clusters. In contrast, to fulfill the same scale of clustering, it
would take 3 years for traditional k-means.
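The restricted assignment step described above can be sketched in a few lines. This is a simplified stand-in, not the proposed algorithm: the neighbour graph is built by brute force rather than by repeated calls to the fast k-means, and the function names, the random initialisation and the `kappa` parameter (number of neighbours per sample) are assumptions.

```python
import math
import random


def knn_graph(points, kappa):
    """Brute-force neighbour lists (stand-in for the approximate graph)."""
    graph = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        graph.append(order[:kappa])
    return graph


def centroids_of(points, labels, k):
    """Mean of each cluster; None for clusters that are currently empty."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]


def graph_kmeans(points, k, kappa=3, iters=10, seed=0):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    graph = knn_graph(points, kappa)
    for _ in range(iters):
        cents = centroids_of(points, labels, k)
        new_labels = []
        for i, p in enumerate(points):
            # Only the sample's own cluster and its neighbours' clusters are
            # candidates, so at most kappa + 1 centroids are compared.
            candidates = {labels[i]} | {labels[j] for j in graph[i]}
            new_labels.append(
                min(candidates, key=lambda c: math.dist(p, cents[c])))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(graph_kmeans(pts, k=2, kappa=2))
```

Each sample is compared with at most `kappa + 1` centroids per iteration instead of all k, which is the source of the claimed independence from k; the quality of the result then depends on how good the neighbour graph is.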
- …