42,978 research outputs found
Bipartite graph partitioning and data clustering
Many data types arising from data mining applications can be modeled as
bipartite graphs, examples include terms and documents in a text corpus,
customers and purchasing items in market basket analysis and reviewers and
movies in a movie recommender system. In this paper, we propose a new data
clustering method based on partitioning the underlying bipartite graph. The
partition is constructed by minimizing a normalized sum of edge weights between
unmatched pairs of vertices of the bipartite graph. We show that an approximate
solution to the minimization problem can be obtained by computing a partial
singular value decomposition (SVD) of the associated edge weight matrix of the
bipartite graph. We point out the connection of our clustering algorithm to
correspondence analysis used in multivariate analysis. We also briefly discuss
the issue of assigning data objects to multiple clusters. In the experimental
results, we apply our clustering algorithm to the problem of document
clustering to illustrate its effectiveness and efficiency.Comment: Proceedings of ACM CIKM 2001, the Tenth International Conference on
Information and Knowledge Management, 200
Order independent structural alignment of circularly permuted proteins
Circular permutation connects the N and C termini of a protein and
concurrently cleaves elsewhere in the chain, providing an important mechanism
for generating novel protein fold and functions. However, their in genomes is
unknown because current detection methods can miss many occurances, mistaking
random repeats as circular permutation. Here we develop a method for detecting
circularly permuted proteins from structural comparison. Sequence order
independent alignment of protein structures can be regarded as a special case
of the maximum-weight independent set problem, which is known to be
computationally hard. We develop an efficient approximation algorithm by
repeatedly solving relaxations of an appropriate intermediate integer
programming formulation, we show that the approximation ratio is much better
then the theoretical worst case ratio of . Circularly permuted
proteins reported in literature can be identified rapidly with our method,
while they escape the detection by publicly available servers for structural
alignment.Comment: 5 pages, 3 figures, Accepted by IEEE-EMBS 2004 Conference Proceeding
Adaptive Nonparametric Image Parsing
In this paper, we present an adaptive nonparametric solution to the image
parsing task, namely annotating each image pixel with its corresponding
category label. For a given test image, first, a locality-aware retrieval set
is extracted from the training data based on super-pixel matching similarities,
which are augmented with feature extraction for better differentiation of local
super-pixels. Then, the category of each super-pixel is initialized by the
majority vote of the -nearest-neighbor super-pixels in the retrieval set.
Instead of fixing as in traditional non-parametric approaches, here we
propose a novel adaptive nonparametric approach which determines the
sample-specific k for each test image. In particular, is adaptively set to
be the number of the fewest nearest super-pixels which the images in the
retrieval set can use to get the best category prediction. Finally, the initial
super-pixel labels are further refined by contextual smoothing. Extensive
experiments on challenging datasets demonstrate the superiority of the new
solution over other state-of-the-art nonparametric solutions.Comment: 11 page
Memetic Multilevel Hypergraph Partitioning
Hypergraph partitioning has a wide range of important applications such as
VLSI design or scientific computing. With focus on solution quality, we develop
the first multilevel memetic algorithm to tackle the problem. Key components of
our contribution are new effective multilevel recombination and mutation
operations that provide a large amount of diversity. We perform a wide range of
experiments on a benchmark set containing instances from application areas such
VLSI, SAT solving, social networks, and scientific computing. Compared to the
state-of-the-art hypergraph partitioning tools hMetis, PaToH, and KaHyPar, our
new algorithm computes the best result on almost all instances
An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery
We propose a multi-agent algorithm able to automatically discover relevant
regularities in a given dataset, determining at the same time the set of
configurations of the adopted parametric dissimilarity measure yielding compact
and separated clusters. Each agent operates independently by performing a
Markovian random walk on a suitable weighted graph representation of the input
dataset. Such a weighted graph representation is induced by the specific
parameter configuration of the dissimilarity measure adopted by the agent,
which searches and takes decisions autonomously for one cluster at a time.
Results show that the algorithm is able to discover parameter configurations
that yield a consistent and interpretable collection of clusters. Moreover, we
demonstrate that our algorithm shows comparable performances with other similar
state-of-the-art algorithms when facing specific clustering problems
Multiclass Data Segmentation using Diffuse Interface Methods on Graphs
We present two graph-based algorithms for multiclass segmentation of
high-dimensional data. The algorithms use a diffuse interface model based on
the Ginzburg-Landau functional, related to total variation compressed sensing
and image processing. A multiclass extension is introduced using the Gibbs
simplex, with the functional's double-well potential modified to handle the
multiclass case. The first algorithm minimizes the functional using a convex
splitting numerical scheme. The second algorithm is a uses a graph adaptation
of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates
between diffusion and thresholding. We demonstrate the performance of both
algorithms experimentally on synthetic data, grayscale and color images, and
several benchmark data sets such as MNIST, COIL and WebKB. We also make use of
fast numerical solvers for finding the eigenvectors and eigenvalues of the
graph Laplacian, and take advantage of the sparsity of the matrix. Experiments
indicate that the results are competitive with or better than the current
state-of-the-art multiclass segmentation algorithms.Comment: 14 page
- …