9,115 research outputs found
Clustering by soft-constraint affinity propagation: Applications to gene-expression data
Motivation: Similarity-measure based clustering is a crucial problem
appearing throughout scientific data analysis. Recently, a powerful new
algorithm called Affinity Propagation (AP) based on message-passing techniques
was proposed by Frey and Dueck \cite{Frey07}. In AP, each cluster is identified
by a common exemplar all other data points of the same cluster refer to, and
exemplars have to refer to themselves. Albeit its proved power, AP in its
present form suffers from a number of drawbacks. The hard constraint of having
exactly one exemplar per cluster restricts AP to classes of regularly shaped
clusters, and leads to suboptimal performance, {\it e.g.}, in analyzing gene
expression data. Results: This limitation can be overcome by relaxing the AP
hard constraints. A new parameter controls the importance of the constraints
compared to the aim of maximizing the overall similarity, and allows to
interpolate between the simple case where each data point selects its closest
neighbor as an exemplar and the original AP. The resulting soft-constraint
affinity propagation (SCAP) becomes more informative, accurate and leads to
more stable clustering. Even though a new {\it a priori} free-parameter is
introduced, the overall dependence of the algorithm on external tuning is
reduced, as robustness is increased and an optimal strategy for parameter
selection emerges more naturally. SCAP is tested on biological benchmark data,
including in particular microarray data related to various cancer types. We
show that the algorithm efficiently unveils the hierarchical cluster structure
present in the data sets. Further on, it allows to extract sparse gene
expression signatures for each cluster.Comment: 11 pages, supplementary material:
http://isiosf.isi.it/~weigt/scap_supplement.pd
Efficient classification using parallel and scalable compressed model and Its application on intrusion detection
In order to achieve high efficiency of classification in intrusion detection,
a compressed model is proposed in this paper which combines horizontal
compression with vertical compression. OneR is utilized as horizontal
com-pression for attribute reduction, and affinity propagation is employed as
vertical compression to select small representative exemplars from large
training data. As to be able to computationally compress the larger volume of
training data with scalability, MapReduce based parallelization approach is
then implemented and evaluated for each step of the model compression process
abovementioned, on which common but efficient classification methods can be
directly used. Experimental application study on two publicly available
datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the
classification using the compressed model proposed can effectively speed up the
detection procedure at up to 184 times, most importantly at the cost of a
minimal accuracy difference with less than 1% on average
Distributed Clustering in Cognitive Radio Ad Hoc Networks Using Soft-Constraint Affinity Propagation
Absence of network infrastructure and heterogeneous spectrum availability in cognitive radio ad hoc networks (CRAHNs) necessitate the self-organization of cognitive radio users (CRs) for efficient spectrum coordination. The cluster-based structure is known to be effective in both guaranteeing system performance and reducing communication overhead in variable network environment. In this paper, we propose a distributed clustering algorithm based on soft-constraint affinity propagation message passing model (DCSCAP). Without dependence on predefined common control channel (CCC), DCSCAP relies on the distributed message passing among CRs through their available channels, making the algorithm applicable for large scale networks. Different from original soft-constraint affinity propagation algorithm, the maximal iterations of message passing is controlled to a relatively small number to accommodate to the dynamic environment of CRAHNs. Based on the accumulated evidence for clustering from the message passing process, clusters are formed with the objective of grouping the CRs with similar spectrum availability into smaller number of clusters while guaranteeing at least one CCC in each cluster. Extensive simulation results demonstrate the preference of DCSCAP compared with existing algorithms in both efficiency and robustness of the clusters
Distributed Low-rank Subspace Segmentation
Vision problems ranging from image clustering to motion segmentation to
semi-supervised learning can naturally be framed as subspace segmentation
problems, in which one aims to recover multiple low-dimensional subspaces from
noisy and corrupted input data. Low-Rank Representation (LRR), a convex
formulation of the subspace segmentation problem, is provably and empirically
accurate on small problems but does not scale to the massive sizes of modern
vision datasets. Moreover, past work aimed at scaling up low-rank matrix
factorization is not applicable to LRR given its non-decomposable constraints.
In this work, we propose a novel divide-and-conquer algorithm for large-scale
subspace segmentation that can cope with LRR's non-decomposable constraints and
maintains LRR's strong recovery guarantees. This has immediate implications for
the scalability of subspace segmentation, which we demonstrate on a benchmark
face recognition dataset and in simulations. We then introduce novel
applications of LRR-based subspace segmentation to large-scale semi-supervised
learning for multimedia event detection, concept detection, and image tagging.
In each case, we obtain state-of-the-art results and order-of-magnitude speed
ups
Cortical spatio-temporal dimensionality reduction for visual grouping
The visual systems of many mammals, including humans, is able to integrate
the geometric information of visual stimuli and to perform cognitive tasks
already at the first stages of the cortical processing. This is thought to be
the result of a combination of mechanisms, which include feature extraction at
single cell level and geometric processing by means of cells connectivity. We
present a geometric model of such connectivities in the space of detected
features associated to spatio-temporal visual stimuli, and show how they can be
used to obtain low-level object segmentation. The main idea is that of defining
a spectral clustering procedure with anisotropic affinities over datasets
consisting of embeddings of the visual stimuli into higher dimensional spaces.
Neural plausibility of the proposed arguments will be discussed
Multi-view constrained clustering with an incomplete mapping between views
Multi-view learning algorithms typically assume a complete bipartite mapping
between the different views in order to exchange information during the
learning process. However, many applications provide only a partial mapping
between the views, creating a challenge for current methods. To address this
problem, we propose a multi-view algorithm based on constrained clustering that
can operate with an incomplete mapping. Given a set of pairwise constraints in
each view, our approach propagates these constraints using a local similarity
measure to those instances that can be mapped to the other views, allowing the
propagated constraints to be transferred across views via the partial mapping.
It uses co-EM to iteratively estimate the propagation within each view based on
the current clustering model, transfer the constraints across views, and then
update the clustering model. By alternating the learning process between views,
this approach produces a unified clustering model that is consistent with all
views. We show that this approach significantly improves clustering performance
over several other methods for transferring constraints and allows multi-view
clustering to be reliably applied when given a limited mapping between the
views. Our evaluation reveals that the propagated constraints have high
precision with respect to the true clusters in the data, explaining their
benefit to clustering performance in both single- and multi-view learning
scenarios
- …