
18,278 research outputs found

Clustering large-scale data based on modified affinity propagation algorithm

Author: Ashour Wesam M.
Serdah Ahmed M
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2016
Field of study

Traditional clustering algorithms are no longer suitable for data mining applications that rely on large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate for ordinary data clustering, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods are designed to ensure both low time complexity and good clustering accuracy. First, a data set is divided into several subsets using one of two methods: random fragmentation or K-means. Second, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Third, the inverse weighted clustering
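The first step described above (random fragmentation of the data set into subsets) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `random_fragmentation`, the `seed` parameter, and the choice of near-equal subset sizes are assumptions, and the later KAP and merging steps are not shown.

```python
import random


def random_fragmentation(n_points, n_subsets, seed=0):
    """Split the indices 0..n_points-1 into n_subsets random, near-equal subsets.

    Hypothetical helper: the abstract only says that the data set is divided
    by "random fragmentation"; equal-sized subsets are an assumption here.
    """
    if not 1 <= n_subsets <= n_points:
        raise ValueError("n_subsets must be between 1 and n_points")
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    # Deal the shuffled indices round-robin so subset sizes differ by at most one.
    return [indices[i::n_subsets] for i in range(n_subsets)]


subsets = random_fragmentation(n_points=10, n_subsets=3)
print([len(s) for s in subsets])
```

Each subset could then be clustered independently, which is what keeps the per-subset cost low compared with running AP on the full data set.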

Biblioteka Nauki - repozytorium artykułów

Institutional Repository of the Islamic University of Gaza

Clustering by soft-constraint affinity propagation: Applications to gene-expression data

Author: Alizadeh
Blatt
Braunstein
Golub
M. Leone
M. Weigt
Pomeroy
Sumedha
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2007
Field of study

Motivation: Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g., in analyzing gene expression data. Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints relative to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene expression signatures for each cluster. Comment: 11 pages, supplementary material: http://isiosf.isi.it/~weigt/scap_supplement.pd
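The hard constraint described in this abstract (every point refers to an exemplar, and every exemplar refers to itself) and the overall similarity being maximized can be illustrated with a small sketch. The names `is_valid_exemplar_assignment` and `net_similarity`, the list-of-lists similarity matrix, and the toy data are assumptions for illustration; this shows neither the message-passing updates of AP nor the SCAP relaxation.

```python
def is_valid_exemplar_assignment(exemplar_of):
    """Check the AP hard constraint: every chosen exemplar refers to itself.

    exemplar_of[i] is the index of the exemplar chosen by point i.
    """
    n = len(exemplar_of)
    return all(0 <= e < n and exemplar_of[e] == e for e in exemplar_of)


def net_similarity(similarity, exemplar_of):
    """Sum of similarities between each point and its chosen exemplar.

    The diagonal entry similarity[k][k] plays the role of the self-similarity
    (preference) of point k when k is its own exemplar.
    """
    return sum(similarity[i][e] for i, e in enumerate(exemplar_of))


# Toy 1-D data with negative squared distance as similarity (an assumption).
points = [0.0, 1.0, 10.0, 11.0]
similarity = [[-(a - b) ** 2 for b in points] for a in points]
for k in range(len(points)):
    similarity[k][k] = -2.0  # hypothetical self-similarity value

print(is_valid_exemplar_assignment([0, 0, 3, 3]))  # exemplars 0 and 3 refer to themselves
print(is_valid_exemplar_assignment([1, 0, 3, 3]))  # point 0 picks 1, but 1 picks 0
print(net_similarity(similarity, [0, 0, 3, 3]))
```

In the second assignment, point 1 is chosen as an exemplar without choosing itself, which is exactly the configuration the hard constraint forbids.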

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scaling Analysis of Affinity Propagation

Author: A. P. Dempster
Cyril Furtlehner
J. Pearl
J. S. Yedidia
K. Wang
L. de Haan
Lihi Zelnik-manor
M. Meila
Michèle Sebag
S. Dudoit
X. Zhang
X. Zhang
Xiangliang Zhang
Publication venue: 'American Physical Society (APS)'
Publication date: 09/10/2009
Field of study

We analyze and exploit some scaling properties of the Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007). First we observe that a divide and conquer strategy, used hierarchically on a large data set, reduces the complexity from ${\cal O}(N^2)$ to ${\cal O}(N^{(h+2)/(h+1)})$, for a data-set of size $N$ and a depth $h$ of the hierarchical strategy. For a data-set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision except in dimension $d=2$. In fact, for $d$ larger than 2 the relative loss in precision scales like $N^{(2-d)/(h+1)d}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for $s<s^*$) from a coalescent one (for $s>s^*$) of the underlying hidden cluster structure. At this precise point holds a self-similarity property which can be exploited by the hierarchical strategy to actually locate its position. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given dataset. Comment: 28 pages, 14 figures, Inria research repor
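To make the scaling claims in this abstract concrete, the short sketch below evaluates the two exponents for a few depths. It is an illustration only: the function names are hypothetical, and the precision-loss exponent is read as (2-d)/((h+1)d), which is an assumption about how the formula in the abstract should be parenthesized.

```python
from fractions import Fraction


def complexity_exponent(h):
    """Exponent of N in the hierarchical complexity O(N^((h+2)/(h+1)))."""
    return Fraction(h + 2, h + 1)


def precision_loss_exponent(d, h):
    """Exponent of N in the relative precision loss.

    Assumption: the abstract's N^{(2-d)/(h+1)d} is read as (2-d)/((h+1)*d).
    """
    return Fraction(2 - d, (h + 1) * d)


for h in range(4):
    print(h, complexity_exponent(h), precision_loss_exponent(3, h))
```

With depth h = 0 the exponent is 2, which matches the quadratic cost of plain AP, and it approaches 1 as h grows. Under the stated reading, the loss exponent is zero for d = 2, which is consistent with the abstract singling out that dimension as the exception.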

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

K

Author: Cheng Lu
Cheng Wu
Shiji Song
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015
Field of study

The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it cannot be directly applied to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function is then adapted to handle interval data, so that the improved AP algorithm can be applied to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results

Crossref

Directory of Open Access Journals

Harnessing machine learning for fiber-induced nonlinearity mitigation in long-haul coherent optical OFDM

Author: Aldaya Ivan
Barry Liam
Giacoumidis Elias
Tsokanos Athanasios
Wei Jinlong
Yi-lin
Publication venue: 'MDPI AG'
Publication date: 01/12/2018
Field of study

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Coherent optical orthogonal frequency division multiplexing (CO-OFDM) has attracted a lot of interest in optical fiber communications due to its simplified digital signal processing (DSP) units, high spectral efficiency, flexibility, and tolerance to linear impairments. However, CO-OFDM's high peak-to-average power ratio makes it highly vulnerable to fiber-induced nonlinearities. DSP-based machine learning has been considered a promising approach for fiber nonlinearity compensation without sacrificing computational complexity. In this paper, we review the existing machine learning approaches for CO-OFDM in a common framework and survey the progress in this area, with a focus on practical aspects and comparison with benchmark DSP solutions. Peer reviewe

Directory of Open Access Journals

University of Hertfordshire Research Archive

SIMCO: SIMilarity-based object COunting

Author: Cristani Marco
Giachetti Andrea
Godi Marco
Joppi Christian
Publication venue
Publication date: 01/01/2019
Field of study

We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects through a novel Mask RCNN-based architecture trained beforehand (just once) on a brand-new synthetic 2D shape dataset, InShape; the idea is to highlight every object resembling a primitive 2D shape (circle, square, rectangle, etc.). Each detected object is described by a low-dimensional embedding, obtained from a novel similarity-based head branch; the latter implements a triplet loss, encouraging similar objects (same 2D shape, color, and scale) to map close together. Subsequently, SIMCO uses this embedding for clustering, so that different types of objects can emerge and be counted, making SIMCO the very first multi-class unsupervised counter. Experiments show that SIMCO provides state-of-the-art scores on counting benchmarks and that it can also help in many challenging image understanding tasks

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca