Search CORE

33 research outputs found

Parallel Hierarchical Affinity Propagation with MapReduce

Author: Haber Rana
Mijatovic Nenad
Peter Adrian M.
Rose Dillon Mark
Rouly Jean Michel
Publication venue
Publication date: 28/03/2014
Field of study

The accelerated evolution and explosion of the Internet and social media is generating voluminous quantities of data (on zettabyte scales). Paramount amongst the desires to manipulate and extract actionable intelligence from vast big data volumes is the need for scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, overcoming the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates our competitiveness against other state-of-the-art MapReduce clustering techniques

arXiv.org e-Print Archive

Crossref

Scaling Analysis of Affinity Propagation

Author: A. P. Dempster
Cyril Furtlehner
J. Pearl
J. S. Yedidia
K. Wang
L. de Haan
Lihi Zelnik-manor
M. Meila
Michèle Sebag
S. Dudoit
X. Zhang
X. Zhang
Xiangliang Zhang
Publication venue: 'American Physical Society (APS)'
Publication date: 09/10/2009
Field of study

We analyze and exploit some scaling properties of the Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007). First we observe that a divide and conquer strategy, used on a large data set hierarchically reduces the complexity

{\cal O}(N^2)

{\cal O}(N^{(h+2)/(h+1)})

, for a data-set of size

N

and a depth

h

of the hierarchical strategy. For a data-set embedded in a

d

-dimensional space, we show that this is obtained without notably damaging the precision except in dimension

d=2

. In fact, for

d

larger than 2 the relative loss in precision scales like

N^{(2-d)/(h+1)d}

. Finally, under some conditions we observe that there is a value

s^*

of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for

s<s^*

) from a coalescent one (for

s>s^*

) of the underlying hidden cluster structure. At this precise point holds a self-similarity property which can be exploited by the hierarchical strategy to actually locate its position. From this observation, a strategy based on \AP can be defined to find out how many clusters are present in a given dataset.Comment: 28 pages, 14 figures, Inria research repor

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Adaptive semi-supervised affinity propagation clustering algorithm based on structural similarity

Author: Limin Wang
Qiang Ji
Xuming Han
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2016
Field of study

Uzimajući u obzir nezadovoljavajuće djelovanje grupiranja srodnog širenja algoritma grupiranja, kada se radi o nizovima podataka složenih struktura, u ovom se radu predlaže prilagodljivi nadzirani algoritam grupiranja srodnog širenja utemeljen na strukturnoj sličnosti (SAAP-SS). Najprije se predlaže nova strukturna sličnost rješavanjem nelinearnog problema zastupljenosti niskoga ranga. Zatim slijedi srodno širenje na temelju podešavanja matrice sličnosti primjenom poznatih udvojenih ograničenja. Na kraju se u postupak algoritma uvodi ideja eksplozija kod vatrometa. Prilagodljivo pretražujući preferencijalni prostor u dva smjera, uravnotežuju se globalne i lokalne pretraživačke sposobnosti algoritma u cilju pronalaženja optimalne strukture grupiranja. Rezultati eksperimenata i sa sintetičkim i s realnim nizovima podataka pokazuju poboljšanja u radu predloženog algoritma u usporedbi s AP, FEO-SAP i K-means metodama.In view of the unsatisfying clustering effect of affinity propagation (AP) clustering algorithm when dealing with data sets of complex structures, an adaptive semi-supervised affinity propagation clustering algorithm based on structural similarity (SAAP-SS) is proposed in this paper. First, a novel structural similarity is proposed by solving a non-linear, low-rank representation problem. Then we perform affinity propagation on the basis of adjusting the similarity matrix by utilizing the known pairwise constraints. Finally, the idea of fireworks explosion is introduced into the process of the algorithm. By adaptively searching the preference space bi-directionally, the algorithm’s global and local searching abilities are balanced in order to find the optimal clustering structure. The results of the experiments with both synthetic and real data sets show performance improvements of the proposed algorithm compared with AP, FEO-SAP and K-means methods

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Structural learning for large scale image classification

Author: NC DOCKS at The University of North Carolina at Charlotte
Shen Yi
Publication venue
Publication date: 01/01/2013
Field of study

To leverage large-scale collaboratively-tagged (loosely-tagged) images for training a large number of classifiers to support large-scale image classification, we need to develop new frameworks to deal with the following issues: (1) spam tags, i.e., tags are not relevant to the semantic of the images; (2) loose object tags, i.e., multiple object tags are loosely given at the image level without their locations in the images; (3) missing object tags, i.e. some object tags are missed due to incomplete tagging; (4) inter-related object classes, i.e., some object classes are visually correlated and their classifiers need to be trained jointly instead of independently; (5) large scale object classes, which requires to limit the computational time complexity for classifier training algorithms as well as the storage spaces for intermediate results. To deal with these issues, we propose a structural learning framework which consists of the following key components: (1) cluster-based junk image filtering to address the issue of spam tags; (2) automatic tag-instance alignment to address the issue of loose object tags; (3) automatic missing object tag prediction; (4) object correlation network for inter-class visual correlation characterization to address the issue of missing tags; (5) large-scale structural learning with object correlation network for enhancing the discrimination power of object classifiers. To obtain enough numbers of labeled training images, our proposed framework leverages the abundant web images and their social tags. To make those web images usable, tag cleansing has to be done to neutralize the noise from user tagging preferences, in particularly junk tags, loose tags and missing tags. Then a discriminative learning algorithm is developed to train a large number of inter-related classifiers for achieving large-scale image classification, e.g., learning a large number of classifiers for categorizing large-scale images into a large number of inter-related object classes and image concepts. A visual concept network is first constructed for organizing enumorus object classes and image concepts according to their inter-concept visual correlations. The visual concept network is further used to: (a) identify inter-related learning tasks for classifier training; (b) determine groups of visually-similar object classes and image concepts; and (c) estimate the learning complexity for classifier training. A large-scale discriminative learning algorithm is developed for supporting multi-class classifier training and achieving accurate inter-group discrimination and effective intra-group separation. Our discriminative learning algorithm can significantly enhance the discrimination power of the classifiers and dramatically reduce the computational cost for large-scale classifier training

The University of North Carolina at Greensboro

A Model-Based Analysis of Chemical and Temporal Patterns of Cuticular Hydrocarbons in Male Drosophila melanogaster

Author: A Ejima
A Gibbs
A Skroblin
A Takahashi
Adrienne Chu
AG Clark
B Foley
Ben Smith
BJ Frey
C Labeur
Clement Kent
DF Nicholls
DJ Bartholomew
F Marcillac
F Mas
GH Thomson
H Markowitz
H Markowitz
HB Dowse
J Jallon
J Rouault
JA Coyne
JA Coyne
JC Loehlin
JD Levine
JD Rouault
JF Ferveur
JF Ferveur
JH Ward
JM Gleason
JM Jallon
Joel Levine
M Cobb
M Grillet
M Ueyama
MD Ginzel
MW Blows
MW Blows
R Dallerac
Reza Azanchi
RW Howard
T Chertemps
T Chertemps
T Chertemps
TF Cox
Tom Tregenza
V Soroker
W Van der Goes van Naters
WD Jones
Y Benjamini
Publication venue: Public Library of Science
Publication date: 26/09/2007
Field of study

Drosophila Cuticular Hydrocarbons (CH) influence courtship behaviour, mating, aggregation, oviposition, and resistance to desiccation. We measured levels of 24 different CH compounds of individual male D. melanogaster hourly under a variety of environmental (LD/DD) conditions. Using a model-based analysis of CH variation, we developed an improved normalization method for CH data, and show that CH compounds have reproducible cyclic within-day temporal patterns of expression which differ between LD and DD conditions. Multivariate clustering of expression patterns identified 5 clusters of co-expressed compounds with common chemical characteristics. Turnover rate estimates suggest CH production may be a significant metabolic cost. Male cuticular hydrocarbon expression is a dynamic trait influenced by light and time of day; since abundant hydrocarbons affect male sexual behavior, males may present different pheromonal profiles at different times and under different conditions

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Clustering large-scale data based on modified affinity propagation algorithm

Author: Ashour Wesam M.
Serdah Ahmed M
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2016
Field of study

Traditional clustering algorithms are no longer suitable for use in data mining applications that make use of large-scale data. There have been many large-scale data clustering algorithms proposed in recent years, but most of them do not achieve clustering with high quality. Despite that Affinity Propagation (AP) is effective and accurate in normal data clustering, but it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of AP algorithm. The proposed methods are set to ensure both low time complexity and good accuracy of the clustering method. Firstly, a data set is divided into several subsets using one of two methods random fragmentation or K-means. Secondly, subsets are clustered into K clusters using K-Affinity Propagation (KAP) algorithm to select local cluster exemplars in each subset. Thirdly, the inverse weighted clustering

Biblioteka Nauki - repozytorium artykuÅÃ³w

Institutional Repository of the Islamic University of Gaza