
6,363 research outputs found

Clustering with shallow trees

Author: A Braunstein
A Flaxman
Altschul S F
Bradde S Braunstein A Flaxman A Zecchina R
L Foini
M Bailly-Bechet
R Zecchina
S Bradde
Publication venue: 'IOP Publishing'
Publication date: 01/01/2009

We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message-passing method that solves it efficiently. The method and algorithm can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently presented Affinity Propagation. Using this general scheme, we analyze three structured biological/medical datasets (human population based on genetic information, proteins based on sequences, and verbal autopsies) and show that the interpolation technique provides new insight. Comment: 11 pages, 7 figures
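As a point of reference for the single-linkage end of the interpolation mentioned in this abstract, here is a minimal sketch of plain single-linkage clustering. It is not the authors' message-passing algorithm. The function name `single_linkage`, the dense distance-matrix input, and the stopping rule (merge until `k` clusters remain) are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def single_linkage(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Illustrative single linkage: merge the closest pair of clusters until k remain.

    `dist` is a symmetric matrix of pairwise distances (hypothetical input format).
    Returns one cluster label per point, numbered in order of first appearance.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = n
    # Visit point pairs from closest to farthest (Kruskal-style).
    pairs = sorted(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    for i, j in pairs:
        if clusters == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two clusters
            clusters -= 1

    relabel: Dict[int, int] = {}
    return [relabel.setdefault(find(x), len(relabel)) for x in range(n)]


# Four points on a line: two tight pairs far apart.
points = [0.0, 1.0, 10.0, 11.0]
distances = [[abs(a - b) for b in points] for a in points]
labels = single_linkage(distances, k=2)
```

With the four points above, the two tight pairs end up in separate clusters.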

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Scaling Analysis of Affinity Propagation

Author: A. P. Dempster
Cyril Furtlehner
J. Pearl
J. S. Yedidia
K. Wang
L. de Haan
Lihi Zelnik-manor
M. Meila
Michèle Sebag
S. Dudoit
X. Zhang
X. Zhang
Xiangliang Zhang
Publication venue: 'American Physical Society (APS)'
Publication date: 09/10/2009

We analyze and exploit some scaling properties of the Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007). First, we observe that a divide-and-conquer strategy, applied hierarchically to a large data set, reduces the complexity from ${\cal O}(N^2)$ to ${\cal O}(N^{(h+2)/(h+1)})$, for a data set of size $N$ and a depth $h$ of the hierarchical strategy. For a data set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision, except in dimension $d=2$. In fact, for $d$ larger than 2 the relative loss in precision scales like $N^{(2-d)/(h+1)d}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for $s<s^*$) from a coalescent one (for $s>s^*$) of the underlying hidden cluster structure. At this precise point a self-similarity property holds, which can be exploited by the hierarchical strategy to actually locate its position. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given dataset. Comment: 28 pages, 14 figures, Inria research report
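To make the scaling exponents quoted in this abstract concrete, the short sketch below evaluates them for a few depths. The function names are hypothetical. The second function assumes the precision-loss exponent is to be read as (2-d)/((h+1)d), with d in the denominator, which is one interpretation of the abstract's notation.

```python
def complexity_exponent(h: int) -> float:
    """Exponent of N in the O(N^((h+2)/(h+1))) cost stated in the abstract."""
    if h < 0:
        raise ValueError("depth h must be non-negative")
    return (h + 2) / (h + 1)


def precision_loss_exponent(h: int, d: int) -> float:
    """Exponent of N in the relative precision loss.

    Assumption: the abstract's N^{(2-d)/(h+1)d} is read as (2 - d) / ((h + 1) * d).
    """
    if h < 0 or d <= 0:
        raise ValueError("need h >= 0 and d >= 1")
    return (2 - d) / ((h + 1) * d)


# Depth h = 0 gives the quadratic exponent of plain AP; deeper hierarchies approach 1.
exponents = {h: complexity_exponent(h) for h in range(4)}
```

Under this reading the loss exponent is zero at d = 2 and negative for d > 2, which matches the abstract's statement that precision is preserved except in dimension 2.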

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Affinity Paths and Information Diffusion in Social Networks

Author: Albert
Ball
Bass
Boccaletti
Callaway
Daley
Ebel
Eckmann
Esteban Moro
Feld
Frenken
Galam
Gomez-Rodriguez
Granovetter
Guardiola
Guimerá
Harris
Hethcote
Jackson
José Luis Iribarren
Katz
Lazer
Leskovec
Liben-Nowell
Liu
McPherson
Nekovee
Newman
Newman
Newman
Niu
Pastor-Satorras
Piraveenan
Singla
Sznajd-Weron
Van den Bulte
Vilpponen
Watts
Watts
Watts
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

Widespread interest in the diffusion of information through social networks has produced a large number of Social Dynamics models. A majority of them use theoretical hypotheses to explain their diffusion mechanisms, while the few empirically based ones average out their measures over many messages of different content. Our empirical research, which tracks the step-by-step email propagation of an invariable viral marketing message, delves into the impact of content and has discovered new and striking features. The topology and dynamics of the propagation cascades display patterns not inherited from the email networks carrying the message. Their disconnected, low-transitivity, tree-like cascades present a positive correlation between their nodes' probability to forward the message and the average number of neighbors they target, and show increased participant involvement as the propagation path length grows. Such patterns, neither described before nor replicated by any of the existing models of information diffusion, can be explained if participants make their pass-along decisions based uniquely on local knowledge of their network neighbors' affinity with the message content. We prove the plausibility of such a mechanism through a stylized, agent-based model that replicates the Affinity Paths observed in real information diffusion cascades. Comment: 11 pages, 7 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Clustering by soft-constraint affinity propagation: Applications to gene-expression data

Author: Alizadeh
Blatt
Braunstein
Golub
M. Leone
M. Weigt
Pomeroy
Sumedha
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2007

Motivation: Similarity-measure based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck [Frey07]. In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g., in analyzing gene expression data. Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene expression signatures for each cluster. Comment: 11 pages, supplementary material: http://isiosf.isi.it/~weigt/scap_supplement.pd
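The abstract describes one end of the SCAP interpolation as the simple case where each data point selects its closest neighbor as an exemplar. A minimal sketch of only that limiting case follows. It is not SCAP or AP itself. The function name and the similarity-matrix input (larger value means more similar) are assumptions for illustration.

```python
from typing import List, Sequence


def nearest_neighbor_exemplars(similarity: Sequence[Sequence[float]]) -> List[int]:
    """For each point i, return the index of the most similar other point.

    `similarity[i][j]` is a hypothetical pairwise similarity; self-similarity is ignored.
    Ties are broken in favour of the lowest index.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two data points")
    exemplars = []
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        exemplars.append(max(candidates, key=lambda j: similarity[i][j]))
    return exemplars


# Negative squared distance as similarity, a common choice in AP-style examples.
points = [0.0, 1.0, 10.0, 11.0]
sim = [[-(a - b) ** 2 for b in points] for a in points]
choices = nearest_neighbor_exemplars(sim)
```

In this toy example each point simply picks its partner in the nearby pair, with no constraint that exemplars refer to themselves.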

arXiv.org e-Print Archive

CiteSeerX

Crossref

Enhancing Domain Word Embedding via Latent Semantic Imputation

Author: Lai Siwei
Lin Frank
Mikolov Tomas
van der Maaten Laurens
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/05/2019

We present a novel method named Latent Semantic Imputation (LSI) to transfer external knowledge into semantic space for enhancing word embedding. The method integrates graph theory to extract the latent manifold structure of the entities in the affinity space, and leverages non-negative least squares with standard simplex constraints and the power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and on language modeling, and demonstrate the superiority of the enhanced embedding via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method. Comment: ACM SIGKDD 201
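The abstract names the power iteration method as one ingredient. The sketch below shows generic textbook power iteration for the dominant eigenpair of a small matrix. It is not the LSI procedure, and the function name, defaults, and stopping rule are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def power_iteration(
    matrix: Sequence[Sequence[float]], iterations: int = 200, tol: float = 1e-12
) -> Tuple[float, List[float]]:
    """Generic power iteration: estimate the dominant eigenvalue and eigenvector."""
    n = len(matrix)
    vector = [1.0 / n ** 0.5] * n  # normalized starting guess
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply by the matrix, then renormalize to unit length.
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in product) ** 0.5
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        vector = [x / norm for x in product]
        # Rayleigh quotient v^T A v as the current eigenvalue estimate.
        applied = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        estimate = sum(vector[i] * applied[i] for i in range(n))
        converged = abs(estimate - eigenvalue) < tol
        eigenvalue = estimate
        if converged:
            break
    return eigenvalue, vector


value, vec = power_iteration([[2.0, 0.0], [0.0, 1.0]])
```

For the diagonal example the estimate approaches the largest diagonal entry, and the vector aligns with the first coordinate axis.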

arXiv.org e-Print Archive

Crossref