4,339 research outputs found

Relational Data Mining Through Extraction of Representative Exemplars

Author: Blanchard Frédéric
Herbin Michel
Publication date: 01/01/2012

With growing interest in network analysis, relational data mining is becoming a prominent area of data mining. This paper addresses the problem of extracting representative elements from a relational dataset. After defining the notion of degree of representativeness, computed using the Borda aggregation procedure, we present the extraction of exemplars, which are the representative elements of the dataset. We use these concepts to build a network on the dataset. We describe the main properties of these notions and propose two typical applications of our framework. The first application consists of summarizing and structuring a set of binary images, and the second of mining co-authorship relations in a research team.
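The abstract does not spell out how the degree of representativeness is computed, so the following is only a minimal sketch of one plausible reading: every element acts as a voter that ranks all other elements from closest to farthest, and the Borda count sums the resulting rank points. The function names, the tie-breaking rule, and the toy distance matrix are assumptions made for illustration, not the authors' definitions.

```python
def borda_scores(dist):
    """Borda count over a square dissimilarity matrix (list of lists).

    Assumption: each element ranks the others by increasing dissimilarity;
    the closest receives n - 2 points and the farthest receives 0.
    """
    n = len(dist)
    scores = [0] * n
    for voter in range(n):
        # Rank every other element from closest to farthest (ties broken by index).
        ranked = sorted(
            (j for j in range(n) if j != voter),
            key=lambda j: (dist[voter][j], j),
        )
        for position, candidate in enumerate(ranked):
            scores[candidate] += (n - 2) - position
    return scores


def most_representative(dist):
    """Index of the element with the highest Borda score."""
    scores = borda_scores(dist)
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy data: four points on a line, with absolute difference as dissimilarity.
points = [0, 1, 2, 10]
dist = [[abs(a - b) for b in points] for a in points]
print(borda_scores(dist))         # [3, 5, 4, 0]
print(most_representative(dist))  # 1
```

In this toy case the outlier at 10 receives no points, while the element at 1, which is ranked near the top by every other element, collects the highest score.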

arXiv.org e-Print Archive

CiteSeerX

HAL Descartes

Parametric t-Distributed Stochastic Exemplar-centered Embedding

Author: A Gisbrecht
B Bahmani
L Greengard
L Maaten van der
PP Kuksa
Publication date: 20/04/2018

Parametric embedding methods such as parametric t-SNE (pt-SNE) have been widely adopted for data visualization and out-of-sample data embedding without further computationally expensive optimization or approximation. However, the performance of pt-SNE is highly sensitive to the batch-size hyper-parameter because of conflicting optimization goals, and pt-SNE often produces dramatically different embeddings for different choices of user-defined perplexity. To address these issues, we present parametric t-distributed stochastic exemplar-centered embedding methods. Our strategy learns embedding parameters by comparing the given data only with precomputed exemplars, resulting in a cost function with linear computational and memory complexity, which is further reduced by noise contrastive samples. Moreover, we propose a shallow embedding network with high-order feature interactions for data visualization, which is much easier to tune yet achieves performance comparable to the deep neural network employed by pt-SNE. We empirically demonstrate, using several benchmark datasets, that our proposed methods significantly outperform pt-SNE in terms of robustness, visual effects, and quantitative evaluations.

arXiv.org e-Print Archive

NRC Publications Archive

Crossref

Clustering by soft-constraint affinity propagation: Applications to gene-expression data

Author: Alizadeh
Blatt
Braunstein
Golub
M. Leone
M. Weigt
Pomeroy
Sumedha
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2007

Motivation: Similarity-measure based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g., in analyzing gene expression data. Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints relative to the aim of maximizing the overall similarity, and makes it possible to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows sparse gene expression signatures to be extracted for each cluster. Comment: 11 pages, supplementary material: http://isiosf.isi.it/~weigt/scap_supplement.pd
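To make the two extremes named in the abstract concrete, the sketch below checks the AP hard constraint (every chosen exemplar must choose itself) and evaluates the overall similarity of an assignment. It does not implement AP or SCAP message passing. The function names, the toy similarity matrix, and the self-similarity value of -4 are all assumptions chosen for illustration.

```python
def net_similarity(sim, choice):
    """Sum of similarities between each point and the exemplar it chose."""
    return sum(sim[i][choice[i]] for i in range(len(choice)))


def satisfies_hard_constraint(choice):
    """AP hard constraint: whoever is chosen as an exemplar chooses itself."""
    return all(choice[choice[i]] == choice[i] for i in range(len(choice)))


def nearest_neighbor_choice(sim):
    """Limit case from the abstract: each point picks its most similar other point."""
    n = len(sim)
    return [
        max((j for j in range(n) if j != i), key=lambda j: sim[i][j])
        for i in range(n)
    ]


# Toy data (assumed): four points on a line, similarity = negative squared distance,
# and a self-similarity ("preference") of -4 on the diagonal.
points = [0, 1, 5, 7]
preference = -4
sim = [
    [preference if a == b else -(points[a] - points[b]) ** 2 for b in range(4)]
    for a in range(4)
]

nn_choice = nearest_neighbor_choice(sim)  # [1, 0, 3, 2]
ap_style_choice = [0, 0, 2, 2]            # points 0 and 2 act as exemplars

print(satisfies_hard_constraint(nn_choice))        # False
print(satisfies_hard_constraint(ap_style_choice))  # True
print(net_similarity(sim, nn_choice))              # -10
print(net_similarity(sim, ap_style_choice))        # -13
```

The nearest-neighbor assignment reaches a higher overall similarity but violates the hard constraint, while the constrained assignment pays the self-similarity cost for each exemplar. This is the trade-off that a relaxation parameter of the kind described in the abstract would balance.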

arXiv.org e-Print Archive

CiteSeerX

Crossref

Parallel Hierarchical Affinity Propagation with MapReduce

Author: Haber Rana
Mijatovic Nenad
Peter Adrian M.
Rose Dillon Mark
Rouly Jean Michel
Publication date: 28/03/2014

The accelerated evolution and explosion of the Internet and social media are generating voluminous quantities of data (on zettabyte scales). Extracting actionable intelligence from such volumes of big data requires, above all, scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, which overcomes the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates its competitiveness against other state-of-the-art MapReduce clustering techniques.

arXiv.org e-Print Archive

Crossref

The impact of contact tracing in clustered populations

Author: DM Green
DT Gillespie
E Volz
F Ball
IM Hall
IZ Kiss
J Clarke
J Müller
JC Miller
K Eames
KTD Eames
Lauren Ancel Meyers
M Lipsitch
MA Serrano
Matt J. Keeling
MEJ Newman
MJ Keeling
MJ Tildesley
MR FitzGerald
MR Golden
NM Ferguson
R Huerta
S Bansal
S Riley
T House
Thomas House
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/03/2010

The tracing of potentially infectious contacts has become an important part of the control strategy for many infectious diseases, from early cases of novel infections to endemic sexually transmitted infections. Here, we make use of mathematical models to consider the case of partner notification for sexually transmitted infection; however, these models are sufficiently simple to allow more general conclusions to be drawn. We show that, when contact network structure is considered in addition to contact tracing, standard “mass action” models are generally inadequate. To consider the impact of mutual contacts (specifically clustering), we develop an improvement to existing pairwise network models, which we use to demonstrate that, ceteris paribus, clustering improves the efficacy of contact tracing for a large region of parameter space. This result is sometimes reversed, however, for the case of highly effective contact tracing. We also develop stochastic simulations for comparison, using simple re-wiring methods that allow the generation of appropriate comparator networks. In this way we contribute to the general theory of network-based interventions against infectious disease.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

The University of Manchester - Institutional Repository