Search CORE

192 research outputs found

Helmholtzian Eigenmap: Topological feature discovery & edge flow learning from point cloud data

Author: Chen Yu-Chia
Kevrekidis Ioannis G.
Meilă Marina
Publication date: 13/03/2021

The manifold Helmholtzian (1-Laplacian) operator $\Delta_1$ elegantly generalizes the Laplace-Beltrami operator to vector fields on a manifold $\mathcal M$. In this work, we propose the estimation of the manifold Helmholtzian from point cloud data by a weighted 1-Laplacian $\mathbf{\mathcal L}_1$. While higher order Laplacians have been introduced and studied, this work is the first to present a graph Helmholtzian constructed from a simplicial complex as an estimator for the continuous operator in a non-parametric setting. Equipped with the geometric and topological information about $\mathcal M$, the Helmholtzian is a useful tool for the analysis of flows and vector fields on $\mathcal M$ via the Helmholtz-Hodge theorem. In addition, the $\mathbf{\mathcal L}_1$ allows the smoothing, prediction, and feature extraction of the flows. We demonstrate these possibilities on substantial sets of synthetic and real point cloud datasets with non-trivial topological structures; and provide theoretical results on the limit of $\mathbf{\mathcal L}_1$ to $\Delta_1$.
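As a rough illustration of the discrete objects the abstract refers to, the sketch below builds an unweighted 1-Laplacian on a tiny hand-made simplicial complex and splits an edge flow into gradient, curl, and harmonic parts (the Helmholtz-Hodge decomposition). It is not the paper's weighted estimator: the complex, the example flow, and the helper name `hodge_decompose` are assumptions made purely for illustration.

```python
import numpy as np

# Toy complex (an assumption for illustration): 4 vertices, 5 edges,
# and one filled triangle (0, 1, 2). The cycle 1-2-3 is left unfilled,
# so the complex has exactly one 1-dimensional hole.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
n_vertices = 4

# Vertex-edge boundary matrix: edge (i, j) leaves i and enters j.
B1 = np.zeros((n_vertices, len(edges)))
for e, (i, j) in enumerate(edges):
    B1[i, e], B1[j, e] = -1.0, 1.0

# Edge-triangle boundary matrix: boundary of (i, j, k) is (i,j) + (j,k) - (i,k).
edge_index = {edge: e for e, edge in enumerate(edges)}
B2 = np.zeros((len(edges), len(triangles)))
for t, (i, j, k) in enumerate(triangles):
    B2[edge_index[(i, j)], t] = 1.0
    B2[edge_index[(j, k)], t] = 1.0
    B2[edge_index[(i, k)], t] = -1.0

# Unweighted graph Helmholtzian (1-Laplacian) acting on edge flows.
L1 = B1.T @ B1 + B2 @ B2.T


def hodge_decompose(flow):
    """Split an edge flow into gradient, curl, and harmonic components."""
    # Least-squares projections onto the images of B1^T and B2.
    grad = B1.T @ np.linalg.lstsq(B1.T, flow, rcond=None)[0]
    curl = B2 @ np.linalg.lstsq(B2, flow, rcond=None)[0]
    return grad, curl, flow - grad - curl


flow = np.array([1.0, -2.0, 0.5, 3.0, 1.5])  # arbitrary example flow
grad, curl, harmonic = hodge_decompose(flow)

# The number of (numerically) zero eigenvalues of L1 counts the 1-dimensional holes.
n_harmonic = int(np.sum(np.linalg.eigvalsh(L1) < 1e-9))
print("harmonic dimensions:", n_harmonic)
```

The harmonic component lies in the null space of `L1`, which is why the null-space dimension carries the topological information mentioned in the abstract.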

arXiv.org e-Print Archive

Learning with mixtures of trees

Author: Meilă-Predoviciu Marina, 1962-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 125-129). By Marina Meilă-Predoviciu. Ph.D.

DSpace@MIT

A Markov model for inferring flows in directed contact networks

Author: E Ser-Giacomi
J Saramäki
LEC Rocha
M Meilă
M Starnini
N Masuda
N Perra
NF Ramsey
P Brémaud
P Grindrod
P Holme
P Lencastre
ST King
Publication venue: Springer Science and Business Media LLC
Publication date: 25/10/2018

Directed contact networks (DCNs) are a particularly flexible and convenient class of temporal networks, useful for modeling and analyzing the transfer of discrete quantities in communications, transportation, epidemiology, etc. Transfers modeled by contacts typically underlie flows that associate multiple contacts based on their spatiotemporal relationships. To infer these flows, we introduce a simple inhomogeneous Markov model associated with a DCN and show how it can be effectively used for data reduction and anomaly detection through an example of kernel-level information transfers within a computer. Comment: 12 pages

arXiv.org e-Print Archive

Crossref

An Approach to Web-Scale Named-Entity Disambiguation

Author: C. Whitelaw
I. Bhattacharya
L. Sarmento
M. Halkidi
M. Meilă
P. Pantel
S. Dill
S. Guha
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the potential benefits for data-driven approaches of processing larger data-sets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.

Crossref

Repositório Aberto da Universidade do Porto

How the initialization affects the stability of the $k$-means algorithm

Author: Bubeck Sébastien
Meilă Marina
von Luxburg Ulrike
Publication date: 01/01/2012

Numérisation de Documents Anciens Mathématiques

Defining functional distances over Gene Ontology

Author: A Ng
A Schlicker
A Valencia
Alfonso Valencia
Angela del Pozo
B Smith
B Smith
BT Korber
C Mungall
D Verma
E Camon
F Chung
F Couto
Florencio Pazos
I Friedberg
J Tamames
JZ Wang
M Letunic
M Meilă
M Meilă
M Pellegrini
M Riley
NJ Mulder
P Lord
P Resnik
P Zhang
R Duda
S Rison
W Li
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. 'gene ontology', GO). However, functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to compare GO terms considered linkage in terms of ontology weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of Interpro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model Df which fulfils the properties of a Metric Space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion: The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning, enhancing its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital.CSIC

Deciphering Network Community Structure by Surprise

Author: A Lancichinetti
A Lancichinetti
A Marco
AL Barabási
BJ Breitkreutz
DJ Watts
EC Pielou
Eshel Ben-Jacob
Ignacio Marín
J Duch
JI Lucas
L Danon
LC Freeman
LD Costa
M Girvan
M Meilă
M Rosvall
MEJ Newman
MEJ Newman
MEJ Newman
P Ronhovde
R Aldecoa
RH MacArthur
Rodrigo Aldecoa
S Fortunato
S Fortunato
S Wasserman
SH Strogatz
SY Pu
V Arnau
VD Blondel
WW Zachary
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

The analysis of complex networks permeates all sciences, from biology to sociology. A fundamental, unsolved problem is how to characterize the community structure of a network. Here, using both standard and novel benchmarks, we show that maximization of a simple global parameter, which we call Surprise (S), leads to a very efficient characterization of the community structure of complex synthetic networks. Particularly, S qualitatively outperforms the most commonly used criterion to define communities, Newman and Girvan's modularity (Q). Applying S maximization to real networks often provides natural, well-supported partitions, but also sometimes counterintuitive solutions that expose the limitations of our previous knowledge. These results indicate that it is possible to define an effective global criterion for community structure and open new routes for the understanding of complex networks. Comment: 7 pages, 5 figures

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital.CSIC

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Author: Al Hasan
Al-Daoud
Aloise
Aloise
Anderberg
Babu
Babu
Ball
Bei
Bergmann
Bottou
Breunig
Cao
Celebi
Chen
Chen
Daniel
Forgy
Friedman
Garcia
Garcia
Gonzalez
Hartigan
Hassan A. Kingravi
Hotelling
Huang
Huang
Hubert
Hyvärinen
Iman
Jain
Jain
Jancey
Kanungo
Katsavounidis
Kaufman
Lance
Likas
Linde
Lloyd
Lu
Luengo
M. Emre Celebi
Maitra
Mao
Matsumoto
Meilă
Milligan
Milligan
Norušis
Onoda
Ordonez
Pal
Patricio A. Vela
Pena
Redmond
Selim
Späth
Su
Tarsitano
Tou
Wu
Zhang
Publication venue: Elsevier BV
Publication date: 10/09/2012

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
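The sensitivity to initial centers described in this abstract can be seen in a small experiment. The sketch below runs Lloyd's algorithm from two seeding schemes, uniformly random data points and k-means++ style seeding, and reports the best and worst final sum of squared errors (SSE) over repeated runs. The synthetic data, the two schemes, and all function names are assumptions chosen for illustration; the abstract does not name the eight methods it compares, so this is not a reproduction of that study.

```python
import numpy as np


def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def lloyd(X, centers, n_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers


def random_init(X, k, rng):
    """Pick k distinct data points uniformly at random."""
    return X[rng.choice(len(X), size=k, replace=False)]


def kmeanspp_init(X, k, rng):
    """k-means++ style seeding: sample proportionally to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


# Synthetic data (an assumption): four well-separated Gaussian blobs.
rng = np.random.default_rng(0)
true_centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in true_centers])

results = {}
for name, init in [("random", random_init), ("k-means++", kmeanspp_init)]:
    finals = [sse(X, lloyd(X, init(X, 4, rng))) for _ in range(20)]
    results[name] = (min(finals), max(finals))
    print(f"{name:10s} best SSE {min(finals):8.2f}  worst SSE {max(finals):8.2f}")
```

A wide gap between the best and worst SSE for one scheme indicates that its runs are converging to different local optima, which is the behaviour the abstract attributes to poor initial placement.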

arXiv.org e-Print Archive

Crossref