KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation
We present KADABRA, a new algorithm to approximate betweenness centrality in
directed and undirected graphs, which significantly outperforms all previous
approaches on real-world complex networks. The efficiency of the new algorithm
relies on two new theoretical contributions, of independent interest. The first
contribution focuses on sampling shortest paths, a subroutine used by most
algorithms that approximate betweenness centrality. We show that, on realistic
random graph models, we can perform this task in time m^{1/2+o(1)}
with high probability, obtaining a significant speedup with respect to the
worst-case performance. We experimentally show that this new
technique achieves similar speedups on real-world complex networks, as well.
The second contribution is a new rigorous application of the adaptive sampling
technique. This approach decreases the total number of shortest paths that need
to be sampled to compute all betweenness centralities with a given absolute
error, and it also handles more general problems, such as computing the
most central nodes. Furthermore, our analysis is general, and it might be
extended to other settings.
Comment: Some typos corrected.
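The core subroutine the abstract describes is sampling shortest paths to estimate betweenness. A minimal sketch of that idea, assuming an undirected graph given as an adjacency dict (this is the plain sampling estimator, not KADABRA's balanced bidirectional BFS or its adaptive stopping rule):

```python
import random
from collections import deque, defaultdict

def sample_shortest_path(adj, s, t):
    """BFS from s, then walk back from t choosing predecessors with
    probability proportional to shortest-path counts, so the returned
    s-t shortest path is uniform among all of them (None if t unreachable)."""
    dist, nsp, preds = {s: 0}, {s: 1}, defaultdict(list)
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
            if dist[v] == dist[u] + 1:
                nsp[v] = nsp.get(v, 0) + nsp[u]
                preds[v].append(u)
    if t not in dist:
        return None
    path, u = [t], t
    while u != s:
        u = random.choices(preds[u], weights=[nsp[p] for p in preds[u]])[0]
        path.append(u)
    return path[::-1]

def approx_betweenness(adj, n_samples=2000, seed=0):
    """Estimate (normalised) betweenness by sampling random node pairs and
    crediting the interior vertices of one sampled shortest path each time."""
    random.seed(seed)
    nodes = list(adj)
    bc = {v: 0.0 for v in nodes}
    for _ in range(n_samples):
        s, t = random.sample(nodes, 2)
        path = sample_shortest_path(adj, s, t)
        if path:
            for v in path[1:-1]:  # interior vertices only
                bc[v] += 1.0 / n_samples
    return bc
```

On a path graph the middle vertex should receive the largest estimate and the endpoints zero, since endpoints are never interior to a shortest path.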
Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization
Principal manifolds are defined as lines or surfaces passing through "the
middle" of the data distribution. Linear principal manifolds (Principal Components
Analysis) are routinely used for dimension reduction, noise filtering and data
visualization. Recently, methods for constructing non-linear principal
manifolds were proposed, including our elastic maps approach which is based on
a physical analogy with elastic membranes. We have developed a general
geometric framework for constructing "principal objects" of various
dimensions and topologies with the simplest quadratic form of the smoothness
penalty which allows very effective parallel implementations. Our approach is
implemented in three programming languages (C++, Java and Delphi) with two
graphical user interfaces (VidaExpert,
http://bioinfo.curie.fr/projects/vidaexpert, and ViMiDa,
http://bioinfo-out.curie.fr/projects/vimida). In this paper we
overview the method of elastic maps and present in detail one of its major
applications: the visualization of microarray data in bioinformatics. We show
that the method of elastic maps outperforms linear PCA in terms of data
approximation, representation of between-point distance structure, preservation
of local point neighborhood and representing point classes in low-dimensional
spaces.
Comment: 35 pages, 10 figures.
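The abstract's key ingredient is a quadratic smoothness penalty, which makes each node-update step a linear solve. A simplified 1-D sketch of that alternation (nearest-node assignment, then a penalised least-squares solve), assuming a polyline of nodes initialised along the first principal component; the real elastic-map method supports higher-dimensional grids and a separate bending term:

```python
import numpy as np

def fit_elastic_curve(X, n_nodes=10, lam=0.1, n_iter=20):
    """Fit a polyline of nodes to 2-D data X by alternating
    (1) assigning each point to its nearest node and
    (2) solving the quadratic system (D + lam*L) N = C, where D counts
    points per node, C sums assigned points, and L is the stretching
    (first-difference) penalty matrix. Illustrative sketch only."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    proj = (X - mu) @ Vt[0]
    # Initialise nodes along the linear PCA solution.
    nodes = mu + np.outer(np.linspace(proj.min(), proj.max(), n_nodes), Vt[0])
    # Stretching penalty: graph Laplacian of the node chain.
    L = 2 * np.eye(n_nodes) - np.eye(n_nodes, k=1) - np.eye(n_nodes, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
        k = d2.argmin(axis=1)                      # nearest-node assignment
        counts = np.bincount(k, minlength=n_nodes).astype(float)
        C = np.zeros_like(nodes)
        np.add.at(C, k, X)
        nodes = np.linalg.solve(np.diag(counts) + lam * L, C)
    return nodes
```

Because the penalty is quadratic, the update is a single sparse linear solve per iteration, which is what makes effective parallel implementations possible.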
Correlation Clustering with Adaptive Similarity Queries
In correlation clustering, we are given objects together with a binary
similarity score between each pair of them. The goal is to partition the
objects into clusters so as to minimise the disagreements with the scores. In this
work we investigate correlation clustering as an active learning problem: each
similarity score can be learned by making a query, and the goal is to minimise
both the disagreements and the total number of queries. On the one hand, we
describe simple active learning algorithms, which provably achieve an almost
optimal trade-off while giving cluster recovery guarantees, and we test them on
different datasets. On the other hand, we prove information-theoretical bounds
on the number of queries necessary to guarantee a prescribed disagreement
bound. These results give a rich characterization of the trade-off between
queries and clustering error.
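A simple query-based clustering scheme of the kind the abstract alludes to can be sketched with the classic pivot (KwikCluster-style) algorithm, where every similarity is learned only by querying an oracle; this is an illustrative baseline, not the paper's specific algorithms:

```python
import random

def pivot_cluster_with_queries(items, similar, seed=0):
    """Pivot-based correlation clustering in the active setting: pick a
    random pivot, query its similarity to every remaining object, group
    the similar ones with it, and recurse on the rest.
    Returns (clusters, number_of_queries_made)."""
    rng = random.Random(seed)
    remaining = list(items)
    clusters, n_queries = [], 0
    while remaining:
        pivot = rng.choice(remaining)
        cluster, rest = [pivot], []
        for v in remaining:
            if v == pivot:
                continue
            n_queries += 1  # one oracle query per (pivot, v) pair
            (cluster if similar(pivot, v) else rest).append(v)
        clusters.append(cluster)
        remaining = rest
    return clusters, n_queries
```

With a perfect oracle over two ground-truth clusters of size 5, the scheme recovers both clusters while querying far fewer than all pairs, which is the trade-off the abstract's bounds characterise.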
In situ analysis for intelligent control
We report a pilot study on in situ analysis of backscatter data for intelligent control of a scientific instrument on an Autonomous Underwater Vehicle (AUV) carried out at the Monterey Bay Aquarium Research Institute (MBARI). The objective of the study is to investigate techniques which use machine intelligence to enable event-response scenarios. Specifically, we analyse a set of techniques for automated sample acquisition in the water-column using an electro-mechanical "Gulper", designed at MBARI. This is a syringe-like sampling device, carried onboard an AUV. The techniques we use in this study are clustering algorithms, intended to identify the important distinguishing characteristics of bodies of points within a data sample. We demonstrate that the complementary features of two clustering approaches can offer robust identification of interesting features in the water-column, which, in turn, can support automatic event-response control in the use of the Gulper.
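The event-response idea above, clustering a backscatter profile and firing the sampler when a high-intensity cluster appears, can be sketched as follows. This is a hypothetical illustration: the function names, the tiny 1-D k-means, and the trigger threshold are all assumptions, not MBARI's actual pipeline:

```python
import random

def kmeans_1d(values, k=2, n_iter=50, seed=0):
    """Tiny 1-D k-means: a stand-in for the clustering used to separate
    background water from high-backscatter regions in a profile."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

def should_gulp(profile, threshold=5.0):
    """Fire the (hypothetical) sampler trigger when the brightest cluster's
    mean backscatter intensity exceeds the threshold."""
    centers, _ = kmeans_1d(profile)
    return max(centers) > threshold
```

The point of clustering rather than simple thresholding is that the decision adapts to the structure of the whole profile, which is closer in spirit to the robustness the study is after.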
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks
Graph convolutional network (GCN) has been successfully applied to many
graph-based applications; however, training a large-scale GCN remains
challenging. Current SGD-based algorithms suffer from either a high
computational cost that grows exponentially with the number of GCN layers, or a
large space requirement for keeping the entire graph and the embedding of each
node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm
that is suitable for SGD-based training by exploiting the graph clustering
structure. Cluster-GCN works as follows: at each step, it samples a block
of nodes associated with a dense subgraph identified by a graph clustering
algorithm, and restricts the neighborhood search to this subgraph. This
simple but effective strategy leads to significantly improved memory and
computational efficiency while being able to achieve comparable test accuracy
with previous algorithms. To test the scalability of our algorithm, we create a
new Amazon2M dataset with 2 million nodes and 61 million edges, which is more
than 5 times larger than the previous largest publicly available dataset (Reddit).
For training a 3-layer GCN on this data, Cluster-GCN is faster than the
previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) while using
much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this
data, our algorithm can finish in around 36 minutes while all the existing GCN
training algorithms fail to train due to out-of-memory issues. Moreover,
Cluster-GCN allows us to train much deeper GCNs without much time and memory
overhead, which leads to improved prediction accuracy: using a 5-layer
Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI
dataset, while the previous best result was 98.71 by [16]. Our code is
publicly available at
https://github.com/google-research/google-research/tree/master/cluster_gcn.
Comment: In Proceedings of the 25th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD'19).
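The key trick the abstract describes, restricting each training step to the induced subgraph of one graph-partition block, can be sketched in a few lines of NumPy. This is a simplified single-layer forward pass under assumed shapes; the paper uses a real graph-clustering algorithm (e.g. METIS) to form the blocks and trains multi-layer GCNs with SGD:

```python
import numpy as np

def gcn_layer(A_sub, X_sub, W):
    """One GCN propagation on an induced subgraph:
    ReLU( D^{-1/2} (A + I) D^{-1/2} X W )."""
    A_hat = A_sub + np.eye(A_sub.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X_sub @ W, 0.0)

def cluster_gcn_step(A, X, W, block):
    """Cluster-GCN's memory-saving idea: slice out one partition block's
    induced subgraph and run the layer only on it, so per-step memory
    scales with the block size rather than the whole graph."""
    idx = np.array(block)
    A_sub = A[np.ix_(idx, idx)]  # keep only within-block edges
    return gcn_layer(A_sub, X[idx], W)
```

Because neighborhood search never leaves the block, the exponential neighborhood growth across layers that plagues naive SGD-based GCN training is avoided.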