12,697 research outputs found

Latent Distance Estimation for Random Geometric Graphs

Author: Araya Ernesto
De Castro Yohann
Publication date: 15/09/2019

Random geometric graphs are a popular choice for a latent points generative model for networks. Their definition is based on a sample of $n$ points $X_1, X_2, \cdots, X_n$ on the Euclidean sphere $\mathbb{S}^{d-1}$ which represents the latent positions of nodes of the network. The connection probabilities between the nodes are determined by an unknown function (referred to as the "link" function) evaluated at the distance between the latent points. We introduce a spectral estimator of the pairwise distance between latent points and we prove that its rate of convergence is the same as the nonparametric estimation of a function on $\mathbb{S}^{d-1}$, up to a logarithmic factor. In addition, we provide an efficient spectral algorithm to compute this estimator without any knowledge on the nonparametric link function. As a byproduct, our method can also consistently estimate the dimension $d$ of the latent space.
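The generative model described in this abstract can be illustrated with a minimal sketch. It assumes points drawn uniformly on the sphere and a link function applied to the inner product of two latent points (on the unit sphere the inner product determines the distance). The names `sample_sphere`, `sample_rgg` and `example_link` are hypothetical, and the particular link function is an arbitrary choice for illustration, not one taken from the paper. The spectral estimator itself is not reproduced here.

```python
import math
import random


def sample_sphere(d, rng):
    """Draw one point uniformly on S^{d-1} by normalising a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def example_link(inner):
    """Hypothetical link: connection probability increasing in the inner product."""
    return min(1.0, max(0.0, 0.5 * (1.0 + inner)))


def sample_rgg(n, d, link, seed=0):
    """Sample latent points and a symmetric 0/1 adjacency matrix."""
    rng = random.Random(seed)
    points = [sample_sphere(d, rng) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(points[i], points[j]))
            # Connect i and j independently with probability link(<X_i, X_j>).
            if rng.random() < link(inner):
                adj[i][j] = adj[j][i] = 1
    return points, adj


points, adj = sample_rgg(20, 3, example_link, seed=1)
```

Only `adj` would be observed in practice; the latent `points` are what an estimator of pairwise distances tries to recover.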

arXiv.org e-Print Archive

HAL-UJM

Consistency of Maximum Likelihood for Continuous-Space Network Models

Author: Asta Dena
Shalizi Cosma Rohilla
Publication date: 01/10/2019

Network analysis needs tools to infer distributions over graphs of arbitrary size from a single graph. Assuming the distribution is generated by a continuous latent space model which obeys certain natural symmetry and smoothness properties, we establish three levels of consistency for non-parametric maximum likelihood inference as the number of nodes grows: (i) the estimated locations of all nodes converge in probability on their true locations; (ii) the distribution over locations in the latent space converges on the true distribution; and (iii) the distribution over graphs of arbitrary size converges. Comment: 21 page

arXiv.org e-Print Archive

Projective, Sparse, and Learnable Latent Position Network Models

Author: Shalizi Cosma Rohilla
Spencer Neil A.
Publication date: 07/02/2020

When modeling network data using a latent position model, it is typical to assume that the nodes' positions are independently and identically distributed. However, this assumption implies the average node degree grows linearly with the number of nodes, which is inappropriate when the graph is thought to be sparse. We propose an alternative assumption (that the latent positions are generated according to a Poisson point process) and show that it is compatible with various levels of sparsity. Unlike other notions of sparse latent position models in the literature, our framework also defines a projective sequence of probability models, thus ensuring consistency of statistical inference across networks of different sizes. We establish conditions for consistent estimation of the latent positions, and compare our results to existing frameworks for modeling sparse networks. Comment: 51 pages, 2 figure

arXiv.org e-Print Archive

Graphs in machine learning: an introduction

Author: Latouche Pierre
Rossi Fabrice
Publication date: 22/04/2015

Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and unsupervised, data can be relational (augmented with one or several global graphs) as described above, or graph valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc.

1 Real networks

One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 1930s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In Biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in social sciences networks are used to represent relational ties between actors [66, 56, 36, 34]. Other examples include power grids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network i

arXiv.org e-Print Archive

HAL-Paris1

From random walks to distances on unweighted graphs

Author: Hashimoto Tatsunori B.
Jaakkola Tommi S.
Sun Yi
Publication date: 02/11/2015

Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives. Comment: To appear in NIPS 201
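To make the idea of a Laplace transformed hitting time concrete, here is a minimal sketch. It assumes a simple random walk that moves to a uniformly chosen out-neighbour at each step and computes E[exp(-beta * T)], where T is the number of steps needed to reach a target vertex. The function name `laplace_hitting`, the parameter `beta` and the fixed-point iteration are illustrative assumptions; the abstract does not specify how the paper normalises or computes its metric.

```python
import math


def laplace_hitting(out_neighbours, target, beta, iters=500):
    """Approximate E[exp(-beta * T)] for the hitting time T of `target`.

    out_neighbours[i] lists the vertices reachable from i in one step.
    The values satisfy L[target] = 1 and, for i != target,
    L[i] = exp(-beta) * mean(L[k] for k in out_neighbours[i]).
    Since exp(-beta) < 1 for beta > 0, iterating this map converges.
    """
    n = len(out_neighbours)
    decay = math.exp(-beta)
    values = [0.0] * n
    values[target] = 1.0
    for _ in range(iters):
        new = values[:]
        for i in range(n):
            if i == target:
                continue
            nbrs = out_neighbours[i]
            # A vertex with no out-edges can never reach the target.
            new[i] = decay * sum(values[k] for k in nbrs) / len(nbrs) if nbrs else 0.0
        values = new
    return values


# Directed path 0 -> 1 -> 2 (vertex 2 points back to 0), target vertex 2.
path = [[1], [2], [0]]
values = laplace_hitting(path, target=2, beta=0.5)
```

On this path the walk is deterministic, so the hitting times from vertices 1 and 0 are exactly 1 and 2 steps, and the returned values are exp(-0.5) and exp(-1.0). Taking a negative logarithm of such values is one way to turn them into a dissimilarity.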

arXiv.org e-Print Archive

DSpace@MIT

Learning the distribution of latent variables in paired comparison models with round-robin scheduling

Author: Corff Sylvain Le
Diel Roland
Lerasle Matthieu
Publication date: 01/01/2020

Paired comparison data considered in this paper originate from the comparison of a large number N of individuals in couples. The dataset is a collection of results of contests between two individuals when each of them has faced n opponents, where n is much larger than N. Individuals are represented by independent and identically distributed random parameters characterizing their abilities. The paper studies the maximum likelihood estimator of the parameters' distribution. The analysis relies on the construction of a graphical model encoding conditional dependencies of the observations, which are the outcomes of the first n contests each individual is involved in. This graphical model allows us to prove geometric loss of memory properties and deduce the asymptotic behavior of the likelihood function. This paper sets the focus on graphical models obtained from round-robin scheduling of these contests. Following a classical construction in learning theory, the asymptotic likelihood is used to measure performance of the maximum likelihood estimator. Risk bounds for this estimator are finally obtained by sub-Gaussian deviation results for Markov chains applied to the graphical model

arXiv.org e-Print Archive

HAL-Polytechnique

Testing for high-dimensional geometry in random graphs

Author: Bubeck Sébastien
Ding Jian
Eldan Ronen
Rácz Miklós
Publication date: 21/11/2015

We study the problem of detecting the presence of an underlying high-dimensional geometric structure in a random graph. Under the null hypothesis, the observed graph is a realization of an Erdős-Rényi random graph $G(n,p)$. Under the alternative, the graph is generated from the $G(n,p,d)$ model, where each vertex corresponds to a latent independent random vector uniformly distributed on the sphere $\mathbb{S}^{d-1}$, and two vertices are connected if the corresponding latent vectors are close enough. In the dense regime (i.e., $p$ is a constant), we propose a near-optimal and computationally efficient testing procedure based on a new quantity which we call signed triangles. The proof of the detection lower bound is based on a new bound on the total variation distance between a Wishart matrix and an appropriately normalized GOE matrix. In the sparse regime, we make a conjecture for the optimal detection boundary. We conclude the paper with some preliminary steps on the problem of estimating the dimension in $G(n,p,d)$. Comment: 28 pages; v2 contains minor change
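The abstract names the signed-triangle quantity without giving a formula. A minimal sketch, assuming the centred form in which each edge indicator has the edge probability $p$ subtracted before the three are multiplied and summed over vertex triples, might look as follows. The function name `signed_triangles` is hypothetical, and any threshold used to turn the statistic into a test is left out.

```python
from itertools import combinations


def signed_triangles(adj, p):
    """Sum over triples i < j < k of (A_ij - p) * (A_ik - p) * (A_jk - p).

    adj is a symmetric 0/1 adjacency matrix given as a list of lists and
    p is the assumed edge probability under the null model G(n, p).
    """
    n = len(adj)
    total = 0.0
    for i, j, k in combinations(range(n), 3):
        total += (adj[i][j] - p) * (adj[i][k] - p) * (adj[j][k] - p)
    return total


# A single triangle on three vertices, with p = 0.5.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
stat = signed_triangles(triangle, 0.5)  # (0.5)^3 = 0.125
```

Under an Erdős-Rényi graph with edge probability `p` each centred term has mean zero, which is why a large positive value can hint at geometric structure. The triple loop costs on the order of n^3 operations and is meant only for small illustrative graphs.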

arXiv.org e-Print Archive

Princeton University Open Access Repository