    A Kriging procedure for processes indexed by graphs

    We provide a new kriging procedure for processes on graphs. Based on the construction of Gaussian random processes indexed by graphs, we extend to this framework the usual linear prediction method for spatial random fields, known as kriging. We provide the expression of the estimator of such a random field at unobserved locations, as well as a control for the prediction error.
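    As a rough illustration of linear prediction at unobserved vertices, the sketch below applies zero-mean simple kriging to a small graph. The covariance used here, the inverse of a regularized graph Laplacian with a hypothetical parameter `alpha`, is an assumption chosen for illustration and is not necessarily the construction used in the paper; the helper name `simple_kriging` is likewise hypothetical.

```python
import numpy as np

def simple_kriging(K, observed, values):
    """Zero-mean simple kriging on a finite index set.

    K        : (n, n) symmetric positive-definite covariance matrix
    observed : indices of observed vertices
    values   : observations, in the same order as `observed`
    Returns the unobserved indices, their predictions, and the
    covariance matrix of the prediction error.
    """
    n = K.shape[0]
    observed = np.asarray(observed)
    unobserved = np.setdiff1d(np.arange(n), observed)
    K_oo = K[np.ix_(observed, observed)]
    K_uo = K[np.ix_(unobserved, observed)]
    K_uu = K[np.ix_(unobserved, unobserved)]
    # Kriging weights K_uo K_oo^{-1}, computed with a linear solve
    # (valid because K_oo is symmetric).
    weights = np.linalg.solve(K_oo, K_uo.T).T
    prediction = weights @ np.asarray(values, dtype=float)
    error_cov = K_uu - weights @ K_uo.T
    return unobserved, prediction, error_cov

# Path graph on 5 vertices: 0 - 1 - 2 - 3 - 4
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
laplacian = np.diag(A.sum(axis=1)) - A

# Assumed covariance (illustration only): inverse regularized Laplacian.
alpha = 0.5
K = np.linalg.inv(laplacian + alpha * np.eye(5))

unobs, pred, err = simple_kriging(K, observed=[0, 2, 4], values=[1.0, 2.0, 3.0])
print(unobs, pred, np.diag(err))
```

    The diagonal of the returned error covariance gives the prediction variance at each unobserved vertex, which is the quantity a bound on the prediction error would control.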

    A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability

    Determining whether a graph displays a clustered structure before subjecting it to any cluster detection technique has recently gained attention in the literature. Attempting to group graph vertices into clusters when a graph does not have a clustered structure is not only a waste of time; it also leads to misleading conclusions. To address this problem, we introduce a novel statistical test, the δ-test, which is based on comparisons of local and global densities. Our goal is to assess whether a given graph meets the necessary conditions to be meaningfully summarized by clusters of vertices. We empirically explore our test's behavior under a number of graph structures. We also compare it to other recently published tests. From a theoretical standpoint, our test is more general, versatile and transparent than recently published competing techniques. It is based on the examination of intuitive quantities, applies equally to weighted and unweighted graphs, and allows comparisons across graphs. More importantly, it does not rely on any distributional assumptions, other than the universally accepted definition of a clustered graph. Empirically, our test is shown to be more responsive to graph structure than other competing tests.

    Revisiting clustering as matrix factorisation on the Stiefel manifold

    This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings), and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation, in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.

    Tests for Gaussian graphical models

    Gaussian graphical models are promising tools for analysing genetic networks. In many applications, biologists have some knowledge of the genetic network and may want to assess the quality of their model using gene expression data. We therefore introduce a novel procedure for testing the neighborhoods of a Gaussian graphical model. It is based on the connection between the local Markov property and conditional regression of a Gaussian random variable. Adapting recent results on tests for high-dimensional Gaussian linear models, we prove that the testing procedure inherits appealing theoretical properties. Moreover, it is applicable and computationally feasible in a high-dimensional setting, where the number of nodes may be much larger than the number of observations. A large part of the study is devoted to illustrating and discussing applications to simulated data and to biological data.

    Conditional-mean least-squares fitting of Gaussian Markov random fields to Gaussian fields

    This article discusses the following problem, often encountered when analyzing spatial lattice data: how can one construct a Gaussian Markov random field (GMRF) on a lattice that reflects well the spatial-covariance properties present either in data or in prior knowledge? The Markov property on a spatial lattice implies spatial dependence expressed conditionally, which allows intuitively appealing site-by-site model building. There are also cases, such as in biological network analysis, where the Markov property has a deep scientific significance. Moreover, the model is often important for computational efficiency of Markov chain Monte Carlo algorithms. In this article, we introduce a new criterion to fit a GMRF to a given Gaussian field, where the Gaussian field is characterized by its spatial covariances. We establish that this criterion is computationally appealing, that it can be used on both regular and irregular lattices, and that both stationary and nonstationary fields can be fitted. © 2007 Elsevier B.V. All rights reserved.

    Community detection in dense random networks

    We formalize the problem of detecting a community in a network as testing whether a given (random) graph contains a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdős–Rényi graph with connection probability p0. Under the (composite) alternative, there is an unknown subgraph of n nodes where the probability of connection is p1 > p0. We derive a detection lower bound for detecting such a subgraph in terms of N, n, p0, p1 and exhibit a test that achieves that lower bound. We do this both when p0 is known and when it is unknown. We also consider the problem of testing in polynomial time. As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem. Our focus in this paper is on the quasi-normal regime, where np0 is either bounded away from zero or tends to zero slowly.
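    To make the testing setup concrete, the sketch below samples a graph under the null and under a planted-subgraph alternative, then computes a standardized total edge count for known p0. The abstract does not specify its test, so this edge-count statistic is an assumption chosen for simplicity and should not be read as the paper's optimal procedure; the function names and the choice of planting the dense subgraph on the first n nodes are hypothetical.

```python
import math
import random

def sample_graph(N, p0, n=0, p1=None, rng=None):
    """Sample an undirected, unweighted graph as a set of edges (i, j), i < j.

    Pairs inside the first n nodes are connected with probability p1
    (planted dense subgraph); all other pairs with probability p0.
    With n = 0 or p1 = None this is a plain Erdos-Renyi graph.
    """
    rng = rng or random.Random()
    edges = set()
    for i in range(N):
        for j in range(i + 1, N):
            # i < j, so j < n means both endpoints lie in the planted set.
            inside = p1 is not None and j < n
            p = p1 if inside else p0
            if rng.random() < p:
                edges.add((i, j))
    return edges

def edge_count_z(edges, N, p0):
    """Standardized edge count under the null with known p0 (0 < p0 < 1).

    Under the null the edge count is Binomial(M, p0) with M = N(N-1)/2,
    so large positive values suggest an unusually dense graph.
    """
    M = N * (N - 1) // 2
    return (len(edges) - M * p0) / math.sqrt(M * p0 * (1 - p0))

rng = random.Random(0)
null_graph = sample_graph(200, 0.1, rng=rng)
alt_graph = sample_graph(200, 0.1, n=40, p1=0.5, rng=rng)
print("z under null:       ", edge_count_z(null_graph, 200, 0.1))
print("z under alternative:", edge_count_z(alt_graph, 200, 0.1))
```

    A global count like this can only react to large or very dense planted subgraphs; smaller ones would call for a statistic that scans over candidate subsets, at a higher computational cost.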