4,441 research outputs found

    Network inference and community detection, based on covariance matrices, correlations and test statistics from arbitrary distributions

    In this paper we propose methodology for inference of binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test-statistics and hypothesis test p-values from arbitrary distributions. Community detection methods such as block modelling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the methodology we propose is to obtain binary-valued adjacency matrices from such pairwise measures of strength of association between variables. The proposed methodology is applicable to large high-dimensional data-sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data-sets.
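    The abstract does not spell out how the binary adjacency matrix is obtained, so the following is only a minimal sketch of one generic way to do it, not the authors' procedure. It assumes a sample correlation matrix, a Fisher z-transform to get approximate p-values, and a Bonferroni-corrected cutoff; the function name `adjacency_from_correlation` and the parameter `alpha` are hypothetical.

```python
import math
import numpy as np

def adjacency_from_correlation(data, alpha=0.05):
    """Binary adjacency matrix from pairwise correlations (illustrative only).

    data  : array of shape (n_samples, n_vars)
    alpha : family-wise significance level (assumed, Bonferroni-corrected)
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(corr, 0.0)

    # Fisher z-transform: approximately N(0, 1) under zero correlation.
    clipped = np.clip(corr, -0.999999, 0.999999)
    z = np.arctanh(clipped) * math.sqrt(n - 3)

    # Two-sided p-value of a standard normal: erfc(|z| / sqrt(2)).
    pvals = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))

    # Declare an edge when the p-value survives the Bonferroni cutoff.
    n_tests = p * (p - 1) // 2
    adj = (pvals < alpha / n_tests).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj
```

    The resulting 0/1 matrix is symmetric with an empty diagonal, which is the kind of input block-modelling methods expect. Any other pairwise test statistic could be substituted for the correlation step.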

    A Consistent Histogram Estimator for Exchangeable Graph Models

    Exchangeable graph models (ExGM) subsume a number of popular network models. The mathematical object that characterizes an ExGM is termed a graphon. Finding scalable estimators of graphons, provably consistent, remains an open issue. In this paper, we propose a histogram estimator of a graphon that is provably consistent and numerically efficient. The proposed estimator is based on a sorting-and-smoothing (SAS) algorithm, which first sorts the empirical degree of a graph, then smooths the sorted graph using total variation minimization. The consistency of the SAS algorithm is proved by leveraging sparsity concepts from compressed sensing. Comment: 28 pages, 5 figures
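    To make the sorting and histogram steps concrete, here is a rough sketch under simplifying assumptions. It sorts nodes by empirical degree and averages the sorted adjacency matrix over square bins; the total variation minimization step mentioned in the abstract is deliberately left out. The names `sorted_histogram` and `bin_size` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sorted_histogram(adj, bin_size):
    """Degree-sorted block-average histogram of an adjacency matrix.

    Illustrative only: covers sorting and binning, not the
    total-variation smoothing step of the SAS algorithm.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]

    # Sort nodes by empirical degree, largest first.
    degrees = adj.sum(axis=1)
    order = np.argsort(-degrees, kind="stable")
    sorted_adj = adj[np.ix_(order, order)]

    # Drop trailing nodes that do not fill a complete bin.
    k = n // bin_size
    trimmed = sorted_adj[: k * bin_size, : k * bin_size]

    # Average each bin_size x bin_size block to get a k x k histogram.
    blocks = trimmed.reshape(k, bin_size, k, bin_size)
    return blocks.mean(axis=(1, 3))
```

    Each entry of the returned matrix is the edge density between two groups of degree-adjacent nodes, which serves as a piecewise-constant estimate of the graphon on a grid.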

    Risk profiling of hookworm infection and intensity in southern Lao People's Democratic Republic using Bayesian models

    Among the common soil-transmitted helminth infections, hookworm causes the highest burden. Previous research in the southern part of Lao People's Democratic Republic (Lao PDR) revealed high prevalence rates of hookworm infection. The purpose of this study was to predict the spatial distribution of hookworm infection and intensity, and to investigate risk factors in the Champasack province, southern Lao PDR. A cross-sectional parasitological and questionnaire survey was conducted in 51 villages. Data on demography, socioeconomic status, water, sanitation, and behavior were combined with remotely sensed environmental data. Bayesian mixed effects logistic and negative binomial models were utilized to investigate risk factors and spatial distribution of hookworm infection and intensity, and to make predictions for non-surveyed locations. A total of 3,371 individuals were examined with duplicate Kato-Katz thick smears, revealing a hookworm prevalence of 48.8%. Most infections (91.7%) were of light intensity (1-1,999 eggs/g of stool). Lower hookworm infection levels were associated with higher socioeconomic status. The lowest infection levels were found in preschool-aged children. Overall, females were at lower risk of infection, but women aged 50 years and above harbored the heaviest hookworm infection intensities. Hookworm was widespread in Champasack province with little evidence for spatial clustering. Infection risk was somewhat lower in the lowlands, mostly along the western bank of the Mekong River, while infection intensity was homogeneous across the Champasack province. Hookworm transmission seems to occur within, rather than between, villages in Champasack province. We present spatial risk maps of hookworm infection and intensity, which suggest that control efforts should be intensified in the Champasack province, particularly in mountainous areas.

    Stochastic blockmodel approximation of a graphon: Theory and consistent estimation

    Non-parametric approaches for analyzing network data based on exchangeable graph models (ExGM) have recently gained interest. The key object that defines an ExGM is often referred to as a graphon. This non-parametric perspective on network modeling poses challenging questions on how to make inference on the graphon underlying observed network data. In this paper, we propose a computationally efficient procedure to estimate a graphon from a set of observed networks generated from it. This procedure is based on a stochastic blockmodel approximation (SBA) of the graphon. We show that, by approximating the graphon with a stochastic block model, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity. Comment: 20 pages, 4 figures, 2 algorithms. Neural Information Processing Systems (NIPS), 201
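    As a loose illustration of what approximating a graphon by a stochastic block model can look like in code, the sketch below estimates block connection probabilities from several observed graphs on the same node set. It assumes the block labels are already known, which is a simplification: the abstract does not describe how nodes are grouped, and this is not the paper's SBA procedure. The names `block_probabilities`, `graphs` and `labels` are hypothetical.

```python
import numpy as np

def block_probabilities(graphs, labels):
    """Estimate block-to-block edge probabilities (illustrative only).

    graphs : array of shape (T, n, n), T observed adjacency matrices
    labels : length-n integer array of block labels 0..k-1 (assumed given)
    """
    graphs = np.asarray(graphs, dtype=float)
    labels = np.asarray(labels)

    # Average over the observed graphs to get edge frequencies.
    mean_adj = graphs.mean(axis=0)

    k = int(labels.max()) + 1
    probs = np.zeros((k, k))
    for a in range(k):
        rows = np.flatnonzero(labels == a)
        for b in range(k):
            cols = np.flatnonzero(labels == b)
            block = mean_adj[np.ix_(rows, cols)]
            if a == b:
                # Within a block, ignore the diagonal (no self-loops).
                m = len(rows)
                if m > 1:
                    probs[a, b] = (block.sum() - np.trace(block)) / (m * (m - 1))
            else:
                probs[a, b] = block.mean()
    return probs
```

    The returned k x k matrix is a piecewise-constant stand-in for the graphon: entry (a, b) is the observed edge frequency between blocks a and b.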