Differentially Private Exponential Random Graphs
We propose methods to release and analyze synthetic graphs in order to
protect the privacy of individual relationships captured by the social network.
The proposed techniques aim at fitting and estimating a wide class of exponential
random graph models (ERGMs) in a differentially private manner, and thus offer
rigorous privacy guarantees. More specifically, we use the randomized response
mechanism to release networks under ε-edge differential privacy. To
maintain utility for statistical inference, treating the original graph as
missing, we propose a way to use likelihood-based inference and Markov chain
Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks.
We demonstrate the usefulness of the proposed techniques on a real data
example.
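The randomized response release mentioned above can be sketched as follows. This is a minimal illustration under ε-edge differential privacy, assuming a simple adjacency-matrix representation; the function name and interface are ours, not the paper's:

```python
import math
import random

def randomized_response_graph(adj, epsilon, seed=0):
    """Release a synthetic undirected graph under epsilon-edge
    differential privacy by flipping each potential edge independently.

    Keeping an edge indicator with probability e^eps / (1 + e^eps) and
    flipping it otherwise bounds the likelihood ratio of any single
    edge's two possible values by e^eps.
    """
    rng = random.Random(seed)
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j] if rng.random() < keep else 1 - adj[i][j]
            out[i][j] = out[j][i] = bit
    return out
```

Smaller ε means more flipped edges and stronger privacy, which is why the paper then treats the original graph as missing data when fitting ERGMs to the released network.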
Vertex Clustering in Random Graphs via Reversible Jump Markov Chain Monte Carlo
Networks are a natural and effective tool to study relational data, in which observations are collected on pairs of units. The units are represented by nodes and their relations by edges. In biology, for example, proteins and their interactions, and, in social science, people and inter-personal relations may be the nodes and the edges of the network. In this paper we address the question of clustering vertices in networks, as a way to uncover homogeneity patterns in data that enjoy a network representation. We use a mixture model for random graphs and propose a reversible jump Markov chain Monte Carlo algorithm to infer its parameters. Applications of the algorithm to one simulated data set and three real data sets, which describe friendships among members of a university karate club, social interactions of dolphins, and gap junctions in C. elegans, are given.
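The mixture model for random graphs that the sampler targets can be illustrated by its complete-data log-likelihood: given cluster assignments, each tie is Bernoulli with a probability depending only on the endpoints' clusters. This sketch is ours (it shows the model, not the reversible jump sampler itself):

```python
import math

def block_model_loglik(adj, z, p):
    """Complete-data log-likelihood of a simple mixture (block) model
    for an undirected graph: tie (i, j) is Bernoulli(p[z[i]][z[j]]).

    `z` gives each vertex's cluster label and `p` is a symmetric matrix
    of within/between-cluster tie probabilities.  An RJMCMC sampler
    would explore z, p, and the number of clusters jointly.
    """
    ll = 0.0
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            pij = p[z[i]][z[j]]
            ll += math.log(pij) if adj[i][j] else math.log(1.0 - pij)
    return ll
```

The reversible jump moves let the chain add or delete clusters, so the number of clusters need not be fixed in advance.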
Sequential design of computer experiments for the estimation of a probability of failure
This paper deals with the problem of estimating the volume of the excursion
set of a function f above a given threshold,
under a probability measure on the input space that is assumed to be known. In
the industrial world, this corresponds to the problem of estimating a
probability of failure of a system. When only an expensive-to-simulate model of
the system is available, the budget for simulations is usually severely limited
and therefore classical Monte Carlo methods ought to be avoided. One of the
main contributions of this article is to derive SUR (stepwise uncertainty
reduction) strategies from a Bayesian-theoretic formulation of the problem of
estimating a probability of failure. These sequential strategies use a Gaussian
process model of f and aim at performing evaluations of f as efficiently as
possible to infer the value of the probability of failure. We compare these
strategies to other strategies also based on a Gaussian process model for
estimating a probability of failure. Comment: this is an author-generated postprint version; the published
version is available at http://www.springerlink.co
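For contrast with the sequential strategies, the classical Monte Carlo baseline the abstract warns against can be written in a few lines. This is our illustration, not the paper's method; the point is that it spends one evaluation of f per sample, which is unaffordable when f is expensive:

```python
import random

def mc_failure_probability(f, sample_input, threshold, n=10000, seed=0):
    """Crude Monte Carlo estimate of P(f(X) > threshold).

    Resolving a small failure probability this way takes many
    evaluations of f.  SUR strategies instead choose each evaluation
    point sequentially, guided by a Gaussian process model of f, to
    reduce uncertainty about the failure probability as fast as
    possible.
    """
    rng = random.Random(seed)
    hits = sum(f(sample_input(rng)) > threshold for _ in range(n))
    return hits / n
```

For example, with f the identity and the input uniform on [0, 1], the estimate of P(X > 0.9) comes out close to 0.1.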
Sequential importance sampling for bipartite graphs with applications to likelihood-based inference
The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of test statistics. This paper builds on the work of Chen et al. (2005), providing an intuitive explanation of the sequential importance sampling algorithm as well as several examples to illustrate how the algorithm can be implemented for bipartite graphs. We examine the performance of sequential importance sampling for likelihood-based inference in comparison with Markov chain Monte Carlo, and find little empirical evidence to suggest that sequential importance sampling outperforms Markov chain Monte Carlo, even for sparse graphs or graphs with skewed marginals.
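The column-by-column construction and its importance weight can be sketched as follows. This is a simplified illustration, not the Chen et al. (2005) proposal (which uses more refined column-wise probabilities): each column's 1-entries are drawn uniformly among feasibility-preserving subsets, and the product of choice counts is an unbiased estimate of the number of matrices with the given marginals.

```python
import random
from itertools import combinations

def feasible(rows, cols):
    """Gale-Ryser test: does any 0-1 matrix with these margins exist?"""
    if sum(rows) != sum(cols):
        return False
    r = sorted(rows, reverse=True)
    for k in range(1, len(r) + 1):
        if sum(r[:k]) > sum(min(c, k) for c in cols):
            return False
    return True

def sis_sample(row_sums, col_sums, rng):
    """Draw one bipartite 0-1 matrix with the given margins by filling
    columns left to right.  Returns (matrix, weight); averaging
    `weight` over repeated draws estimates the number of matrices
    with these margins."""
    rows = list(row_sums)
    m = len(rows)
    columns, weight = [], 1
    for j, cj in enumerate(col_sums):
        rest = list(col_sums[j + 1:])
        avail = [i for i in range(m) if rows[i] > 0]
        # Keep only choices that leave a completable partial matrix.
        choices = [S for S in combinations(avail, cj)
                   if feasible([rows[i] - (i in S) for i in range(m)], rest)]
        S = choices[rng.randrange(len(choices))]
        weight *= len(choices)
        columns.append([int(i in S) for i in range(m)])
        for i in S:
            rows[i] -= 1
    return [list(r) for r in zip(*columns)], weight
```

Because the feasibility check guarantees every partial fill can be completed, the sampler never gets stuck and every valid matrix has positive probability, which is what makes the weight an unbiased counting estimator.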
networksis: A package to simulate bipartite graphs with fixed marginals through sequential importance sampling
The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of graph statistics. This paper describes the networksis package for R and how its simulate and simulate_sis functions can be used to address both of these tasks as well as generate initial graphs for Markov chain Monte Carlo simulations.
Early entry to fatherhood estimated from men's and women's survey reports in combination
While underreporting of fatherhood is a widely acknowledged problem, satisfactory methods for its correction have yet to be developed. In the present study, we investigate methods of correction that are specific to marital status at the time of the birth and at the time of retrospective reporting, focusing on fatherhood under age 30. Matched women’s and men’s survey reports of births, in each case reported by marital status and age of the father, form the basis for our corrections. Male age-specific fertility rates are estimated from these survey data by using women’s reports for the births numerator and men’s reports for the exposed-years denominator. These are shown to match well to male age-specific fertility rates estimated from population data sources. When marital births in the men’s and women’s survey data are differentiated by whether the birth is within a current or previous marriage, only for births in previous marriages is there a male reporting deficit. Further, this deficit is completely explained by under-representation of men’s exposed years in previous marriages. We find no evidence of underreporting of births for those exposed years. These results are used to develop a constrained maximum likelihood estimator in which male fertility is constrained by age and marital status, with a focus on correcting for underreported non-marital fertility.
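The combined-reports rate estimate is a simple ratio, which a few lines make concrete. The function and its dict-keyed-by-age-group inputs are hypothetical illustrations of the idea, not the study's code:

```python
def male_asfr(births_reported_by_women, exposure_reported_by_men):
    """Male age-specific fertility rates from combined reports: births
    in each father-age group come from women's reports (the more
    completely reported numerator), while exposed person-years come
    from men's reports (the denominator)."""
    return {age: births_reported_by_women[age] / exposure_reported_by_men[age]
            for age in births_reported_by_women}
```

For instance, 50 reported births against 1,000 exposed person-years in one age group gives a rate of 0.05 births per person-year.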
A framework for the comparison of maximum pseudo likelihood and maximum likelihood estimation of exponential family random graph models
The statistical modeling of social network data is difficult due to the complex dependence structure of the
tie variables. Statistical exponential families of distributions provide a flexible way to model such dependence.
They enable the statistical characteristics of the network to be encapsulated within an exponential
family random graph (ERG) model. For a long time, however, likelihood-based estimation was only feasible
for ERG models assuming dyad independence. For more realistic and complex models, inference has been
based on the pseudo-likelihood. Recent advances in computational methods have made likelihood-based
inference practical, and comparison of the different estimators possible.
In this paper, we present methodology to enable estimators of ERG model parameters to be compared.
We use this methodology to compare the bias, standard errors, coverage rates and efficiency of maximum
likelihood and maximum pseudo-likelihood estimators. We also propose an improved pseudo-likelihood
estimation method aimed at reducing bias. The comparison is performed using simulated social network
data based on two versions of an empirically realistic network model, the first representing Lazega’s
law firm data and the second a modified version with increased transitivity. The framework considers
estimation of both the natural and the mean-value parameters.
The results clearly show the superiority of the likelihood-based estimators over those based on the
pseudo-likelihood, with the bias-reduced pseudo-likelihood outperforming the general pseudo-likelihood. The
use of the mean-value parameterization provides insight into the differences between the estimators and
when these differences will matter in practice.
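The pseudo-likelihood being compared treats each tie as logistic in its change statistics given the rest of the graph. A minimal sketch for an edge-and-triangle model follows; the statistics chosen and the function names are our illustrative assumptions, not the paper's specification:

```python
import math

def change_stats(adj, i, j):
    """Change statistics for toggling tie (i, j) in an undirected
    graph: the change in the edge count (always 1) and in the triangle
    count (the number of shared partners of i and j)."""
    n = len(adj)
    shared = sum(adj[i][k] and adj[j][k] for k in range(n) if k not in (i, j))
    return (1, shared)

def pseudo_loglik(adj, theta):
    """Log pseudo-likelihood of an edge-and-triangle ERG model: each
    tie variable is modelled as logistic in its change statistics,
    conditional on the rest of the graph.  Maximising this over theta
    gives the maximum pseudo-likelihood estimator (MPLE)."""
    ll = 0.0
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            d = change_stats(adj, i, j)
            eta = sum(t * s for t, s in zip(theta, d))
            # Bernoulli log-likelihood: y * eta - log(1 + e^eta)
            ll += adj[i][j] * eta - math.log1p(math.exp(eta))
    return ll
```

Because the conditional terms are multiplied as if the ties were independent, the MPLE ignores the very dependence the model is meant to capture, which is the source of the bias the comparison framework quantifies.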