A Conditional Empirical Likelihood Based Method for Model Parameter Estimation from Complex Survey Datasets
We consider an empirical likelihood framework for inference for a statistical
model based on an informative sampling design. Covariate information is
incorporated both through the weights and the estimating equations. The
estimator is based on conditional weights. We show that under usual conditions,
with population size increasing without bound, the estimates are strongly
consistent, asymptotically unbiased and normally distributed. Our framework
provides additional justification for inverse probability weighted score
estimators in terms of conditional empirical likelihood. In doing so, it
bridges the gap between design-based and model-based modes of inference in
survey sampling settings. We illustrate these ideas with an application to an
electoral survey.
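As a minimal illustration of the inverse probability weighted score estimators mentioned in this abstract, the sketch below solves the weighted score equation for a population mean. It assumes known inclusion probabilities and a simple mean model; the function name `ipw_mean` and the toy data are hypothetical and not taken from the paper, and this is not the paper's conditional empirical likelihood estimator.

```python
def ipw_mean(values, inclusion_probs):
    """Solve the weighted score equation sum_i (y_i - mu) / pi_i = 0 for mu.

    Each sampled unit is weighted by the inverse of its (assumed known)
    inclusion probability, so under-sampled units count for more.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    weights = [1.0 / p for p in inclusion_probs]
    weighted_total = sum(w * y for w, y in zip(weights, values))
    return weighted_total / sum(weights)


# Hypothetical toy sample: the second unit was sampled with lower probability.
y = [1.0, 0.0, 1.0, 1.0]
pi = [0.5, 0.25, 0.5, 0.5]
estimate = ipw_mean(y, pi)
print(estimate)
```

In this toy sample the unweighted mean would be 0.75, while the weighted estimate is lower because the unit with `y = 0` stands in for more of the population.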
Exponential-family Random Network Models
Random graphs, where the connections between nodes are considered random
variables, have wide applicability in the social sciences. Exponential-family
Random Graph Models (ERGM) have shown themselves to be a useful class of models
for representing complex social phenomena. We generalize ERGM by also
modeling nodal attributes as random variates, thus creating a random model of
the full network, which we call Exponential-family Random Network Models
(ERNM). We demonstrate how this framework allows a new formulation for
logistic regression in network data. We develop likelihood-based inference for
the model and an MCMC algorithm to implement it. This new model formulation is
used to analyze a peer social network from the National Longitudinal Study of
Adolescent Health. We model the relationship between substance use and
friendship relations, and show how the results differ from the standard use of
logistic regression on network data.
networksis: A Package to Simulate Bipartite Graphs with Fixed Marginals Through Sequential Importance Sampling
The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of graph statistics. This paper describes the networksis package for R and how its simulate and simulate_sis functions can be used to address both of these tasks as well as generate initial graphs for Markov chain Monte Carlo simulations.
Modeling social networks from sampled data
Network models are widely used to represent relational information among
interacting units and the structural implications of these relations. Recently,
social network studies have focused a great deal of attention on random graph
models of networks whose nodes represent individual social actors and whose
edges represent a specified relationship between the actors. Most inference for
social network models assumes that the presence or absence of all possible
links is observed, that the information is completely reliable, and that there
are no measurement (e.g., recording) errors. This is clearly not true in
practice, as much network data is collected through sample surveys. In addition,
even if a census of a population is attempted, individuals and links between
individuals are missed (i.e., do not appear in the recorded data). In this
paper we develop the conceptual and computational theory for inference based on
sampled network information. We first review forms of network sampling designs
used in practice. We consider inference from the likelihood framework, and
develop a typology of network data that reflects their treatment within this
frame. We then develop inference for social network models based on information
from adaptive network designs. We motivate and illustrate these ideas by
analyzing the effect of link-tracing sampling designs on a collaboration
network.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS221 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Analysis of Partially Observed Networks via Exponential-family Random Network Models
Exponential-family random network (ERN) models specify a joint representation
of both the dyads of a network and nodal characteristics. This class of models
allows the nodal characteristics to be modelled as stochastic processes,
expanding the range and realism of exponential-family approaches to network
modelling. In this paper we develop a theory of inference for ERN models when
only part of the network is observed, as well as specific methodology for
missing data, including non-ignorable mechanisms for network-based sampling
designs and for latent class models. In particular, we consider data collected
via contact tracing, of considerable importance to infectious disease
epidemiology and public health.
A description of within-family resource exchange networks in a Malawian village
In this paper we explore patterns of economic transfers between adults within household and family networks in a village in Malawi’s Rumphi district, using data from the 2006 round of the Malawi Longitudinal Study of Families and Health. We fit Exponential-family Random Graph Models (ERGMs) to assess individual, relational, and higher-order network effects. The network effects of cyclic giving, reciprocity, and in-degree and out-degree distribution suggest a network with a tendency away from the formation of hierarchies or "hubs." Effects of age, sex, working status, education, health status, and kinship relation are also considered.
Keywords: Malawi, Malawi Longitudinal Study of Families and Health, networks, resource exchange, social networks
On the Concept of Snowball Sampling
This brief comment reflects on the historical and current uses of the term
"snowball sampling."
Comment: 5 pages, 0 figures. To appear in Sociological Methodology
Respondent-Driven Sampling: An Assessment of Current Methodology
Respondent-Driven Sampling (RDS) employs a variant of a link-tracing network
sampling strategy to collect data from hard-to-reach populations. By tracing
the links in the underlying social network, the process exploits the social
structure to expand the sample and reduce its dependence on the initial
(convenience) sample.
The primary goal of RDS is typically to estimate population averages in the
hard-to-reach population. The current estimates make strong assumptions in
order to treat the data as a probability sample. In particular, we evaluate
three critical sensitivities of the estimators: to bias induced by the initial
sample, to uncontrollable features of respondent behavior, and to the
without-replacement structure of sampling.
This paper sounds a cautionary note for the users of RDS. While current RDS
methodology is powerful and clever, the favorable statistical properties
claimed for the current estimates are shown to be heavily dependent on often
unrealistic assumptions.
Comment: 35 pages, 29 figures, under review
Fitting Latent Cluster Models for Networks with latentnet
latentnet is a package to fit and evaluate statistical latent position and cluster models for networks. Hoff, Raftery, and Handcock (2002) suggested an approach to modeling networks based on positing the existence of a latent space of characteristics of the actors. Relationships form as a function of distances between these characteristics as well as functions of observed dyadic-level covariates. In latentnet social distances are represented in a Euclidean space. It also includes a variant of the extension of the latent position model to allow for clustering of the positions developed in Handcock, Raftery, and Tantrum (2007). The package implements Bayesian inference for the models based on a Markov chain Monte Carlo algorithm. It can also compute maximum likelihood estimates for the latent position model and a two-stage maximum likelihood method for the latent position cluster model. For latent position cluster models, the package provides a Bayesian way of assessing how many groups there are, and thus whether or not there is any clustering (since if the preferred number of groups is 1, there is little evidence for clustering). It also estimates which cluster each actor belongs to. These estimates are probabilistic, and provide the probability of each actor belonging to each cluster. It computes four types of point estimates for the coefficients and positions: maximum likelihood estimate, posterior mean, posterior mode and the estimator which minimizes Kullback-Leibler divergence from the posterior. You can assess the goodness-of-fit of the model via posterior predictive checks. It has a function to simulate networks from a latent position or latent position cluster model.
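The idea that ties form as a function of distance in a Euclidean latent space can be sketched in a few lines. The Python example below is illustrative only and does not use the latentnet package (which is written for R). It assumes one common form of the distance model, in which the log-odds of a tie equal an intercept minus the Euclidean distance between two actors' positions, with no covariates or clustering. The names `tie_probability` and `simulate_network`, the intercept value, and the random positions are all hypothetical.

```python
import math
import random


def tie_probability(z_i, z_j, intercept):
    """Assumed distance model: logit P(tie) = intercept - ||z_i - z_j||."""
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(intercept - distance)))


def simulate_network(positions, intercept, rng):
    """Draw an undirected network with ties independent given the positions."""
    n = len(positions)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = tie_probability(positions[i], positions[j], intercept)
            if rng.random() < p:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Hypothetical setup: 10 actors with standard-normal positions in the plane.
rng = random.Random(0)
positions = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
network = simulate_network(positions, intercept=1.0, rng=rng)
```

Actors that sit close together in the latent space are more likely to be tied, which is what lets clustered positions produce clustered networks.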
On "Sexual contacts and epidemic thresholds," models and inference for Sexual partnership distributions
Recent work has focused attention on statistical inference for the population
distribution of the number of sexual partners based on survey data.
The characteristics of these distributions are of interest as components of
mathematical models for the transmission dynamics of sexually-transmitted
diseases (STDs). Such information can be used to calibrate theoretical
models, to make predictions for real populations, and as a tool for guiding
public health policy.
Our previous work on this subject has developed likelihood-based statistical
methods for inference that allow for low-dimensional, semi-parametric models.
Inference has been based on several proposed stochastic process models for the
formation of sexual partnership networks. We have also developed model
selection criteria to choose between competing models, and assessed the fit of
different models to three populations: Uganda, Sweden, and the USA. Throughout
this work, we have emphasized the correct assessment of the uncertainty of the
estimates based on the data analyzed. We have also widened the question of
interest to the limitations of inferences from such data, and the utility of
degree-based epidemiological models more generally.
In this paper we address further statistical issues that are important in
this area, and a number of confusions that have arisen in interpreting our
work. In particular, we consider the use of cumulative lifetime partner
distributions, heaping and other issues raised by Liljeros et al. in a recent
working paper.
Comment: 22 pages, 5 figures in linked working paper