Modeling social networks from sampled data
Network models are widely used to represent relational information among
interacting units and the structural implications of these relations. Recently,
social network studies have focused a great deal of attention on random graph
models of networks whose nodes represent individual social actors and whose
edges represent a specified relationship between the actors. Most inference for
social network models assumes that the presence or absence of all possible
links is observed, that the information is completely reliable, and that there
are no measurement (e.g., recording) errors. This is clearly not true in
practice, as much network data is collected through sample surveys. In addition,
even if a census of a population is attempted, individuals and links between
individuals are missed (i.e., do not appear in the recorded data). In this
paper we develop the conceptual and computational theory for inference based on
sampled network information. We first review forms of network sampling designs
used in practice. We consider inference from the likelihood framework, and
develop a typology of network data that reflects their treatment within this
frame. We then develop inference for social network models based on information
from adaptive network designs. We motivate and illustrate these ideas by
analyzing the effect of link-tracing sampling designs on a collaboration
network.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS221 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
On the Concept of Snowball Sampling
This brief comment reflects on the historical and current uses of the term
"snowball sampling."
Comment: 5 pages, 0 figures. To appear in Sociological Methodology
Respondent-Driven Sampling: An Assessment of Current Methodology
Respondent-Driven Sampling (RDS) employs a variant of a link-tracing network
sampling strategy to collect data from hard-to-reach populations. By tracing
the links in the underlying social network, the process exploits the social
structure to expand the sample and reduce its dependence on the initial
(convenience) sample.
The primary goal of RDS is typically to estimate population averages in the
hard-to-reach population. The current estimates make strong assumptions in
order to treat the data as a probability sample. In particular, we evaluate
three critical sensitivities of the estimators: to bias induced by the initial
sample, to uncontrollable features of respondent behavior, and to the
without-replacement structure of sampling.
This paper sounds a cautionary note for the users of RDS. While current RDS
methodology is powerful and clever, the favorable statistical properties
claimed for the current estimates are shown to be heavily dependent on often
unrealistic assumptions.
Comment: 35 pages, 29 figures, under review
Correcting for differential recruitment in respondent-driven sampling data using ego-network information
Respondent-driven sampling (RDS) is a sampling method devised to overcome the challenges of sampling hard-to-reach human populations. Sampling starts with a limited number of individuals, who are asked to recruit a small number of their contacts. Every surveyed individual is subsequently given the same opportunity to recruit additional members of the target population until a pre-established sample size is achieved. The recruitment process consequently implies that the survey respondents are responsible for deciding who enters the study. Most RDS prevalence estimators assume that participants select among their contacts completely at random. The main objective of this work is to correct the inference for departures from this assumption, such as systematic recruitment based on the characteristics of the individuals or on the nature of their relationships. To accomplish this, we introduce three forms of non-random recruitment, provide estimators for these recruitment behaviors, and extend three estimators and their associated variance procedures. The proposed methodology is assessed through a simulation study capturing various sampling and network features. Finally, the proposed methods are applied to a public health setting.
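The recruitment process described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' procedure: the network is fully known and supplied as an adjacency dictionary, each respondent recruits uniformly at random among contacts not yet sampled (the "completely at random" assumption the paper sets out to relax), and the names `simulate_rds`, `coupons`, and `target_size` are hypothetical.

```python
import random
from collections import deque


def simulate_rds(adjacency, seeds, coupons, target_size, rng):
    """Simulate without-replacement RDS recruitment on a known network.

    adjacency   -- dict mapping each node to a list of its contacts (assumed input)
    seeds       -- distinct initial respondents (the convenience sample)
    coupons     -- maximum number of contacts each respondent may recruit
    target_size -- recruitment stops once this many people are sampled
    rng         -- a random.Random instance, passed in for reproducibility
    """
    sampled = list(seeds)                    # respondents in order of entry
    in_sample = set(seeds)
    recruiter_of = {s: None for s in seeds}  # seeds have no recruiter
    queue = deque(seeds)                     # respondents who may still recruit

    while queue and len(sampled) < target_size:
        recruiter = queue.popleft()
        # Only contacts who are not already in the study can be recruited.
        eligible = [v for v in adjacency[recruiter] if v not in in_sample]
        # Assumption: uniform random choice among eligible contacts.
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(sampled) >= target_size:
                break
            in_sample.add(recruit)
            sampled.append(recruit)
            recruiter_of[recruit] = recruiter
            queue.append(recruit)

    return sampled, recruiter_of


# Usage on a toy ring network of 20 people, each linked to two neighbours.
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
sample, recruiter_of = simulate_rds(ring, seeds=[0], coupons=2,
                                    target_size=10, rng=random.Random(1))
```

Replacing the shuffle with a choice that depends on contact characteristics would give one way to mimic non-random recruitment in such a simulation.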
Unequal Edge Inclusion Probabilities in Link-Tracing Network Sampling With Implications for Respondent-Driven Sampling
Respondent-Driven Sampling (RDS) is a widely adopted link-tracing sampling design used to draw valid statistical inference from samples of populations for which there is no available sampling frame. RDS estimators rely on the assumption that each edge (representing a relationship between two individuals) in the underlying network has an equal probability of being sampled. We show that this assumption is violated in even the simplest cases, and that RDS estimators are sensitive to the violation of this assumption.
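To make the role of the equal-edge-probability assumption concrete: if every edge is equally likely to be traced, a respondent's chance of being sampled is proportional to their number of contacts (degree), so a population mean can be estimated by weighting each respondent by the inverse of their reported degree. The sketch below shows this inverse-degree weighted mean, commonly known as the Volz-Heckathorn or RDS-II estimator. The abstract does not name a specific estimator, so this choice, the function name, and the toy numbers are illustrative assumptions.

```python
def inverse_degree_estimate(outcomes, degrees):
    """Inverse-degree weighted mean of the sampled outcomes.

    outcomes -- sampled values, e.g. 1/0 indicators of a trait
    degrees  -- self-reported number of contacts for each respondent
    Assumes inclusion probability is proportional to degree, which follows
    from every edge being equally likely to be traced.
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if not degrees or any(d <= 0 for d in degrees):
        raise ValueError("every respondent needs a positive degree")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Toy data (made up for illustration): trait indicator and reported degree.
trait = [1, 0, 0, 1]
degree = [10, 2, 2, 5]
estimate = inverse_degree_estimate(trait, degree)
naive_mean = sum(trait) / len(trait)
```

In this toy sample the respondents with the trait report more contacts, so the weighted estimate falls below the naive sample mean. When edges are not traced with equal probability, these degree-based weights no longer match the true inclusion probabilities, which is the sensitivity the abstract describes.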
Bayesian Peer Calibration with Application to Alcohol Use
Peers are often able to provide important additional information to supplement self-reported behavioral measures. The study motivating this work collected data on alcohol consumption in a social network formed by college students living in a freshman dormitory. By using two imperfect sources of information (self-reported and peer-reported alcohol consumption), rather than solely self-reports or peer-reports, we are able to gain insight into alcohol consumption at both the population and the individual level, as well as information on the discrepancy of individual peer-reports. We develop a novel Bayesian comparative calibration model for continuous, count, and binary outcomes that uses covariate information to characterize the joint distribution of self-reports and peer-reports on the network, in order to estimate peer-reporting discrepancies in network surveys, and apply this model to the data for fully Bayesian inference. We use this model to understand the effects of covariates on both drinking behavior and peer-reporting discrepancies.
Reduced Bias for Respondent Driven Sampling: Accounting for Non-Uniform Edge Sampling Probabilities in People Who Inject Drugs in Mauritius
People who inject drugs are an important population to study in order to reduce transmission of blood-borne illnesses including HIV and Hepatitis. In this paper we estimate HIV and Hepatitis C prevalence among people who inject drugs in Mauritius, as well as the proportion of this population that is female. Respondent driven sampling (RDS), a widely adopted link-tracing sampling design used to collect samples from hard-to-reach human populations, was used to collect this sample. The random walk approximation underlying many common RDS estimators assumes that each social relation (edge) in the underlying social network has an equal probability of being traced in the collection of the sample. This assumption does not hold in practice. We show that certain RDS estimators are sensitive to the violation of this assumption. To address this limitation in current methodology, and the impact it may have on prevalence estimates, we present a new method for improving RDS prevalence estimators using estimated edge inclusion probabilities, and apply this to data from Mauritius.