38 research outputs found

    Modeling social networks from sampled data

    Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors. Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g., recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data). In this paper we develop the conceptual and computational theory for inference based on sampled network information. We first review forms of network sampling designs used in practice. We consider inference from the likelihood framework, and develop a typology of network data that reflects their treatment within this frame. We then develop inference for social network models based on information from adaptive network designs. We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network. Comment: Published at http://dx.doi.org/10.1214/08-AOAS221 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    On the Concept of Snowball Sampling

    This brief comment reflects on the historical and current uses of the term "snowball sampling." Comment: 5 pages, 0 figures. To appear in Sociological Methodology

    Respondent-Driven Sampling: An Assessment of Current Methodology

    Respondent-Driven Sampling (RDS) employs a variant of a link-tracing network sampling strategy to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample. The primary goal of RDS is typically to estimate population averages in the hard-to-reach population. The current estimates make strong assumptions in order to treat the data as a probability sample. In particular, we evaluate three critical sensitivities of the estimators: to bias induced by the initial sample, to uncontrollable features of respondent behavior, and to the without-replacement structure of sampling. This paper sounds a cautionary note for the users of RDS. While current RDS methodology is powerful and clever, the favorable statistical properties claimed for the current estimates are shown to be heavily dependent on often unrealistic assumptions. Comment: 35 pages, 29 figures, under review

    Unequal Edge Inclusion Probabilities in Link-Tracing Network Sampling With Implications for Respondent-Driven Sampling

    Respondent-Driven Sampling (RDS) is a widely adopted link-tracing sampling design used to draw valid statistical inference from samples of populations for which there is no available sampling frame. RDS estimators rely upon the assumption that each edge (representing a relationship between two individuals) in the underlying network has an equal probability of being sampled. We show that this assumption is violated in even the simplest cases, and that RDS estimators are sensitive to the violation of this assumption.

    Bayesian Peer Calibration with Application to Alcohol Use

    Peers are often able to provide important additional information to supplement self-reported behavioral measures. The study motivating this work collected data on alcohol use in a social network formed by college students living in a freshman dormitory. By using two imperfect sources of information (self-reported and peer-reported alcohol consumption), rather than solely self-reports or peer-reports, we are able to gain insight into alcohol consumption on both the population and the individual level, as well as information on the discrepancy of individual peer-reports. We develop a novel Bayesian comparative calibration model for continuous, count, and binary outcomes that uses covariate information to characterize the joint distribution of both self-reports and peer-reports on the network for estimating peer-reporting discrepancies in network surveys, and apply this to the data for fully Bayesian inference. We use this model to understand the effects of covariates on both drinking behavior and peer-reporting discrepancies.

    Reduced Bias for Respondent Driven Sampling: Accounting for Non-Uniform Edge Sampling Probabilities in People Who Inject Drugs in Mauritius

    People who inject drugs are an important population to study in order to reduce transmission of blood-borne illnesses including HIV and Hepatitis. In this paper we estimate HIV and Hepatitis C prevalence among people who inject drugs in Mauritius, as well as the proportion of this population who are female. Respondent driven sampling (RDS), a widely adopted link-tracing sampling design used to collect samples from hard-to-reach human populations, was used to collect this sample. The random walk approximation underlying many common RDS estimators assumes that each social relation (edge) in the underlying social network has an equal probability of being traced in the collection of the sample. This assumption does not hold in practice. We show that certain RDS estimators are sensitive to the violation of this assumption. In order to address this limitation in current methodology, and the impact it may have on prevalence estimates, we present a new method for improving RDS prevalence estimators using estimated edge inclusion probabilities, and apply this to data from Mauritius.
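The equal-edge assumption mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' method. It assumes an idealized with-replacement simple random walk on a connected undirected network, for which each directed edge is traversed with the same stationary probability, and it shows the inverse-degree weighting that this idealization motivates. The toy network, the function names, and the sample values are all hypothetical.

```python
from fractions import Fraction


def edge_traversal_probabilities(adjacency):
    """Stationary probability that a simple random walk traverses each directed edge.

    adjacency: dict mapping node -> set of neighbours (undirected, connected graph).
    """
    total_degree = sum(len(nbrs) for nbrs in adjacency.values())  # equals 2 * |E|
    probs = {}
    for node, nbrs in adjacency.items():
        stationary = Fraction(len(nbrs), total_degree)  # pi_i = d_i / (2|E|)
        for nbr in nbrs:
            # pi_i * P(i -> j) = (d_i / 2|E|) * (1 / d_i) = 1 / (2|E|)
            probs[(node, nbr)] = stationary * Fraction(1, len(nbrs))
    return probs


def inverse_degree_weighted_mean(values, degrees):
    """Weighted mean with weights 1/degree (assumes the random-walk idealization)."""
    weights = [Fraction(1, d) for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical toy network: triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
probs = edge_traversal_probabilities(toy)

# Hypothetical sample: binary outcomes and reported degrees for three respondents.
estimate = inverse_degree_weighted_mean([1, 0, 1], [1, 2, 4])
```

Under this idealization every directed edge in the toy network has the same traversal probability, so nodes are visited in proportion to their degree and weighting by inverse degree corrects for that. The abstract's point is that real link-tracing samples depart from this idealization, so edge inclusion probabilities would have to be estimated rather than assumed equal.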