Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling
Learning about the social structure of hidden and hard-to-reach populations
--- such as drug users and sex workers --- is a major goal of epidemiological
and public health research on risk behaviors and disease prevention.
Respondent-driven sampling (RDS) is a peer-referral process widely used by many
health organizations, where research subjects recruit other subjects from their
social network. In such surveys, researchers observe who recruited whom, along
with the time of recruitment and the total number of acquaintances (network
degree) of respondents. However, due to privacy concerns, the identities of
acquaintances are not disclosed. In this work, we show how to reconstruct the
underlying network structure through which the subjects are recruited. We
formulate the dynamics of RDS as a continuous-time diffusion process over the
underlying graph and derive the likelihood for the recruitment time series
under an arbitrary recruitment time distribution. We develop an efficient
stochastic optimization algorithm called RENDER (REspoNdent-Driven nEtwork
Reconstruction) that finds the network that best explains the collected data.
We support our analytical results through an exhaustive set of experiments on
both synthetic and real data.
Comment: A full version with technical proofs. Accepted by AAAI-1
Sex, lies and self-reported counts: Bayesian mixture models for heaping in longitudinal count data via birth-death processes
Surveys often ask respondents to report nonnegative counts, but respondents
may misremember or round to a nearby multiple of 5 or 10. This phenomenon is
called heaping, and the error inherent in heaped self-reported numbers can bias
estimation. Heaped data may be collected cross-sectionally or longitudinally
and there may be covariates that complicate the inferential task. Heaping is a
well-known issue in many survey settings, and inference for heaped data is an
important statistical problem. We propose a novel reporting distribution whose
underlying parameters are readily interpretable as rates of misremembering and
rounding. The process accommodates a variety of heaping grids and allows for
quasi-heaping to values nearly but not equal to heaping multiples. We present a
Bayesian hierarchical model for longitudinal samples with covariates to infer
both the unobserved true distribution of counts and the parameters that control
the heaping process. Finally, we apply our methods to longitudinal
self-reported counts of sex partners in a study of high-risk behavior in
HIV-positive youth.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS809 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
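
To make the heaping phenomenon described in the abstract above concrete, here is a minimal simulation sketch. It is not the paper's birth-death reporting distribution: it assumes a much simpler mechanism in which each respondent rounds to the nearest multiple of a grid value with a fixed probability. The names `heap_count`, `share_on_grid`, and `p_round` are hypothetical and chosen only for illustration.

```python
import random


def heap_count(true_count, grid=5, p_round=0.5, rng=random):
    """Return a reported count under a toy heaping mechanism.

    Assumption (not from the paper): with probability p_round the
    respondent rounds to the nearest multiple of `grid` (ties round up);
    otherwise the true count is reported unchanged.
    """
    if rng.random() < p_round:
        # Integer arithmetic avoids Python's round-half-to-even behaviour.
        return (true_count + grid // 2) // grid * grid
    return true_count


def share_on_grid(counts, grid=5):
    """Fraction of counts that are exact multiples of `grid`."""
    return sum(c % grid == 0 for c in counts) / len(counts)


# Simulate true counts and their heaped reports with a fixed seed.
rng = random.Random(0)
true_counts = [rng.randint(0, 30) for _ in range(1000)]
reported = [heap_count(t, grid=5, p_round=0.5, rng=rng) for t in true_counts]

print("share of multiples of 5 (true):    ", share_on_grid(true_counts))
print("share of multiples of 5 (reported):", share_on_grid(reported))
```

Under this toy mechanism the reported counts pile up on multiples of 5, which is the pattern that biases naive estimates and that a heaping model tries to undo.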