We consider the problem of obtaining unbiased estimates of group properties
in social networks when using a classifier for node labels. Inference for this
problem is complicated by two factors: the network is not known and must be
crawled, and even high-performance classifiers provide biased estimates of
group proportions. We propose and evaluate AdjustedWalk to address this
problem. AdjustedWalk is a three-step procedure: 1) walking the graph
starting from an arbitrary node; 2) learning a classifier on the nodes in the
walk; and 3) applying a post-hoc adjustment to classification labels. The walk
step provides the information necessary to make inferences over the nodes and
edges, while the adjustment step corrects for classifier bias in estimating
group proportions. This process provides de-biased estimates at the cost of
additional variance. We evaluate AdjustedWalk on four tasks: the proportion of
nodes belonging to a minority group, the proportion of the minority group among
high degree nodes, the proportion of within-group edges, and Coleman's
homophily index. Experiments on simulated and empirical graphs show that this
procedure performs well relative to optimal baselines in a variety of
circumstances, and that the increase in variance can be large for low-recall
classifiers.

Comment: 19 pages, 6 figures, 1 table
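The walk step can be illustrated with a minimal Python sketch. It assumes an undirected, connected graph stored as adjacency lists. For illustration only, it also assumes that true labels are observed on visited nodes; in AdjustedWalk those labels would come from the classifier and would then be adjusted. The node and high-degree estimates use the standard re-weighted random walk estimator. A stationary simple random walk visits node v with probability proportional to its degree, so each visit is weighted by 1/deg(v). The edge estimate relies on the fact that consecutive walk pairs are uniform over edges. The function names and the toy graph are hypothetical, and the paper's own estimators may differ in detail.

```python
import random
from collections import defaultdict


def random_walk(adj, start, n_steps, rng):
    """Simple random walk: each step moves to a uniformly chosen neighbour."""
    walk = [start]
    node = start
    for _ in range(n_steps):
        node = rng.choice(adj[node])
        walk.append(node)
    return walk


def estimate_node_proportion(walk, adj, label, min_degree=0):
    """Re-weighted walk estimator of the share of label-1 nodes.

    Visits are degree-biased, so each visit is weighted by 1/deg(v).
    A positive min_degree restricts the estimate to high-degree nodes.
    """
    num = den = 0.0
    for v in walk:
        d = len(adj[v])
        if d >= min_degree:
            num += label[v] / d
            den += 1.0 / d
    return num / den


def estimate_within_group_edges(walk, label):
    """Share of edges whose endpoints carry the same label.

    Consecutive pairs of a stationary walk are uniform over edges,
    so a plain average is used with no re-weighting.
    """
    pairs = list(zip(walk, walk[1:]))
    same = sum(label[u] == label[v] for u, v in pairs)
    return same / len(pairs)


def build_example_graph(n=30, n_minority=6):
    """Hypothetical toy graph: a ring, a minority hub (node 0), and two chords."""
    adj = defaultdict(set)

    def add(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(n):
        add(i, (i + 1) % n)  # ring keeps the graph connected
    for i in range(3, n, 3):
        add(0, i)  # node 0 becomes a high-degree hub
    add(1, 3)
    add(2, 4)
    label = {v: int(v < n_minority) for v in range(n)}
    return {v: sorted(nbrs) for v, nbrs in adj.items()}, label


if __name__ == "__main__":
    adj, label = build_example_graph()
    rng = random.Random(0)
    # Start from an arbitrary node and drop a short burn-in.
    walk = random_walk(adj, start=17, n_steps=200_000, rng=rng)[1_000:]
    print("minority share:", estimate_node_proportion(walk, adj, label))
    print("minority share, degree >= 3:",
          estimate_node_proportion(walk, adj, label, min_degree=3))
    print("within-group edge share:", estimate_within_group_edges(walk, label))
```

Here the hub is a minority node. An unweighted average of walk visits would therefore overstate the minority share, and that degree bias is what the 1/deg weights remove.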
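The abstract does not say which post-hoc adjustment AdjustedWalk applies. As an assumption, the sketch below uses the adjusted classify-and-count correction from the quantification literature, and only for the node-proportion task. If p is the true minority share, a classifier with true-positive rate tpr and false-positive rate fpr labels a fraction q = fpr + p(tpr - fpr) of nodes positive in expectation. Solving this for p undoes the bias. In practice, q would be the re-weighted share of predicted positives along the walk, and tpr and fpr would be measured on labelled walk nodes. The function names are hypothetical. Adjusting the edge-based quantities would require a joint misclassification model that is not shown here.

```python
def classifier_rates(y_true, y_pred):
    """True-positive rate (recall) and false-positive rate on labelled walk nodes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def adjusted_proportion(q, tpr, fpr):
    """Adjusted classify-and-count.

    Expected predicted-positive share: q = fpr + p * (tpr - fpr),
    so p = (q - fpr) / (tpr - fpr).
    """
    if tpr <= fpr:
        raise ValueError("adjustment requires tpr > fpr")
    p = (q - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion


if __name__ == "__main__":
    # Hypothetical hold-out labels and predictions for ten walk nodes.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    tpr, fpr = classifier_rates(y_true, y_pred)  # 0.75 and 1/6
    print(tpr, fpr)
    # A classifier with tpr=0.8, fpr=0.1 on a population with p=0.2
    # yields q=0.24 in expectation; the adjustment recovers p of about 0.2.
    print(adjusted_proportion(0.24, 0.8, 0.1))
```

Any noise in q is divided by (tpr - fpr). When recall is low, that denominator is small, so the error is amplified. This offers one way to see why the de-biased estimate can pay a large variance cost for low-recall classifiers.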