Sampling Online Social Networks via Heterogeneous Statistics
Most sampling techniques for online social networks (OSNs) are based on a
particular sampling method on a single graph, which is referred to as a
statistic. However, various realizing methods on different graphs could
possibly be used in the same OSN, and they may lead to different sampling
efficiencies, i.e., asymptotic variances. To utilize multiple statistics for
accurate measurements, we formulate a mixture sampling problem, through which
we construct a mixture unbiased estimator which minimizes asymptotic variance.
Given fixed sampling budgets for different statistics, we derive the optimal
weights to combine the individual estimators; given fixed total budget, we show
that a greedy allocation towards the most efficient statistics is optimal. In
practice, the sampling efficiencies of statistics can be quite different for
various targets and are unknown before sampling. To solve this problem, we
design a two-stage framework which adaptively spends a partial budget to test
different statistics and allocates the remaining budget to the inferred best
statistics. We show that our two-stage framework is a generalization of 1)
randomly choosing a statistic and 2) evenly allocating the total budget among
all available statistics, and that our adaptive algorithm achieves higher
efficiency than these benchmark strategies in both theory and experiment.
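The weighting and two-stage ideas described above can be illustrated with a minimal sketch. It is not the paper's derivation: it assumes each statistic yields independent, i.i.d. draws (real OSN samplers such as random walks produce correlated samples), it uses the standard inverse-variance rule for combining independent unbiased estimators, and the names `combine_estimates`, `two_stage_estimate`, and `pilot_fraction` are hypothetical.

```python
import random
import statistics


def combine_estimates(estimates, variances):
    """Combine independent unbiased estimates using inverse-variance weights.

    For independent estimators this choice of weights minimizes the
    variance of the weighted average (a standard result, assumed here).
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights, 1.0 / total


def two_stage_estimate(samplers, total_budget, pilot_fraction=0.2):
    """Spend a pilot budget on every sampler, then give the rest to the
    sampler whose pilot samples show the smallest variance.

    `samplers` is a list of zero-argument callables, each returning one
    sample value (a hypothetical stand-in for one "statistic").
    """
    k = len(samplers)
    pilot_each = max(2, int(total_budget * pilot_fraction) // k)

    # Stage 1: test every statistic with an equal share of the pilot budget.
    samples = [[draw() for _ in range(pilot_each)] for draw in samplers]
    pilot_var = [statistics.variance(s) for s in samples]
    best = min(range(k), key=lambda i: pilot_var[i])

    # Stage 2: allocate the remaining budget to the inferred best statistic.
    remaining = max(0, total_budget - pilot_each * k)
    samples[best].extend(samplers[best]() for _ in range(remaining))

    # Combine the per-statistic sample means, weighting by estimated variance.
    estimates = [statistics.fmean(s) for s in samples]
    variances = [max(statistics.variance(s), 1e-12) / len(s) for s in samples]
    return combine_estimates(estimates, variances)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical statistics with the same mean but different noise.
    low_noise = lambda: rng.gauss(3.0, 1.0)
    high_noise = lambda: rng.gauss(3.0, 5.0)
    print(two_stage_estimate([low_noise, high_noise], total_budget=1000))
```

Because the weights here are estimated from the same samples they combine, the result is only approximately unbiased; the sketch is meant to show the shape of the procedure, not its guarantees.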
On sampling social networking services
This article summarizes the existing methods for sampling social
networking services and proposes a faster confidence interval for related
sampling methods. It also compares common network sampling
techniques.
Do we really need to catch them all? A new User-guided Social Media Crawling method
With the growing use of popular social media services like Facebook and
Twitter, it is challenging to collect all content from the networks without
access to the core infrastructure or paying for it. Thus, if all content cannot
be collected one must consider which data are of most importance. In this work
we present a novel User-guided Social Media Crawling method (USMC) that is able
to collect data from social media, utilizing the wisdom of the crowd to decide
the order in which user generated content should be collected to cover as many
user interactions as possible. USMC is validated by crawling 160 public
Facebook pages, containing content from 368 million users including 1.3 billion
interactions, and it is compared with two other crawling methods. The results
show that it is possible to cover approximately 75% of the interactions on a
Facebook page by sampling just 20% of its posts, and at the same time reduce
the crawling time by 53%. In addition, the social network constructed from the
20% sample contains more than 75% of the users and edges of the social
network created from all posts, and it has a similar degree distribution.
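The coverage figure quoted above (share of interactions captured by a given share of posts) can be computed as in the following sketch. It does not reproduce USMC itself: the abstract does not specify the ordering signal, so the per-post interaction count is used here as a hypothetical stand-in, and the function names are illustrative only.

```python
def coverage_curve(interactions_per_post):
    """Cumulative share of interactions covered when posts are collected
    in descending order of their interaction count."""
    ordered = sorted(interactions_per_post, reverse=True)
    total = sum(ordered)
    covered = 0
    curve = []
    for count in ordered:
        covered += count
        curve.append(covered / total if total else 0.0)
    return curve


def coverage_at(interactions_per_post, post_fraction):
    """Share of interactions covered after collecting `post_fraction`
    of the posts (at least one post is always collected)."""
    curve = coverage_curve(interactions_per_post)
    if not curve:
        return 0.0
    k = max(1, int(post_fraction * len(curve)))
    return curve[k - 1]


if __name__ == "__main__":
    # Hypothetical page with five posts and skewed engagement.
    counts = [50, 30, 10, 5, 5]
    print(coverage_at(counts, 0.2), coverage_at(counts, 0.4))
```

A real crawler cannot sort by the final interaction counts in advance, so this ordering is an idealized upper bound against which a practical ordering could be compared.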
Degree Ranking Using Local Information
Most real-world dynamic networks evolve very quickly over time, so it is not
feasible to collect the entire network at any given time to study its
characteristics. This creates the need for local algorithms that can study
various properties of the network. In the present work, we estimate the degree
rank of a node without having the entire network. The proposed methods are
based either on the power-law degree distribution characteristic or on sampling
techniques. The
proposed methods are simulated on synthetic networks, as well as on real world
social networks. The efficiency of the proposed methods is evaluated using
absolute and weighted error functions. Results show that the degree rank of a
node can be estimated with high accuracy using only samples of the
network size. The accuracy of the estimation decreases from high ranked to low
ranked nodes. We further extend the proposed methods to random networks and
validate their efficiency on synthetic random networks generated
using the Erdős-Rényi model. Results show that the proposed methods can be
used efficiently for random networks as well.
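A sampling-based degree rank estimate of the kind described above might look like the following sketch. It is an assumption-laden illustration, not the paper's method: it assumes nodes can be sampled uniformly at random, that the network size is known, and that rank 1 denotes the highest degree; `estimate_degree_rank` is a hypothetical name.

```python
import random


def estimate_degree_rank(target_degree, sampled_degrees, network_size):
    """Estimate the degree rank of a node from a uniform sample of degrees.

    The fraction of sampled nodes with a strictly higher degree is scaled
    up to the whole network; rank 1 corresponds to the highest degree.
    """
    higher = sum(1 for d in sampled_degrees if d > target_degree)
    return 1 + network_size * higher / len(sampled_degrees)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical heavy-tailed degree sequence standing in for a network.
    degrees = [int(rng.paretovariate(2.0)) for _ in range(100_000)]
    sample = rng.sample(degrees, 1_000)
    target = 10
    true_rank = 1 + sum(1 for d in degrees if d > target)
    print(true_rank, estimate_degree_rank(target, sample, len(degrees)))
```

With few sampled nodes above the target degree, the estimate moves in large steps, since each such node shifts it by the network size divided by the sample size.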