1,029 research outputs found
Towards Unbiased BFS Sampling
Breadth First Search (BFS) is a widely used approach for sampling large
unknown Internet topologies. Its main advantage over random walks and other
exploration techniques is that a BFS sample is a plausible graph on its own,
and therefore we can study its topological characteristics. However, it has
been empirically observed that incomplete BFS is biased toward high-degree
nodes, which may strongly affect the measurements. In this paper, we first
analytically quantify the degree bias of BFS sampling. In particular, we
calculate the node degree distribution expected to be observed by BFS as a
function of the fraction f of covered nodes, in a random graph RG(pk) with an
arbitrary degree distribution pk. We also show that, for RG(pk), all commonly
used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS)
suffer from exactly the same bias. Next, based on our theoretical analysis, we
propose a practical BFS-bias correction procedure. It takes as input a
collected BFS sample together with its fraction f. Even though RG(pk) does not
capture many graph properties common in real-life graphs (such as
assortativity), our RG(pk)-based correction technique performs well on a broad
range of Internet topologies and on two large BFS samples of Facebook and Orkut
networks. Finally, we consider and evaluate a family of alternative correction
procedures, and demonstrate that, although they are unbiased for an arbitrary
topology, their large variance makes them far less effective than the
RG(pk)-based technique.Comment: BFS, RDS, graph traversal, sampling bias correctio
On sampling social networking services
This article aims at summarizing the existing methods for sampling social
networking services and proposing a faster confidence interval for related
sampling methods. It also includes comparisons of common network sampling
techniques
Crawling Facebook for Social Network Analysis Purposes
We describe our work in the collection and analysis of massive data describing the connections between participants to online social networks. Alternative approaches to social network data collection are defined and evaluated in practice, against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, i.e., among others, degree distribution, centrality measures, scaling laws and distribution of friendship.\u
Sampling Online Social Networks via Heterogeneous Statistics
Most sampling techniques for online social networks (OSNs) are based on a
particular sampling method on a single graph, which is referred to as a
statistics. However, various realizing methods on different graphs could
possibly be used in the same OSN, and they may lead to different sampling
efficiencies, i.e., asymptotic variances. To utilize multiple statistics for
accurate measurements, we formulate a mixture sampling problem, through which
we construct a mixture unbiased estimator which minimizes asymptotic variance.
Given fixed sampling budgets for different statistics, we derive the optimal
weights to combine the individual estimators; given fixed total budget, we show
that a greedy allocation towards the most efficient statistics is optimal. In
practice, the sampling efficiencies of statistics can be quite different for
various targets and are unknown before sampling. To solve this problem, we
design a two-stage framework which adaptively spends a partial budget to test
different statistics and allocates the remaining budget to the inferred best
statistics. We show that our two-stage framework is a generalization of 1)
randomly choosing a statistics and 2) evenly allocating the total budget among
all available statistics, and our adaptive algorithm achieves higher efficiency
than these benchmark strategies in theory and experiment
Extraction and Analysis of Facebook Friendship Relations
Online Social Networks (OSNs) are a unique Web and social phenomenon, affecting tastes and behaviors of their users and helping them to maintain/create friendships. It is interesting to analyze the growth and evolution of Online Social Networks both from the point of view of marketing and other of new services and from a scientific viewpoint, since their structure and evolution may share similarities with real-life social networks. In social sciences, several techniques for analyzing (online) social networks have been developed, to evaluate quantitative properties (e.g., defining metrics and measures of structural characteristics of the networks) or qualitative aspects (e.g., studying the attachment model for the network evolution, the binary trust relationships, and the link prediction problem).\ud
However, OSN analysis poses novel challenges both to Computer and Social scientists. We present our long-term research effort in analyzing Facebook, the largest and arguably most successful OSN today: it gathers more than 500 million users. Access to data about Facebook users and their friendship relations, is restricted; thus, we acquired the necessary information directly from the front-end of the Web site, in order to reconstruct a sub-graph representing anonymous interconnections among a significant subset of users. We describe our ad-hoc, privacy-compliant crawler for Facebook data extraction. To minimize bias, we adopt two different graph mining techniques: breadth-first search (BFS) and rejection sampling. To analyze the structural properties of samples consisting of millions of nodes, we developed a specific tool for analyzing quantitative and qualitative properties of social networks, adopting and improving existing Social Network Analysis (SNA) techniques and algorithms
- …