Towards Unbiased BFS Sampling
Breadth First Search (BFS) is a widely used approach for sampling large
unknown Internet topologies. Its main advantage over random walks and other
exploration techniques is that a BFS sample is a plausible graph on its own,
and therefore we can study its topological characteristics. However, it has
been empirically observed that incomplete BFS is biased toward high-degree
nodes, which may strongly affect the measurements. In this paper, we first
analytically quantify the degree bias of BFS sampling. In particular, we
calculate the node degree distribution expected to be observed by BFS as a
function of the fraction f of covered nodes, in a random graph RG(pk) with an
arbitrary degree distribution pk. We also show that, for RG(pk), all commonly
used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS)
suffer from exactly the same bias. Next, based on our theoretical analysis, we
propose a practical BFS-bias correction procedure. It takes as input a
collected BFS sample together with its fraction f. Even though RG(pk) does not
capture many graph properties common in real-life graphs (such as
assortativity), our RG(pk)-based correction technique performs well on a broad
range of Internet topologies and on two large BFS samples of Facebook and Orkut
networks. Finally, we consider and evaluate a family of alternative correction
procedures, and demonstrate that, although they are unbiased for an arbitrary
topology, their large variance makes them far less effective than the
RG(pk)-based technique. Comment: BFS, RDS, graph traversal, sampling bias correction
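The degree bias described above is easy to reproduce in simulation. The sketch below is an illustration under stated assumptions, not the paper's analysis or its correction procedure. It builds a G(n, m)-style random graph, whose degree distribution is approximately Poisson and so is roughly a special case of RG(pk). It then runs BFS until a fraction f of nodes is covered and compares the mean degree of the sampled nodes with the true mean degree. The helper names `random_graph`, `bfs_sample` and `mean_degree` are hypothetical. So is the rule of restarting BFS from a random unvisited node when the frontier empties.

```python
import random
from collections import deque


def random_graph(n, m, seed=0):
    """G(n, m)-style graph: m distinct undirected edges, no self-loops."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    added = 0
    while added < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def bfs_sample(adj, f, seed=0):
    """Nodes in BFS discovery order until a fraction f of all nodes is covered.

    Assumption of this sketch: if the frontier empties before the target is
    reached (disconnected graph), BFS restarts from a random unvisited node.
    """
    rng = random.Random(seed)
    target = round(f * len(adj))
    visited, order = set(), []
    while len(order) < target:
        start = rng.choice([v for v in adj if v not in visited])
        visited.add(start)
        order.append(start)
        queue = deque([start])
        while queue and len(order) < target:
            u = queue.popleft()
            for w in adj[u]:
                if len(order) >= target:
                    break
                if w not in visited:
                    visited.add(w)
                    order.append(w)
                    queue.append(w)
    return order


def mean_degree(adj, nodes):
    """Average degree, measured in the full graph, of the given nodes."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)


g = random_graph(5000, 10000, seed=1)   # mean degree 2m/n = 4
sample = bfs_sample(g, f=0.1, seed=2)   # 500 nodes
print("true mean degree:", mean_degree(g, g))
print("BFS-sample mean degree:", round(mean_degree(g, sample), 2))
```

Nodes other than the start node are discovered through an edge, so one would expect them to be picked roughly in proportion to their degree. The BFS-sample mean should therefore sit above the true mean when f is small.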
Bounding the Bias of Tree-Like Sampling in IP Topologies
It is widely believed that the Internet's AS-graph degree distribution obeys
a power-law form. Most of the evidence showing the power-law distribution is
based on BGP data. However, it was recently argued that since BGP collects data
in a tree-like fashion, it only produces a sample of the degree distribution,
and this sample may be biased. This argument was backed by simulation data and
mathematical analysis, which demonstrated that under certain conditions a tree
sampling procedure can produce an artificial power-law in the degree
distribution. Thus, although the observed degree distribution of the AS-graph
follows a power-law, this phenomenon may be an artifact of the sampling
process. In this work we provide some evidence to the contrary. We show, by
analysis and simulation, that when the underlying graph degree distribution
obeys a power-law with an exponent larger than 2, a tree-like sampling process
produces a negligible bias in the sampled degree distribution. Furthermore,
recent data collected from the DIMES project, which is not based on BGP
sampling, indicates that the underlying AS-graph indeed obeys a power-law
degree distribution with an exponent larger than 2. By combining this empirical
data with our analysis, we conclude that the bias in the degree distribution
calculated from BGP data is negligible. Comment: 12 pages, 1 figure
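The sketch below shows how a tree-like sampling process can under-report degrees. As a simplifying assumption, not the paper's exact model, it treats the process as the union of breadth-first shortest-path trees rooted at a few vantage points. It then records each node's degree in that union. The names `bfs_tree_edges` and `tree_sampled_degrees` are hypothetical. Running them on a power-law graph and comparing the output with the true degrees is one way to probe the bias discussed above.

```python
from collections import deque


def bfs_tree_edges(adj, root):
    """Undirected edges of a breadth-first (shortest-path) tree rooted at `root`."""
    parent = {root: None}
    queue = deque([root])
    edges = set()
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):  # sorted for deterministic tie-breaking
            if w not in parent:
                parent[w] = u
                edges.add(frozenset((u, w)))
                queue.append(w)
    return edges


def tree_sampled_degrees(adj, roots):
    """Degree of every node in the union of BFS trees from the given roots."""
    observed = set()
    for r in roots:
        observed |= bfs_tree_edges(adj, r)
    deg = {v: 0 for v in adj}
    for edge in observed:
        for v in edge:
            deg[v] += 1
    return deg


# Complete graph on 4 nodes: every true degree is 3.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(tree_sampled_degrees(k4, [0]))     # {0: 3, 1: 1, 2: 1, 3: 1}
print(tree_sampled_degrees(k4, [0, 1]))  # {0: 3, 1: 3, 2: 2, 3: 2}
```

Adding vantage points can only add observed edges, so the sampled degrees approach the true ones as more roots are used.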
Crawling Facebook for Social Network Analysis Purposes
We describe our work on the collection and analysis of massive data describing the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, including degree distribution, centrality measures, scaling laws, and the distribution of friendship.
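The sketch below illustrates the kind of post-processing described above. It turns raw crawled friendship pairs into an undirected graph and computes the empirical degree distribution. It assumes, hypothetically, that the crawler emits (user, friend) pairs that may repeat in both directions. Replacing raw IDs with consecutive integers is shown as one simple relabeling step; on its own it is not a complete anonymization scheme. The names `build_undirected` and `degree_distribution` are illustrative.

```python
from collections import Counter


def build_undirected(pairs):
    """Map raw user IDs to integers 0..n-1 and merge duplicate or reversed pairs.

    Self-loops are ignored. Returns (n, edges), each edge stored as (min, max).
    """
    ids = {}
    edges = set()
    for a, b in pairs:
        if a == b:
            continue
        u = ids.setdefault(a, len(ids))
        v = ids.setdefault(b, len(ids))
        edges.add((min(u, v), max(u, v)))
    return len(ids), edges


def degree_distribution(n, edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge set."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = Counter(deg[v] for v in range(n))
    return {k: hist[k] / n for k in sorted(hist)}


crawl = [("alice", "bob"), ("bob", "alice"), ("bob", "carol"), ("carol", "carol")]
n, edges = build_undirected(crawl)
print(n, sorted(edges))               # 3 [(0, 1), (1, 2)]
print(degree_distribution(n, edges))  # {1: 0.6666666666666666, 2: 0.3333333333333333}
```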
Sampling properties of directed networks
For many real-world networks only a small "sampled" version of the original
network may be investigated; those results are then used to draw conclusions
about the actual system. Variants of breadth-first search (BFS) sampling, which
are based on epidemic processes, are widely used. Although it is well
established that BFS sampling fails, in most cases, to capture the
IN-component(s) of directed networks, a description of the effects of BFS
sampling on other topological properties is all but absent from the
literature. To systematically study the effects of sampling biases on directed
networks, we compare BFS sampling to random sampling on complete large-scale
directed networks. We present new results and a thorough analysis of the
topological properties of seven different complete directed networks (prior to
sampling), including three versions of Wikipedia, three different sources of
sampled World Wide Web data, and an Internet-based social network. We detail
the differences that sampling method and coverage can make to the structural
properties of sampled versions of these seven networks. Most notably, we find
that sampling method and coverage affect both the bow-tie structure, as well as
the number and structure of strongly connected components in sampled networks.
In addition, at low sampling coverage (i.e., less than 40%), the values of
average degree, variance of out-degree, degree auto-correlation, and link
reciprocity are overestimated by 30% or more in BFS-sampled networks, and only
attain values within 10% of the corresponding values in the complete networks
when sampling coverage is in excess of 65%. These results may cause us to
rethink what we know about the structure, function, and evolution of real-world
directed networks. Comment: 21 pages, 11 figures
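The comparison above, BFS sampling versus random node sampling on a directed network, can be prototyped as follows. This is a sketch under assumptions, not the study's pipeline. BFS follows out-links only, which is why it tends to miss the IN-component. The sampled network is taken to be the subgraph induced by the sampled nodes. Link reciprocity is measured as the fraction of directed edges whose reverse edge is also present. All function names are hypothetical.

```python
import random
from collections import deque


def out_adjacency(edges):
    """Sorted out-neighbor lists, for a deterministic traversal order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    return {u: sorted(vs) for u, vs in adj.items()}


def bfs_out_sample(adj, start, k):
    """Up to k nodes reached from `start` by following out-links breadth-first."""
    order, seen = [start], {start}
    queue = deque([start])
    while queue and len(order) < k:
        u = queue.popleft()
        for w in adj[u]:
            if len(order) >= k:
                break
            if w not in seen:
                seen.add(w)
                order.append(w)
                queue.append(w)
    return order


def random_node_sample(adj, k, seed=0):
    """k nodes chosen uniformly at random without replacement."""
    return random.Random(seed).sample(sorted(adj), k)


def induced(edges, nodes):
    """Directed edges with both endpoints in `nodes`."""
    nodes = set(nodes)
    return {(u, v) for u, v in edges if u in nodes and v in nodes}


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present."""
    edges = set(edges)
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)


edges = {(0, 1), (1, 0), (1, 2), (2, 3)}
adj = out_adjacency(edges)
print(reciprocity(edges))                                      # 0.5
print(reciprocity(induced(edges, bfs_out_sample(adj, 0, 3))))  # 0.6666666666666666
```

In this toy case the BFS sample overestimates reciprocity. Starting from node 3, which has no out-links, BFS cannot leave its start node at all.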
Spin-polarized Quantum Transport in Mesoscopic Conductors: Computational Concepts and Physical Phenomena
Mesoscopic conductors are electronic systems of sizes in between nano- and
micrometers, and often of reduced dimensionality. In the phase-coherent regime
at low temperatures, the conductance of these devices is governed by quantum
interference effects, such as the Aharonov-Bohm effect and conductance
fluctuations as prominent examples. While first measurements of quantum charge
transport date back to the 1980s, spin phenomena in mesoscopic transport have
moved only recently into the focus of attention, as one branch of the field of
spintronics. The interplay of quantum coherence with confinement-,
disorder- or interaction-effects gives rise to a variety of unexpected spin
phenomena in mesoscopic conductors and, moreover, allows one to control and engineer
the spin of the charge carriers: spin interference is often the basis for
spin-valves, -filters, -switches or -pumps. Their underlying mechanisms may
gain relevance on the way to possible future semiconductor-based spin devices.
A quantitative theoretical understanding of spin-dependent mesoscopic
transport calls for developing efficient and flexible numerical algorithms,
including matrix-reordering techniques within Green function approaches, which
we will explain, review, and employ. Comment: To appear in the Encyclopedia of Complexity and Systems Science
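Green function approaches such as the recursive Green function method typically need the Hamiltonian in block-tridiagonal form. One simple way to reorder lattice sites into such blocks is to group them into breadth-first levels, starting from the sites attached to one lead. In a connected graph, an edge can only join sites in the same BFS level or in adjacent levels, so ordering the matrix by level yields a block-tridiagonal pattern. The sketch below illustrates this on a small square lattice. It is an illustration under assumptions, not the specific reordering algorithm of the chapter. The names `square_lattice`, `bfs_levels` and `is_block_tridiagonal` are hypothetical.

```python
from collections import deque


def square_lattice(length, width):
    """Nearest-neighbor adjacency of a length x width grid of sites (x, y)."""
    sites = [(x, y) for x in range(length) for y in range(width)]
    site_set = set(sites)
    adj = {}
    for x, y in sites:
        adj[(x, y)] = [(x + dx, y + dy)
                       for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if (x + dx, y + dy) in site_set]
    return adj


def bfs_levels(adj, sources):
    """Assign each reachable site its BFS distance from the set `sources`.

    Returns (layers, level), where layers[i] lists the sites at distance i.
    """
    level = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    layers = [[] for _ in range(max(level.values()) + 1)]
    for site, lvl in level.items():
        layers[lvl].append(site)
    return layers, level


def is_block_tridiagonal(adj, level):
    """True if every coupling joins sites in the same or neighboring layers.

    Assumes every site was reached, i.e. the lattice is connected.
    """
    return all(abs(level[u] - level[w]) <= 1 for u in adj for w in adj[u])


lattice = square_lattice(4, 3)
lead_sites = [(0, y) for y in range(3)]   # hypothetical left-lead interface
layers, level = bfs_levels(lattice, lead_sites)
print([sorted(layer) for layer in layers])
print(is_block_tridiagonal(lattice, level))  # True
```

A full treatment would also need the opposite lead to end up in the last block and the blocks to be kept small. This sketch enforces neither.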
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms.
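To make graph induction concrete, here is a minimal static sketch. It assumes graph induction means the following: sample edges uniformly at random until enough nodes are collected, then add every edge of the input graph whose two endpoints were both sampled. This is one plausible instance of the family described above, not necessarily the authors' exact algorithm, and `induced_edge_sample` is a hypothetical name. The induction step takes a second pass over the edge list, so a true streaming variant would have to perform the induction on the fly instead.

```python
import random


def induced_edge_sample(edges, target_nodes, seed=0):
    """Edge sampling followed by graph induction (static, two passes).

    Pass 1: visit edges in random order, collecting their endpoints until at
    least `target_nodes` nodes are held; because each edge can add two nodes,
    the sample may overshoot the target by one.
    Pass 2: keep every input edge whose endpoints were both collected.
    """
    rng = random.Random(seed)
    pool = list(edges)
    rng.shuffle(pool)
    nodes = set()
    for u, v in pool:
        if len(nodes) >= target_nodes:
            break
        nodes.update((u, v))
    sampled_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, sampled_edges


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
nodes, sample = induced_edge_sample(edges, target_nodes=3, seed=7)
print(sorted(nodes), sample)
```

The induction pass recovers edges among sampled nodes that the random edge pass skipped. This is what lets such samples keep more of the local structure, for example clustering, than plain edge sampling.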
On sampling social networking services
This article aims to summarize the existing methods for sampling social
networking services and to propose a faster confidence interval for related
sampling methods. It also includes comparisons of common network sampling
techniques.