Sampling from social networks with attributes
Sampling from large networks represents a fundamental challenge for social
network research. In this paper, we explore the sensitivity of different
sampling techniques (node sampling, edge sampling, random walk sampling, and
snowball sampling) on social networks with attributes. We consider the special
case of networks (i) where we have one attribute with two values (e.g., male
and female in the case of gender), (ii) where the size of the two groups is
unequal (e.g., a male majority and a female minority), and (iii) where nodes
with the same or different attribute value attract or repel each other (i.e.,
homophilic or heterophilic behavior). We evaluate the different sampling
techniques with respect to conserving the position of nodes and the visibility
of groups in such networks. Experiments are conducted both on synthetic and
empirical social networks. Our results provide evidence that different network
sampling techniques are highly sensitive with regard to capturing the expected
centrality of nodes, and that their accuracy depends on relative group size
differences and on the level of homophily that can be observed in the network.
We conclude that uninformed sampling from social networks with attributes can
thus significantly impair the ability of researchers to draw valid conclusions
about the centrality of nodes and the visibility or invisibility of groups in
social networks. Comment: Published at WWW'1
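The four sampling techniques named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the adjacency-dict graph representation, the function names, the `group` mapping, and the `minority_share` visibility measure are all assumptions made for illustration.

```python
import random
from collections import deque


def node_sample(adj, k, rng):
    """Node sampling: pick k nodes uniformly at random."""
    return set(rng.sample(sorted(adj), k))


def edge_sample(adj, k, rng):
    """Edge sampling: pick k undirected edges uniformly, return their endpoints."""
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    return {n for edge in rng.sample(edges, k) for n in edge}


def random_walk_sample(adj, k, rng, start, max_steps=100_000):
    """Random walk sampling: walk until k distinct nodes are visited.

    max_steps (an assumed safeguard) stops the walk if the start node's
    component holds fewer than k nodes.
    """
    current, visited = start, {start}
    for _ in range(max_steps):
        if len(visited) >= k:
            break
        current = rng.choice(sorted(adj[current]))
        visited.add(current)
    return visited


def snowball_sample(adj, k, start):
    """Snowball sampling: breadth-first expansion from a seed, capped at k nodes."""
    visited, queue = {start}, deque([start])
    while queue and len(visited) < k:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in visited and len(visited) < k:
                visited.add(v)
                queue.append(v)
    return visited


def minority_share(sample, group):
    """Fraction of sampled nodes labelled 'minority' (hypothetical visibility measure)."""
    return sum(group[n] == "minority" for n in sample) / len(sample)
```

Comparing `minority_share` on a sample with the minority's share in the full graph gives a rough sense of how a sampling technique changes a group's visibility.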
Applications of Sampling and Estimation on Networks
Networks or graphs are fundamental abstractions that allow us to study many important real systems, such as the Web, social networks and scientific collaboration. It is impossible to completely understand these systems and answer fundamental questions related to them without considering the way their components are connected, i.e., their topology. However, topology is not the only relevant aspect of networks. Nodes often have information associated with them, which can be regarded as node attributes or labels. An important problem is then how to characterize a network w.r.t. topology and node label distributions. Another important problem is how to design efficient algorithms to accomplish tasks on networks. Since nodes often have attributes, an interesting avenue for investigation consists in learning and exploiting existing correlations between node and neighbor attributes for accomplishing a task more efficiently. One of the challenges faced when studying networks in the wild is the fact that in general their topology and the information associated with their nodes cannot be directly obtained. Thus, one must resort to collecting the data, but when obtaining the entire network is infeasible, sampling and estimation are the best option. This dissertation investigates the use of sampling and estimation to characterize networks and to accomplish a particular task. More precisely, we study (i) the problem of characterizing directed and undirected networks through random walk-based sampling, (ii) the problem of estimating the set-size distribution from an information-theoretic standpoint, which has application to characterizing the in-degree distribution in large graphs, and (iii) the problem of searching networks to find nodes that exhibit a specific trait while subject to a sampling budget by learning a model from node attributes and structural properties, which has application to recruiting in social networks.
Sampling networks by nodal attributes
In a social network individuals or nodes connect to other nodes by choosing
one of the channels of communication at a time to re-establish the existing
social links. Since available data sets are usually restricted to a limited
number of channels or layers, these autonomous decision making processes by the
nodes constitute the sampling of a multiplex network leading to just one
(though very important) example of sampling bias caused by the behavior of the
nodes. We develop a general setting to gain insight into the class of
network sampling models where the probability of sampling a link in the
original network depends on the attributes of its adjacent nodes. Assuming
that the nodal attributes are independently drawn from an arbitrary
distribution and that the sampling probability for a link, as a function of
the attributes of its two adjacent nodes, is also arbitrary, we derive
exact analytic expressions of the sampled network for such network
characteristics as the degree distribution, degree correlation, and clustering
spectrum. The properties of the sampled network turn out to be sums of
quantities for the original network topology weighted by the factors stemming
from the sampling. Based on our analysis, we find that the sampled network may
have sampling-induced network properties that are absent in the original
network, which implies the potential risk of a naive generalization of the
results of the sample to the entire original network. We also consider the
case in which neighboring nodes have correlated attributes, to show how to
generalize our formalism to such sampling bias, and we find good agreement
between the analytic results and the numerical simulations. Comment: 11 pages, 5 figures
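The sampling model in this abstract, where a link is kept with a probability that depends on the attributes of its two end nodes, can be sketched numerically as follows. This is an illustrative simulation only, not the paper's analytic derivation: the names `sample_links`, `degrees`, and `keep_prob` are hypothetical, and any specific form of the keep probability (for example the product of the two attributes) is an assumption.

```python
import random


def sample_links(edges, attr, keep_prob, rng):
    """Keep each link (u, v) independently with probability keep_prob(attr[u], attr[v])."""
    return [(u, v) for u, v in edges if rng.random() < keep_prob(attr[u], attr[v])]


def degrees(edges, nodes):
    """Degree of every node in `nodes` for an undirected edge list."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy original network: a 4-cycle with assumed attributes in [0, 1].
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    attr = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.2}
    # Assumed sampling probability: product of the two end-node attributes.
    kept = sample_links(edges, attr, lambda a, b: a * b, rng)
    print("original degrees:", degrees(edges, attr))
    print("sampled degrees: ", degrees(kept, attr))
```

In this toy setup every node has degree 2 in the original network, so any differences between nodes' sampled degrees come from the attribute-dependent sampling alone.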
Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. The inference can be done by
efficient Gibbs sampling which leverages sparsity of both networks and node
attributes. Extensive experiments show that our models achieve the
state-of-the-art link prediction results, especially with highly incomplete
relational data. Comment: Appearing in ICML 201
Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models
Exponential-family random graph models (ERGMs) provide a principled way to
model and simulate features common in human social networks, such as
propensities for homophily and friend-of-a-friend triad closure. We show that,
without adjustment, ERGMs preserve density as network size increases. Density
invariance is often not appropriate for social networks. We suggest a simple
modification based on an offset which instead preserves the mean degree and
accommodates changes in network composition asymptotically. We demonstrate that
this approach allows ERGMs to be applied to the important situation of
egocentrically sampled data. We analyze data from the National Health and
Social Life Survey (NHSLS). Comment: 37 pages, 2 figures, 5 tables; notation revised and clarified, some
sections (particularly 4.3 and 5) made more rigorous, some derivations moved
into the appendix, typos fixed, some wording change
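The contrast between density invariance and mean-degree preservation can be seen in the simplest edges-only (Bernoulli) case. The abstract does not state the form of the offset; the sketch below assumes an offset of -log(n) on the edge term, and the function names are hypothetical.

```python
import math


def edge_prob(theta, n, use_offset):
    """Tie probability in an edges-only model with edge coefficient theta.

    With use_offset=True, an assumed offset of -log(n) is added to the
    linear predictor before applying the logistic function.
    """
    eta = theta - math.log(n) if use_offset else theta
    return 1.0 / (1.0 + math.exp(-eta))


def expected_mean_degree(theta, n, use_offset):
    """Expected degree of a node: (n - 1) possible ties times the tie probability."""
    return (n - 1) * edge_prob(theta, n, use_offset)


if __name__ == "__main__":
    theta = 1.0
    for n in (100, 1_000, 100_000):
        print(n,
              round(expected_mean_degree(theta, n, use_offset=False), 2),
              round(expected_mean_degree(theta, n, use_offset=True), 4))
```

Without the offset the tie probability is the same for every n, so the expected mean degree grows linearly with network size. With the assumed offset the expected mean degree approaches exp(theta) as n grows.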