177,955 research outputs found

Sampling from social networks with attributes

Author: Karimi Fariba
Pfeffer Jürgen
Singer Philipp
Strohmaier Markus
Wagner Claudia
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Sampling from large networks represents a fundamental challenge for social network research. In this paper, we explore the sensitivity of different sampling techniques (node sampling, edge sampling, random walk sampling, and snowball sampling) on social networks with attributes. We consider the special case of networks (i) where we have one attribute with two values (e.g., male and female in the case of gender), (ii) where the size of the two groups is unequal (e.g., a male majority and a female minority), and (iii) where nodes with the same or different attribute value attract or repel each other (i.e., homophilic or heterophilic behavior). We evaluate the different sampling techniques with respect to conserving the position of nodes and the visibility of groups in such networks. Experiments are conducted both on synthetic and empirical social networks. Our results provide evidence that different network sampling techniques are highly sensitive with regard to capturing the expected centrality of nodes, and that their accuracy depends on relative group size differences and on the level of homophily that can be observed in the network. We conclude that uninformed sampling from social networks with attributes thus can significantly impair the ability of researchers to draw valid conclusions about the centrality of nodes and the visibility or invisibility of groups in social networks.

Comment: Published at WWW'1
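To make the setting in this abstract concrete, the following is a minimal sketch of two of the named techniques (node sampling and edge sampling) on a toy two-group network. The generator `make_network`, its parameters `h` and `p`, and the helper `minority_share` are illustrative assumptions, not the synthetic model or the evaluation measures used in the paper.

```python
import random

def make_network(n=200, minority_frac=0.2, h=0.8, p=0.05, seed=0):
    """Toy two-group random graph (an assumption, not the paper's generator).

    Same-group pairs link with probability p*h, cross-group pairs with
    p*(1-h), so h > 0.5 is homophilic and h < 0.5 is heterophilic.
    """
    rng = random.Random(seed)
    n_min = int(n * minority_frac)
    group = {v: ("minority" if v < n_min else "majority") for v in range(n)}
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p * h if group[u] == group[v] else p * (1 - h)
            if rng.random() < prob:
                edges.append((u, v))
    return group, edges

def node_sample(group, edges, k, rng):
    """Pick k nodes uniformly; keep only edges between picked nodes."""
    nodes = set(rng.sample(sorted(group), k))
    return nodes, [(u, v) for u, v in edges if u in nodes and v in nodes]

def edge_sample(edges, k, rng):
    """Pick k edges uniformly; the sampled nodes are their endpoints."""
    kept = rng.sample(edges, k)
    nodes = {x for e in kept for x in e}
    return nodes, kept

def minority_share(nodes, group):
    """Fraction of the given nodes that belong to the minority group."""
    return sum(group[v] == "minority" for v in nodes) / len(nodes)

rng = random.Random(1)
group, edges = make_network()
ns_nodes, ns_edges = node_sample(group, edges, 50, rng)
es_nodes, es_edges = edge_sample(edges, 50, rng)
print("minority share, node sample:", minority_share(ns_nodes, group))
print("minority share, edge sample:", minority_share(es_nodes, group))
```

Edge sampling reaches nodes in proportion to their degree, so in a sketch like this the minority share of the sample can drift away from the population share as `h` and `minority_frac` change; the direction and size of that drift are what the paper studies in detail.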

arXiv.org e-Print Archive

Crossref

MAnnheim DOCument Server

SSOAR - Social Science Open Access Repository

Recommended from our members

Applications of Sampling and Estimation on Networks

Author: Murai Ferreira Fabricio
Publication venue: ScholarWorks@UMass Amherst
Publication date: 14/11/2016

Networks or graphs are fundamental abstractions that allow us to study many important real systems, such as the Web, social networks and scientific collaboration. It is impossible to completely understand these systems and answer fundamental questions related to them without considering the way their components are connected, i.e., their topology. However, topology is not the only relevant aspect of networks. Nodes often have information associated with them, which can be regarded as node attributes or labels. An important problem is then how to characterize a network w.r.t. topology and node label distributions. Another important problem is how to design efficient algorithms to accomplish tasks on networks. Since nodes often have attributes, an interesting avenue for investigation consists in learning and exploiting existing correlations between node and neighbor attributes for accomplishing a task more efficiently. One of the challenges faced when studying networks in the wild is the fact that in general their topology and information associated with its nodes cannot be directly obtained. Thus, one must resort to collecting the data, but when obtaining the entire network is infeasible, sampling and estimation are the best option. This dissertation investigates the use of sampling and estimation to characterize networks and to accomplish a particular task. More precisely, we study (i) the problem of characterizing directed and undirected networks through random walk-based sampling, (ii) the problem of estimating the set-size distribution from an information-theoretic standpoint, which has application to characterizing the in-degree distribution in large graphs, and (iii) the problem of searching networks to find nodes that exhibit a specific trait while subject to a sampling budget by learning a model from node attributes and structural properties, which has application to recruiting in social networks

ScholarWorks@UMass Amherst

Sampling networks by nodal attributes

Author: Jo Hang-Hyun
Kaski Kimmo
Kertész János
Murase Yohsuke
Török János
Publication venue: 'American Physical Society (APS)'
Publication date: 15/05/2019

In a social network individuals or nodes connect to other nodes by choosing one of the channels of communication at a time to re-establish the existing social links. Since available data sets are usually restricted to a limited number of channels or layers, these autonomous decision making processes by the nodes constitute the sampling of a multiplex network leading to just one (though very important) example of sampling bias caused by the behavior of the nodes. We develop a general setting to get insight and understand the class of network sampling models, where the probability of sampling a link in the original network depends on the attributes $h$ of its adjacent nodes. Assuming that the nodal attributes are independently drawn from an arbitrary distribution $\rho(h)$ and that the sampling probability $r(h_i, h_j)$ for a link $ij$ of nodal attributes $h_i$ and $h_j$ is also arbitrary, we derive exact analytic expressions of the sampled network for such network characteristics as the degree distribution, degree correlation, and clustering spectrum. The properties of the sampled network turn out to be sums of quantities for the original network topology weighted by the factors stemming from the sampling. Based on our analysis, we find that the sampled network may have sampling-induced network properties that are absent in the original network, which implies the potential risk of a naive generalization of the results of the sample to the entire original network. We also consider the case, when neighboring nodes have correlated attributes to show how to generalize our formalism for such sampling bias and we get good agreement between the analytic results and the numerical simulations.

Comment: 11 pages, 5 figure
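The sampling model described in this abstract can be sketched in a few lines: each link of the original network is kept independently with a probability that depends on the attributes of its two end nodes. The ring-shaped original network, the uniform attribute distribution, and the product form of `r` below are assumptions chosen only for illustration; the paper treats both the attribute distribution and the sampling probability as arbitrary.

```python
import random

def sample_links(edges, h, r, rng):
    """Keep each link (i, j) independently with probability r(h[i], h[j])."""
    return [(i, j) for i, j in edges if rng.random() < r(h[i], h[j])]

def degrees(n, edges):
    """Degree of every node 0..n-1 in an undirected edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

rng = random.Random(0)
n = 100

# Assumed original network: a ring in which every node is linked to the
# nodes at distance 1 and 2, so every node has degree exactly 4.
edges = [(i, (i + d) % n) for i in range(n) for d in (1, 2)]

# Assumed attribute distribution: h drawn independently, uniform on [0, 1].
h = [rng.random() for _ in range(n)]

# Assumed sampling probability: product of the two end-node attributes.
def r(hi, hj):
    return hi * hj

sampled = sample_links(edges, h, r, rng)
original_deg = degrees(n, edges)
sampled_deg = degrees(n, sampled)
print("original degrees:", set(original_deg))
print("sampled degrees: ", sorted(set(sampled_deg)))
```

In this toy case the original network is perfectly regular, yet the sampled network shows a spread of degrees that tracks the node attributes. That is one simple instance of a sampling-induced property that is absent in the original network.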

arXiv.org e-Print Archive

Aaltodoc Publication Archive

Leveraging Node Attributes for Incomplete Relational Data

Author: Buntine Wray
Du Lan
Zhao He
Publication date: 01/01/2017

Relational data are usually highly incomplete in practice, which inspires us to leverage side information to improve the performance of community detection and link prediction. This paper presents a Bayesian probabilistic approach that incorporates various kinds of node attributes encoded in binary form in relational models with Poisson likelihood. Our method works flexibly with both directed and undirected relational networks. The inference can be done by efficient Gibbs sampling which leverages sparsity of both networks and node attributes. Extensive experiments show that our models achieve the state-of-the-art link prediction results, especially with highly incomplete relational data.

Comment: Appearing in ICML 201

arXiv.org e-Print Archive

Monash University Research Portal

Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models

Author: Pavel N. Krivitsky
Mark S. Handcock
Martina Morris
Publication venue: 'Elsevier BV'
Publication date: 27/12/2010

Exponential-family random graph models (ERGMs) provide a principled way to model and simulate features common in human social networks, such as propensities for homophily and friend-of-a-friend triad closure. We show that, without adjustment, ERGMs preserve density as network size increases. Density invariance is often not appropriate for social networks. We suggest a simple modification based on an offset which instead preserves the mean degree and accommodates changes in network composition asymptotically. We demonstrate that this approach allows ERGMs to be applied to the important situation of egocentrically sampled data. We analyze data from the National Health and Social Life Survey (NHSLS).

Comment: 37 pages, 2 figures, 5 tables; notation revised and clarified, some sections (particularly 4.3 and 5) made more rigorous, some derivations moved into the appendix, typos fixed, some wording change
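The contrast between density invariance and mean-degree invariance can be checked by hand for the simplest ERGM, one with only an edge-count term, where every pair of nodes is linked independently. The sketch below assumes the offset takes the form of subtracting log(n) from the edge coefficient; the abstract itself does not spell out the form, and the function name `mean_degree` is hypothetical.

```python
import math

def mean_degree(theta, n, offset=True):
    """Expected degree in an edges-only ERGM on n nodes.

    Each pair is linked independently with probability logistic(eta),
    where eta = theta - log(n) with the assumed offset and eta = theta
    without it.
    """
    eta = theta - math.log(n) if offset else theta
    p = 1.0 / (1.0 + math.exp(-eta))
    return (n - 1) * p

for n in (100, 1000, 10000):
    print(n,
          round(mean_degree(0.0, n, offset=False), 2),
          round(mean_degree(0.0, n, offset=True), 4))
```

Without the offset the tie probability stays fixed, so the expected degree grows in proportion to the number of nodes. With the offset the expected degree approaches exp(theta) as n grows, which is the sense in which mean degree rather than density is preserved in this simplified case.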

arXiv.org e-Print Archive

Crossref

PubMed Central

eScholarship - University of California

Research Online