Missing data in multiplex networks: a preliminary study
A basic problem in the analysis of social networks is missing data. When a
network model does not accurately capture all the actors or relationships in
the social system under study, measures computed on the network and ultimately
the final outcomes of the analysis can be severely distorted. For this reason,
researchers in social network analysis have characterised the impact of
different types of missing data on existing network measures. Recently, a lot of
attention has been devoted to the study of multiple-network systems, e.g.,
multiplex networks. In these systems missing data has an even more significant
impact on the outcomes of the analyses. However, to the best of our knowledge,
no study has focused on this problem yet. This work is a first step in the
direction of understanding the impact of missing data in multiple networks. We
first discuss the main reasons for missingness in these systems, then we
explore the relation between various types of missing information and their
effect on network properties. We provide initial experimental evidence based on
both real and synthetic data. Comment: 7 pages
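The kind of experiment this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' setup: the two-layer random network, the uniform edge-removal model, and the two measures (mean degree and Jaccard edge overlap between layers) are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import random

def random_layer(n, p, rng):
    """One undirected random layer, stored as a set of (u, v) pairs with u < v."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}

def drop_edges(edges, fraction, rng):
    """Simulate missing edges by keeping a uniform random subset (assumed model)."""
    keep = len(edges) - round(fraction * len(edges))
    return set(rng.sample(sorted(edges), keep))

def mean_degree(edges, n):
    """Average degree of an undirected layer with n nodes."""
    return 2 * len(edges) / n

def edge_overlap(layer_a, layer_b):
    """Jaccard overlap between the edge sets of two layers."""
    union = layer_a | layer_b
    return len(layer_a & layer_b) / len(union) if union else 0.0

rng = random.Random(0)
n = 60
layer_a = random_layer(n, 0.10, rng)
# Layer B shares most of its edges with layer A, so the two layers are correlated.
layer_b = drop_edges(layer_a, 0.3, rng) | random_layer(n, 0.03, rng)

# Remove an increasing fraction of layer B's edges and recompute both measures.
for fraction in (0.0, 0.2, 0.5):
    observed_b = drop_edges(layer_b, fraction, rng)
    print(fraction,
          round(mean_degree(observed_b, n), 2),
          round(edge_overlap(layer_a, observed_b), 3))
```

Comparing each row against the row for fraction 0.0 shows how far a single-layer measure and a cross-layer measure drift as edges go missing in one layer only.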
Applications of Sampling and Estimation on Networks
Networks or graphs are fundamental abstractions that allow us to study many important real systems, such as the Web, social networks and scientific collaboration. It is impossible to completely understand these systems and answer fundamental questions related to them without considering the way their components are connected, i.e., their topology. However, topology is not the only relevant aspect of networks. Nodes often have information associated with them, which can be regarded as node attributes or labels. An important problem is then how to characterize a network with respect to its topology and node label distributions. Another important problem is how to design efficient algorithms to accomplish tasks on networks. Since nodes often have attributes, an interesting avenue for investigation consists in learning and exploiting existing correlations between node and neighbor attributes to accomplish a task more efficiently. One of the challenges faced when studying networks in the wild is that, in general, their topology and the information associated with their nodes cannot be directly obtained. Thus, one must resort to collecting the data, and when obtaining the entire network is infeasible, sampling and estimation are the best option. This dissertation investigates the use of sampling and estimation to characterize networks and to accomplish a particular task. More precisely, we study (i) the problem of characterizing directed and undirected networks through random walk-based sampling, (ii) the problem of estimating the set-size distribution from an information-theoretic standpoint, which has application to characterizing the in-degree distribution in large graphs, and (iii) the problem of searching networks to find nodes that exhibit a specific trait while subject to a sampling budget by learning a model from node attributes and structural properties, which has application to recruiting in social networks.
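Point (i) can be illustrated with a small sketch for the undirected case. It assumes a simple random walk, which visits a node in proportion to its degree in the long run, and corrects that bias by weighting each visit by 1/degree. This is a standard re-weighting technique and is not claimed to be the dissertation's exact estimator; the toy graph and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def ring_with_chords(n, chords, rng):
    """Connected toy graph: a ring of n >= 3 nodes plus random extra edges."""
    adj = {u: {(u - 1) % n, (u + 1) % n} for u in range(n)}
    for _ in range(chords):
        u, v = rng.sample(range(n), 2)
        adj[u].add(v)
        adj[v].add(u)
    return {u: sorted(neighbors) for u, neighbors in adj.items()}

def random_walk(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor at every step."""
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited

def estimate_degree_distribution(adj, visited):
    """Weight each visit by 1/degree to undo the walk's bias toward hubs."""
    weights = defaultdict(float)
    for node in visited:
        k = len(adj[node])
        weights[k] += 1.0 / k
    total = sum(weights.values())
    return {k: w / total for k, w in sorted(weights.items())}

rng = random.Random(0)
graph = ring_with_chords(200, 150, rng)
walk = random_walk(graph, start=0, steps=20000, rng=rng)
estimate = estimate_degree_distribution(graph, walk)

# True distribution, available here only because the toy graph is fully known.
counts = Counter(len(neighbors) for neighbors in graph.values())
truth = {k: c / len(graph) for k, c in sorted(counts.items())}
print(estimate)
print(truth)
```

The walk only ever inspects the neighbors of the node it currently occupies, which matches the setting where the full topology cannot be obtained directly.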
Sampling-based estimation of in-degree distribution in directed networks
The focus of this thesis is on the estimation of the in-degree distribution in directed networks by sampling network nodes or edges. A number of sampling schemes are considered, including random sampling with and without replacement, and several approaches based on random walks with possible jumps. When sampling nodes, it is assumed that only the out-edges of a sampled node are visible, that is, the in-degree of that node is not observed. The suggested estimation of the in-degree distribution is based on two approaches. The inversion approach exploits the relation between the original and sample in-degree distributions, and can estimate the bulk of the in-degree distribution, but not its tail. The tail of the in-degree distribution is estimated through an asymptotic approach, which itself has two versions: one assuming a power-law tail and the other for a tail of general form. The two estimation approaches are examined on synthetic and real networks, with good performance results, especially striking for the asymptotic approach. Bachelor of Science
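The idea behind the inversion approach can be sketched under one simplifying assumption that is not taken from the thesis: every in-edge of a node is observed independently with the same probability p, so the sample in-degree of a node with true in-degree k is Binomial(k, p). The sample distribution is then a linear (binomial thinning) transform of the true one, and the transform with parameter 1/p undoes it. Function names are hypothetical, and exact fractions are used so the round trip can be checked without rounding error.

```python
from fractions import Fraction
from math import comb

def thin(true_dist, p):
    """Forward map: sample in-degree distribution implied by the true one.

    true_dist[k] is the fraction of nodes with true in-degree k.
    """
    sample = [Fraction(0)] * len(true_dist)
    for k, f_k in enumerate(true_dist):
        for j in range(k + 1):
            sample[j] += comb(k, j) * p**j * (1 - p)**(k - j) * f_k
    return sample

def invert(sample_dist, p):
    """Inverse map: the same binomial transform with parameter 1/p."""
    estimate = [Fraction(0)] * len(sample_dist)
    for j, q_j in enumerate(sample_dist):
        for k in range(j + 1):
            estimate[k] += comb(j, k) * (1 / p)**k * (1 - 1 / p)**(j - k) * q_j
    return estimate

p = Fraction(1, 2)
true_dist = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
sample_dist = thin(true_dist, p)
recovered = invert(sample_dist, p)
print(sample_dist)
print(recovered == true_dist)
```

Because 1 - 1/p is negative for p < 1, the inverse has alternating signs and weights that grow with the degree, so noise in an empirical sample distribution is amplified most at large degrees. That is in line with the abstract's remark that inversion recovers the bulk of the distribution but not its tail.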