6 research outputs found

    VA-index: Quantifying assortativity patterns in networks with multidimensional nodal attributes

    Network connections have been shown to be correlated with structural or external attributes of the network vertices in a variety of cases. Given the prevalence of this phenomenon, network scientists have developed metrics to quantify its extent. In particular, the assortativity coefficient is used to capture the level of correlation between a single-dimensional attribute (categorical or scalar) of the network nodes and the observed connections, i.e., the edges. Nevertheless, in many cases a multi-dimensional, i.e., vector, feature of the nodes is of interest. Such attributes can describe complex behavioral patterns (e.g., mobility) of the network entities. To date, little attention has been given to this setting and there has not been a general and formal treatment of this problem. In this study we develop a metric, the vector assortativity index (VA-index for short), based on network randomization and (empirical) statistical hypothesis testing, that is able to quantify the assortativity patterns of a network with respect to a vector attribute. Our extensive experimental results on synthetic network data show that the VA-index outperforms a baseline extension of the assortativity coefficient, which has been used in the literature to cope with similar cases. Furthermore, the VA-index can be calibrated (in terms of parameters) fairly easily, while its benefits increase with the (co-)variance of the vector elements, where the baseline systematically overestimates or underestimates the true mixing patterns of the network.
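
For orientation, the single-attribute assortativity coefficient that the abstract contrasts with the VA-index can be sketched as the Pearson correlation between the attribute values found at the two ends of each edge. This is only the standard scalar measure, not the VA-index itself, whose randomization and hypothesis-testing procedure is not specified in the abstract. The function name `scalar_assortativity` and the edge-list/dictionary input format are assumptions made for illustration.

```python
def scalar_assortativity(edges, attr):
    """Pearson correlation of a scalar attribute across edge endpoints.

    edges: iterable of (u, v) pairs of an undirected graph (hypothetical format)
    attr:  dict mapping each node to a scalar attribute value
    """
    # Count every undirected edge in both directions so that the two
    # endpoint sequences have the same mean and variance.
    xs, ys = [], []
    for u, v in edges:
        xs += [attr[u], attr[v]]
        ys += [attr[v], attr[u]]
    n = len(xs)
    if n == 0:
        raise ValueError("graph has no edges")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("attribute is constant over edge endpoints")
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


# Nodes linked only to nodes with the same value: fully assortative.
assortative = scalar_assortativity([(0, 1), (2, 3)], {0: 1, 1: 1, 2: 5, 3: 5})
# Nodes linked only to nodes with the opposite value: fully disassortative.
disassortative = scalar_assortativity([(0, 1), (2, 3)], {0: 1, 1: 5, 2: 1, 3: 5})
```

A value near 1 indicates that edges tend to join nodes with similar attribute values, and a value near -1 that they tend to join dissimilar ones.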

    Prediction, evolution and privacy in social and affiliation networks

    In the last few years, there has been a growing interest in studying online social and affiliation networks, leading to a new category of inference problems that consider the actor characteristics and their social environments. These problems have a variety of applications, from creating more effective marketing campaigns to designing better personalized services. Predictive statistical models allow learning hidden information automatically in these networks but also bring many privacy concerns. Three of the main challenges that I address in my thesis are understanding 1) how the complex observed and unobserved relationships among actors can help in building better behavior models and in designing more accurate predictive algorithms, 2) what processes drive network growth and link formation, and 3) what the implications of predictive algorithms are for the privacy of users who share content online. The majority of previous work on prediction, evolution and privacy in online social networks has concentrated on single-mode networks, which form around user-user links such as friendship and email communication. However, single-mode networks often co-exist with two-mode affiliation networks in which users are linked to other entities, such as social groups, online content and events. We study the interplay between these two types of networks and show that analyzing these higher-order interactions can reveal dependencies that are difficult to extract from the pair-wise interactions alone. In particular, we present our contributions to the challenging problems of collective classification, link prediction, network evolution, anonymization and preserving privacy in social and affiliation networks. We evaluate our models on real-world data sets from well-known online social networks, such as Flickr, Facebook, Dogster and LiveJournal.

    Centrality measures and analyzing dot-product graphs

    In this thesis we investigate two topics in data mining on graphs: in the first part we investigate the notion of centrality in graphs, and in the second part we look at reconstructing graphs from aggregate information. In many graph-related problems the goal is to rank nodes based on an importance score. This score is in general referred to as node centrality. In Part I we start by giving a novel and more efficient algorithm for computing betweenness centrality. In many applications not an individual node but rather a set of nodes is chosen to perform some task. We generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem: find a group of k nodes with largest centrality. We give an algorithm for solving this optimization problem for a general notion of centrality that subsumes various instantiations of centrality that find paths in the graph. We prove that this problem is NP-hard for specific centrality definitions and we provide a universal algorithm for this problem that can be modified to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications: one for minimizing redundancy in information propagation networks and one for optimizing the expected number of interceptions of a group in a random navigational network. In the second part of the thesis we investigate what we can infer about a bipartite graph if only some aggregate information (the number of common neighbors among each pair of nodes) is given. First, we observe that the given data is equivalent to the dot-products of the adjacency vectors of each pair of nodes. Based on this knowledge we develop an SVD-based algorithm that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in which the dot-products of nodes with themselves, i.e., the node degrees, are either known or hidden.
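
The observation that common-neighbor counts equal dot-products of adjacency vectors can be checked on a small example. The sketch below is not the thesis's SVD-based reconstruction algorithm, which the abstract does not describe; it only illustrates the equivalence the algorithm starts from. The matrix `B` and both function names are hypothetical.

```python
def dot_product_matrix(adj_rows):
    """Pairwise dot-products of 0/1 adjacency vectors (one row per node)."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in adj_rows]
        for row_i in adj_rows
    ]


def common_neighbor_matrix(adj_rows):
    """Pairwise common-neighbor counts, computed by intersecting neighbor sets."""
    neighbors = [{k for k, x in enumerate(row) if x} for row in adj_rows]
    return [[len(a & b) for b in neighbors] for a in neighbors]


# Hypothetical bipartite graph: 3 nodes on one side (rows) and
# 4 nodes on the other side (columns); B[i][k] = 1 if i is linked to k.
B = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]

dots = dot_product_matrix(B)
common = common_neighbor_matrix(B)
```

Here `dots` and `common` coincide entry by entry, and the diagonal of `dots` holds the node degrees, which is the quantity that is either known or hidden in the two problem versions.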

    Reconstructing randomized social networks
