2 research outputs found

    Centrality measures and analyzing dot-product graphs

    In this thesis we investigate two topics in data mining on graphs: in the first part we study the notion of centrality in graphs, and in the second part we look at reconstructing graphs from aggregate information.

    In many graph-related problems the goal is to rank nodes by an importance score, generally referred to as node centrality. In Part I we start by giving a novel, more efficient algorithm for computing betweenness centrality. In many applications not an individual node but rather a set of nodes is chosen to perform some task, so we generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem: find a group of k nodes with the largest centrality. We give an algorithm for solving this problem for a general notion of centrality that subsumes various path-based instantiations. We prove that the problem is NP-hard for specific centrality definitions, and we provide a universal algorithm that can be adapted to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications: one minimizing redundancy in information-propagation networks, and one optimizing the expected number of interceptions of a group in a random navigational network.

    In the second part of the thesis we investigate what can be inferred about a bipartite graph when only aggregate information -- the number of common neighbors of each pair of nodes -- is given. First, we observe that this data is exactly the set of pairwise dot products of the nodes' adjacency vectors. Based on this observation we develop an SVD-based algorithm that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in which the dot products of nodes with themselves, i.e. the node degrees, are either known or hidden.
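    The key observation in the second part is concrete enough to demonstrate. The sketch below (Python with NumPy; the toy graph, its dimensions, and all variable names are illustrative assumptions, not taken from the thesis) checks that common-neighbor counts in a bipartite graph are exactly the pairwise dot products of adjacency vectors, with node degrees on the diagonal, and shows the factorization an SVD-based reconstruction can start from. The thesis's actual procedure for recovering a binary adjacency matrix from this factorization is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bipartite graph: 6 left nodes, 4 right nodes.
# A[i, j] = 1 if left node i is adjacent to right node j.
A = (rng.random((6, 4)) < 0.5).astype(float)

# The aggregate data described in the abstract: for each pair of left
# nodes, the number of right-side neighbors they share. This is exactly
# the Gram matrix of the adjacency (row) vectors.
C = A @ A.T
assert C[0, 1] == np.dot(A[0], A[1])           # common neighbors = dot product
assert np.allclose(np.diag(C), A.sum(axis=1))  # diagonal = node degrees

# An eigendecomposition (equivalently, an SVD of A) yields vectors whose
# pairwise dot products reproduce the data; reconstruction then amounts to
# finding binary adjacency vectors consistent with this factorization.
w, V = np.linalg.eigh(C)
w = np.clip(w, 0, None)          # C is positive semidefinite; drop tiny negatives
X = V @ np.diag(np.sqrt(w))      # embedding with X @ X.T == C
assert np.allclose(X @ X.T, C)
```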

    On targeting Markov segments

    No full text
    Consider two user populations, of which one is targeted and the other is not. Users in the targeted population follow a Markov chain on a space of n states. The untargeted population follows another Markov chain, also defined on the same set of n states. Each time a user arrives at a state, he/she is presented with information appropriate for the targeted population (an advertisement, or a recommendation) with some probability. Presenting the advertisement incurs a cost. Notice that while the revenue grows in proportion to the flow of targeted users through the state, the cost grows in proportion to the total flow (targeted and untargeted) through the state. How can we compute the best advertisement policy? The World Wide Web is a natural setting for such a problem: Internet service providers have trail information for building such Markovian user models, where states correspond to pages on the web. In this paper we study the simple problem above, as well as variants with multiple targetable segments. In some settings the policy need not be a static probability distribution on states; instead, we can dynamically vary the policy based on the user's path through the states.
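    Under one natural reading of this model (a hedged sketch, not the paper's actual formulation: the per-exposure revenue r, the cost c, and equal arrival rates for the two populations are all assumptions), long-run flow through each state is the chain's stationary distribution, and the best static policy can be read off directly, since expected profit is linear in the per-state display probabilities:

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a row-stochastic transition matrix."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])  # eigenvector for eigenvalue 1
    return pi / pi.sum()

# Toy instance: two Markov chains over the same n states, one for the
# targeted population and one for the untargeted population (assumed data).
n = 4
rng = np.random.default_rng(1)
P_t = rng.random((n, n)); P_t /= P_t.sum(axis=1, keepdims=True)
P_u = rng.random((n, n)); P_u /= P_u.sum(axis=1, keepdims=True)

pi_t, pi_u = stationary(P_t), stationary(P_u)

r, c = 1.0, 0.4  # assumed revenue per targeted exposure and cost per exposure

# With a static policy p (probability of showing the ad at each state),
# expected profit per step is sum_s p[s] * (r*pi_t[s] - c*(pi_t[s] + pi_u[s])).
# Linearity in p means: advertise with probability 1 exactly at the states
# where the per-exposure margin is positive.
margin = r * pi_t - c * (pi_t + pi_u)
policy = (margin > 0).astype(float)
print("stationary targeted flow:", np.round(pi_t, 3))
print("optimal static policy:   ", policy)
```

    The dynamic policies the abstract mentions, which condition on the user's path through the states, no longer reduce to a per-state threshold like this.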