6 research outputs found

    On the Troll-Trust Model for Edge Sign Prediction in Social Networks

    In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive against state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled.
    Comment: v5: accepted to AISTATS 201
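The troll-trust features mentioned in the abstract above (per-node fractions of negative outgoing and incoming edges) can be illustrated with a minimal sketch. The function names `troll_trust_features` and `predict_sign`, the edge format `(source, target, sign)`, and the averaging decision rule with a 0.5 threshold are all assumptions made for illustration; they are not the classifier analyzed in the paper.

```python
from collections import defaultdict


def troll_trust_features(edges):
    """Compute per-node fractions of negative edges.

    edges: iterable of (source, target, sign) with sign in {+1, -1}.
    Returns (trollness, untrust), where trollness[u] is the fraction of
    u's outgoing edges that are negative and untrust[v] is the fraction
    of v's incoming edges that are negative.
    """
    out_total, out_neg = defaultdict(int), defaultdict(int)
    in_total, in_neg = defaultdict(int), defaultdict(int)
    for u, v, sign in edges:
        out_total[u] += 1
        in_total[v] += 1
        if sign < 0:
            out_neg[u] += 1
            in_neg[v] += 1
    trollness = {u: out_neg[u] / out_total[u] for u in out_total}
    untrust = {v: in_neg[v] / in_total[v] for v in in_total}
    return trollness, untrust


def predict_sign(u, v, trollness, untrust, default=0.5):
    """Hypothetical rule: predict -1 when the average of the two fractions
    exceeds 0.5. Nodes without observed edges fall back to `default`."""
    score = (trollness.get(u, default) + untrust.get(v, default)) / 2
    return -1 if score > 0.5 else 1
```

For example, with training edges `[("a", "b", 1), ("a", "c", -1), ("b", "c", -1), ("c", "a", 1)]`, node `b` has only negative outgoing edges and node `c` has only negative incoming edges, so this rule labels a new edge from `b` to `c` as negative.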

    ALPINE: active link prediction using network embedding

    Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein-protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.

    LEARNING ON GRAPHS: ALGORITHMS FOR CLASSIFICATION AND SEQUENTIAL DECISIONS

    In recent years, networked data have become widespread due to the increasing importance of social networks and other web-related applications. This growing interest is driving researchers to design new algorithms for solving important problems that involve networked data. In this thesis we present a few practical yet principled algorithms for learning and sequential decision-making on graphs. Classification of networked data is an important problem that has recently received a great deal of attention from the machine learning community. This is due to its many important practical applications: computer vision, bioinformatics, spam detection and text categorization, just to cite a few of the more conspicuous examples. We focus our attention on the task called "node classification", often studied in the semi-supervised (transductive) setting. We present two algorithms, motivated by different theoretical frameworks. The first algorithm is studied in the well-known online adversarial setting, within which it enjoys an optimal mistake bound (up to logarithmic factors). The second algorithm is based on a game-theoretic approach, where each node of the network is maximizing its own payoff. The setting corresponds to a Graph Transduction Game in which the graph is a tree. For this special case, we show that the Nash Equilibrium of the game can be reached in linear time. We complement our theoretical findings with an extensive set of experiments using datasets from many different domains. In the second part of the thesis, we present a rapidly emerging theme in the analysis of networked data: signed networks, graphs whose edges carry a label encoding the positive or negative nature of the relationship between the connected nodes.
    For example, social networks and e-commerce offer several examples of signed relationships: Slashdot users can tag other users as friends or foes, Epinions users can rate each other positively or negatively, eBay users develop trust and distrust towards sellers in the network. More generally, two individuals that are related because they rate similar products in a recommendation website may agree or disagree in their ratings. Many heuristics for link classification in social networks are based on a form of social balance summarized by the motto "the enemy of my enemy is my friend". This is equivalent to saying that the signs on the edges of a social graph tend to be consistent with some two-clustering structure of the nodes, where edges connecting nodes from the same cluster are positive and edges connecting nodes from different clusters are negative. We present algorithms for the batch transductive active learning setting, where the topology of the graph is known in advance and our algorithms can ask for the label of some specific edges during the training phase (before starting with the predictions). These algorithms can achieve different tradeoffs between the number of mistakes during the test phase and the number of labels required during the training phase. We also present an experimental comparison against some state-of-the-art spectral heuristics presented in a previous work, where we show that the simplest of our algorithms is already competitive with the best of these heuristics. In the last chapter we present another way to exploit relational information for sequential predictions: the networks of bandits. Contextual bandits adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and recommendation systems.
    Many practical applications have a strong social component whose integration in the bandit algorithm could lead to a significant performance improvement: for example, since friends often have similar tastes, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. We introduce a novel algorithmic approach to a particular networked bandit problem. More specifically, we run a bandit algorithm on each network node (e.g., user), allowing it to "share" feedback signals with the other nodes by employing the multi-task kernel. We derive the regret analysis of this algorithm and, finally, we report on the results of an experimental comparison between our approach and the state-of-the-art techniques, on both artificial and real-world social networks.

    A Linear Time Active Learning Algorithm for Link Classification

    We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most of order |E| + |V| log |V|.
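The stochastic model described in the abstract above (edge signs that start out consistent with a two-clustering and are then perturbed) can be sketched as follows. The function name `sample_signed_labels`, the `flip_prob` parameter, and the choice of flipping each label independently with the same probability are assumptions made for illustration; the abstract only states that labels are perturbations of a consistent assignment.

```python
import random


def sample_signed_labels(edges, cluster, flip_prob, seed=0):
    """Sample edge signs from a perturbed two-clustering.

    edges: iterable of (u, v) pairs.
    cluster: dict mapping each node to one of two cluster ids.
    flip_prob: assumed probability of flipping each label independently.
    Returns a dict mapping (u, v) to a sign in {+1, -1}.
    """
    rng = random.Random(seed)
    labels = {}
    for u, v in edges:
        # Consistent assignment: +1 inside a cluster, -1 across clusters.
        consistent = 1 if cluster[u] == cluster[v] else -1
        # Perturbation: flip the consistent sign with probability flip_prob.
        flipped = rng.random() < flip_prob
        labels[(u, v)] = -consistent if flipped else consistent
    return labels
```

With `flip_prob=0.0` every label agrees with the clustering, which is the noise-free social balance case; larger values produce labels that a link classifier can no longer predict perfectly from the clustering alone.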