12 research outputs found

    On the Troll-Trust Model for Edge Sign Prediction in Social Networks

    Get PDF
    In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive against state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled.Comment: v5: accepted to AISTATS 201

    "What Is the City but the People?" Exploring Urban Activity Using Social Web Traces

    Get PDF
    International audienceWe demonstrate GeoTopics, a system to explore geographical patterns of urban activity. The system collects publicly shared check-ins generated by Foursquare users, that reveal who spends time where, when, and on what type of activity. It then employs sparse probabilistic modeling techniques to learn associations between different regions of a city and multi-feature descriptions of urban activity. Through a web interface, users of the system can select a city of interest and explore visualizations that highlight how different types of activity are spatially and temporally distributed in the city. We discuss the opportunities that web data offer to understand urban activity and the challenges one faces in that task. We then describe our approach and the architecture of GeoTopics. Finally, we lay out the demonstration scenario

    Caractérisation des arêtes dans les graphes signés et attribués

    No full text
    In this thesis, we develop methods to efficiently and accurately characterize edges in complex networks. In simple graphs, nodes are connected by a single semantic. For instance, two users are friends in a social networks, or there is a hypertext link from one webpage to another. Furthermore, those connections are typically driven by node similarity, in what is known as the homophily mechanism. In the previous examples, users become friends because of common features, and webpages link to each other based on common topics. By contrast, complex networks are graphs where every connection has one semantic among k possible ones. Those connections are moreover based on both partial homophily and heterophily of their endpoints. This additional information enable finer analysis of real world graphs. However, it can be expensive to acquire, or is sometimes not known beforehand. We address the problems of inferring edge semantics in various settings. First, we consider graphs where edges have two opposite semantics, and where we observe the label of some edges. These so-called signed graphs are a convenient way to represent polarized interactions. We propose two learning biases suited for directed and undirected signed graphs respectively. This leads us to design several algorithms leveraging the graph topology to solve a binary classification problem that we call Edge Sign Prediction. Second, we consider graphs with k ≥ 2 available semantics for edge. In that case of multilayer graphs, we are not provided with any edge label, but instead are given one feature vector for each node. Faced with such an unsupervised Edge Attributed Clustering problem, we devise a quality criterion expressing how well an edge k-partition and k semantical vectors explains the observed connections. We optimize this goodness of explanation criterion in vectorial and matricial forms, and show how those two methods perform on synthetic data.Dans cette thèse, nous proposons des méthodes pour caractériser efficacement et précisément les arêtes au sein de réseaux complexes. Dans les graphes simples, les nœuds sont liés au travers d’une sémantique unique. Par exemple, deux utilisateurs sont amis dans un réseau social, ou une page web contient un lien hypertexte pointant vers un autre page. De plus, ces connexions sont généralement guidées par la similarité entre les nœuds, au travers d’un mécanisme appelé homophilie. Dans les exemples précédents, les utilisateurs deviennent amis à cause de caractéristiques communes, et les pages web sont reliées les unes aux autres sur la base de sujets communs. En revanche, les réseaux complexes sont des graphes où chaque connexion possède une sémantique parmi k possibles. Ces connexions sont en outre basées à la fois sur une homophilie et une hétérophilie partielle des nœuds à leurs extrémité. Cette information supplémentaire permet une analyse plus fine des graphes issus d’applications réelles. Cependant, elle peut être coûteuse à acquérir, ou n’est pas toujours disponible a priori. Nous abordons donc le problème d’inférer la sémantique des arêtes dans plusieurs contextes. Tout d’abord, nous considérons les graphes où les arêtes ont deux sémantiques opposées, et où nous observons l’étiquette de certaines arêtes. Ces « graphes signés » sont une façon élégante de représenter des interactions polarisées. Nous proposons deux biais d’apprentissage, adaptés respectivement aux graphes signés dirigés et non dirigés. Ceci nous amène à concevoir plusieurs algorithmes utilisant la topologie du graphe pour résoudre un problème de classification binaire que nous appelons Edge Sign Prediction. Deuxièmement, nous considérons les graphes avec k ≥ 2 sémantiques possibles pour les arêtes. Dans ce cas, nous ne recevons pas d’étiquette d’arêtes, mais plutôt un vecteur de caractéristiques pour chaque nœud. Face à ce problème non supervisé d’Edge Attributed Clustering, nous concevons un critère de qualité exprimant dans quelle mesure une k-partition des arêtes et k vecteurs sémantiques expliquent les connexions observées. Nous optimisons ce critère « qualité explicative » sous une forme vectorielle et matricielle et illustrons le comportement de ces deux méthodes sur des données synthétiques

    Caractérisation des arêtes dans les graphes signés et attribués

    Get PDF
    In this thesis, we develop methods to efficiently and accurately characterize edges in complex networks. In simple graphs, nodes are connected by a single semantic. For instance, two users are friends in a social networks, or there is a hypertext link from one webpage to another. Furthermore, those connections are typically driven by node similarity, in what is known as the homophily mechanism. In the previous examples, users become friends because of common features, and webpages link to each other based on common topics. By contrast, complex networks are graphs where every connection has one semantic among k possible ones. Those connections are moreover based on both partial homophily and heterophily of their endpoints. This additional information enable finer analysis of real world graphs. However, it can be expensive to acquire, or is sometimes not known beforehand. We address the problems of inferring edge semantics in various settings. First, we consider graphs where edges have two opposite semantics, and where we observe the label of some edges. These so-called signed graphs are a convenient way to represent polarized interactions. We propose two learning biases suited for directed and undirected signed graphs respectively. This leads us to design several algorithms leveraging the graph topology to solve a binary classification problem that we call Edge Sign Prediction. Second, we consider graphs with k ≥ 2 available semantics for edge. In that case of multilayer graphs, we are not provided with any edge label, but instead are given one feature vector for each node. Faced with such an unsupervised Edge Attributed Clustering problem, we devise a quality criterion expressing how well an edge k-partition and k semantical vectors explains the observed connections. We optimize this goodness of explanation criterion in vectorial and matricial forms, and show how those two methods perform on synthetic data.Dans cette thèse, nous proposons des méthodes pour caractériser efficacement et précisément les arêtes au sein de réseaux complexes. Dans les graphes simples, les nœuds sont liés au travers d’une sémantique unique. Par exemple, deux utilisateurs sont amis dans un réseau social, ou une page web contient un lien hypertexte pointant vers un autre page. De plus, ces connexions sont généralement guidées par la similarité entre les nœuds, au travers d’un mécanisme appelé homophilie. Dans les exemples précédents, les utilisateurs deviennent amis à cause de caractéristiques communes, et les pages web sont reliées les unes aux autres sur la base de sujets communs. En revanche, les réseaux complexes sont des graphes où chaque connexion possède une sémantique parmi k possibles. Ces connexions sont en outre basées à la fois sur une homophilie et une hétérophilie partielle des nœuds à leurs extrémité. Cette information supplémentaire permet une analyse plus fine des graphes issus d’applications réelles. Cependant, elle peut être coûteuse à acquérir, ou n’est pas toujours disponible a priori. Nous abordons donc le problème d’inférer la sémantique des arêtes dans plusieurs contextes. Tout d’abord, nous considérons les graphes où les arêtes ont deux sémantiques opposées, et où nous observons l’étiquette de certaines arêtes. Ces « graphes signés » sont une façon élégante de représenter des interactions polarisées. Nous proposons deux biais d’apprentissage, adaptés respectivement aux graphes signés dirigés et non dirigés. Ceci nous amène à concevoir plusieurs algorithmes utilisant la topologie du graphe pour résoudre un problème de classification binaire que nous appelons Edge Sign Prediction. Deuxièmement, nous considérons les graphes avec k ≥ 2 sémantiques possibles pour les arêtes. Dans ce cas, nous ne recevons pas d’étiquette d’arêtes, mais plutôt un vecteur de caractéristiques pour chaque nœud. Face à ce problème non supervisé d’Edge Attributed Clustering, nous concevons un critère de qualité exprimant dans quelle mesure une k-partition des arêtes et k vecteurs sémantiques expliquent les connexions observées. Nous optimisons ce critère « qualité explicative » sous une forme vectorielle et matricielle et illustrons le comportement de ces deux méthodes sur des données synthétiques

    Density of Paris Foursquare venues

    No full text
    <p>Result of Scikit Gaussian Kernel Density estimation on a set of 4008 venues in Paris.</p

    Where is the Soho of Rome? Measures and algorithms for finding similar neighborhoods in cities

    No full text
    International audienceData generated on location-aware social media provide rich information about the places (shopping malls, restaurants, cafés , etc) where citizens spend their time. That information can, in turn, be used to describe city neighborhoods in terms of the activity that takes place therein. For example, the data might reveal that citizens visit one neighborhood mainly for shopping , while another for its dining venues. In this paper, we present a methodology to analyze such data, describe neighborhoods in terms of the activity they host, and discover similar neighborhoods across cities. Using millions of Foursquare check-ins from cities in Eu-rope and the US, we conduct an extensive study on features and measures that can be used to quantify similarity of city neighborhoods. We find that the earth-mover's distance outper-forms other candidate measures in finding similar neighborhoods. Subsequently, using the earth-mover's distance as our measure of choice, we address the issue of computational efficiency: given a neighborhood in one city, how to efficiently retrieve the k most similar neighborhoods in other cities. We propose a similarity-search strategy that yields significant speed improvement over the brute-force search, with minimal loss in accuracy. We conclude with a case study that compares neighborhoods of Paris to neighborhoods of other cities

    Modeling Urban Behavior by Mining Geotagged Social Data

    Get PDF
    International audienceData generated on location-based social networks provide rich information on the whereabouts of urban dwellers. Specifically, such data reveal who spends time where, when, and on what type of activity (e.g., shopping at a mall, or dining at a restaurant). That information can, in turn, be used to describe city regions in terms of activity that takes place therein. For example, the data might reveal that citizens visit one region mainly for shopping in the morning, while another for dining in the evening. Furthermore, once such a description is available, one can ask more elaborate questions. For example, one might ask what features distinguish one region from another – some regions might be different in terms of the type of venues they host and others in terms of the visitors they attract. As another example, one might ask which regions are similar across cities. In this paper, we present a method to answer such questions using publicly shared Foursquare data. Our analysis makes use of a probabilistic model, the features of which include the exact location of activity, the users who participate in the activity, as well as the time of the day and day of week the activity takes place. Compared to previous approaches to similar tasks, our probabilistic modeling approach allows us to make minimal assumptions about the data – which relieves us from having to set arbitrary parameters in our analysis (e.g., regarding the granularity of discovered regions or the importance of different features). We demonstrate how the model learned with our method can be used to identify the most likely and distinctive features of a geographical area, quantify the importance features used in the model, and discover similar regions across different cities. Finally, we perform an empirical comparison with previous work and discuss insights obtained through our findings
    corecore