
    Unsupervised learning on social data

    Algorithms and techniques for bot detection in social networks

    In this thesis, we propose machine learning techniques for detecting and characterizing malicious bots in social networks. The novelty of these techniques is that only the interaction patterns between the analysed accounts and their friends are used as source data for bot detection. This offers several advantages. There is no need to download large volumes of text and media data, which are highly language-dependent. It also makes it possible to detect bots hidden by privacy settings or blocked, camouflaged bots that mimic real people, and groups of bots, as well as to estimate the quality and price of a bot. In the developed solution, we propose to extract the input data for analysis in the form of social graphs, using a hierarchical social network model. We then construct features from these graphs using statistical methods, graph algorithms, and methods that analyze graph embeddings, and make the final decision with a random forest model or a neural network. Based on this scheme, we propose four techniques that together cover the full attack detection cycle: two for bot detection (individual detection and group detection) and two for bot characterization (quality estimation and price estimation). The thesis also presents experiments that evaluate the proposed solutions, using bot detection in the VKontakte social network as an example. For this purpose, we developed a software prototype that implements the entire analysis chain, from data collection to decision making. To train the models, we collected data about bots of different quality, price, and camouflage strategies directly from bot sellers. The study showed that, using only information about friendship graphs, it is possible to recognize and characterize bots very effectively (AUC-ROC ≈ 0.9). At the same time, the proposed solution is robust to the emergence of new types of bots and to shifts in bot type, from automatically generated and hacked accounts to human users who carry out malicious activity for payment.
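
    To make the decision step above concrete, here is a minimal sketch, assuming Python with networkx and scikit-learn, of classifying accounts from a few hand-picked friend-graph statistics with a random forest. The function names and the feature set are illustrative only, not the thesis's actual feature construction, which also draws on graph embeddings.

```python
# Minimal sketch: classify accounts as bot/human from simple structural
# statistics of their friend graphs. The features below are illustrative;
# the thesis also derives features from graph algorithms and embeddings.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def friend_graph_features(g: nx.Graph) -> list:
    """A few structural statistics of an account's friend graph."""
    n = g.number_of_nodes()
    degrees = [d for _, d in g.degree()]
    return [
        n,                                          # graph size
        g.number_of_edges() / max(n, 1),            # density proxy
        float(np.mean(degrees)) if degrees else 0,  # mean friend degree
        nx.average_clustering(g) if n else 0,       # local clustering
        nx.number_connected_components(g),          # fragmentation
    ]

def train_detector(graphs, labels):
    """graphs: friend graphs of labelled accounts; labels: 1 = bot, 0 = human."""
    X = np.array([friend_graph_features(g) for g in graphs])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return clf.fit(X, labels)
```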

    On the topology of network fine structures

    Multi-relational dynamics are ubiquitous in many complex systems, such as transportation, social, and biological networks. This thesis studies two mathematical objects that encapsulate these relationships: multiplexes and interval graphs. The former is the modern outlook in Network Science for generalizing the edges in graphs, while the latter was popularized during the 1960s in Graph Theory. Although multiplexes and interval graphs are nearly 50 years apart, their motivations are similar, and it is worthwhile to investigate their structural connections and properties. This thesis looks into these mathematical objects and presents their connections. For example, we examine the community structures in multiplexes and show how unstable the detection algorithms are, which can lead researchers to wrong conclusions. It is therefore important to get the formalism precise, and this thesis shows that the complexity of interval graphs is an indicator of that precision. However, computing this measure of complexity is a computationally hard problem in Graph Theory, so we use a heuristic strategy from Network Science to tackle it. One of the main contributions of this thesis is the compilation of the disparate literature on these mathematical objects. The novelty of this contribution lies in using statistical tools from population biology to assess the completeness of the thesis's bibliography; this can also serve as a framework for researchers to quantify the comprehensiveness of their preliminary investigations. From the large body of multiplex research, the thesis focuses on the statistical properties of the projection of multiplexes (the reduction of a multi-relational system to a single-relationship network). This is important because the projection is always used as the baseline for many relevant algorithms, and its topology offers insight into the dynamics of the system.
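
    Since the projection of multiplexes is central to the statistical analysis mentioned above, here is a minimal sketch, assuming Python with networkx, of collapsing a multi-relational system into a single weighted network. The layer data and function name are invented for illustration.

```python
# Minimal sketch: project a multiplex (one graph per relationship type)
# onto a single weighted graph, where an edge's weight counts how many
# layers it appears in.
import networkx as nx

def project_multiplex(layers):
    """Reduce a multi-relational system to a single-relationship network."""
    projection = nx.Graph()
    for layer in layers:
        for u, v in layer.edges():
            if projection.has_edge(u, v):
                projection[u][v]["weight"] += 1
            else:
                projection.add_edge(u, v, weight=1)
    return projection

# Example: friendship and co-working layers over the same node set.
friends = nx.Graph([("a", "b"), ("b", "c")])
coworkers = nx.Graph([("a", "b"), ("c", "d")])
flat = project_multiplex([friends, coworkers])
print(flat["a"]["b"]["weight"])  # 2: the tie exists in both layers
```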

    Geomatics Applications to Contemporary Social and Environmental Problems in Mexico

    Trends in geospatial technologies have led to powerful new analysis and representation techniques that involve processing massive datasets, some unstructured, some acquired from ubiquitous sources, and others from remotely located sensors of various kinds, all of which complement the structured information produced regularly by governmental and international agencies. In this chapter, we provide both an extensive review of such techniques and insight into the application of some of them in several case studies in Mexico at various scales of analysis: from regional migration flows of highly qualified people at the country level and the spatio-temporal analysis of unstructured information in geotagged tweets for sentiment assessment, to more local applications of participatory cartography for policy definition jointly between local authorities and citizens, and an automated method for three-dimensional (3D) modelling and visualisation of forest inventories using laser scanner technology.

    Localized Events in Social Media Streams: Detection, Tracking, and Recommendation

    The recent proliferation of social media channels and the immense amount of user-generated content have led to growing interest in social media mining. Messages continuously posted via these channels report a broad range of topics, from daily life to global and local events. This has opened new opportunities for mining event information, which is crucial in many application domains, especially for increasing situational awareness in critical scenarios. Interestingly, many of these messages are enriched with location information, owing to the widespread use of mobile devices and recent advancements in location acquisition techniques. This enables location-aware event mining, i.e., the detection and tracking of localized events. In this thesis, we propose novel frameworks and models that digest social media content for localized event detection, tracking, and recommendation. We first develop KeyPicker, a framework to extract and score event-related keywords in an online fashion, accounting for high levels of noise, temporal heterogeneity, and outliers in the data. Then, LocEvent is proposed to incrementally detect and track events using a 4-stage procedure: LocEvent receives the keywords extracted by KeyPicker, identifies local keywords, spatially clusters them, and finally scores the generated clusters. For each detected event, a set of descriptive keywords, a location, and a time interval are estimated at a fine-grained resolution. Beyond the sparsity of geo-tagged messages, people sometimes post about events far away from the event's location. Such spatial problems are handled by novel spatial regularization techniques, namely graph- and gazetteer-based regularization. To ensure scalability, we utilize a hierarchical spatial index in addition to a multi-stage filtering procedure that gradually suppresses noisy words and considers only event-related ones for complex spatial computations. For recommendation, we propose an event recommender system built upon model-based collaborative filtering. Our model is able to suggest events to users, taking into account a number of contextual features, including the social links between users, the topical similarities of events, and the spatio-temporal proximity between users and events. To realize this model, we employ and adapt matrix factorization, which allows for uncovering latent user-event patterns. Our proposed features help direct the learning process towards recommendations that better suit users' tastes, in particular when new users have a very sparse (or even no) event attendance history. To evaluate the effectiveness and efficiency of the proposed approaches, extensive comparative experiments are conducted on datasets collected from social media channels. Our analysis of the experimental results reveals the superiority and advantages of our frameworks over existing methods in terms of the relevancy and precision of the obtained results.
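
    As a rough illustration of the matrix-factorization backbone of the recommender described above, here is a plain SGD sketch in Python with NumPy. It omits the social, topical, and spatio-temporal features the thesis adds; the function name, data, and hyperparameters are illustrative.

```python
# Minimal matrix factorization sketch for user-event recommendation:
# plain SGD on observed (user, event, rating) triples. The contextual
# features described above are omitted for brevity.
import numpy as np

def factorize(interactions, n_users, n_events, k=16,
              lr=0.01, reg=0.05, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))   # latent user factors
    V = rng.normal(scale=0.1, size=(n_events, k))  # latent event factors
    for _ in range(epochs):
        for u, e, r in interactions:
            u_old = U[u].copy()
            err = r - u_old @ V[e]                  # prediction error
            U[u] += lr * (err * V[e] - reg * u_old) # gradient step, user
            V[e] += lr * (err * u_old - reg * V[e]) # gradient step, event
    return U, V

# Example: implicit "attended" signals for three users and four events.
data = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
U, V = factorize(data, n_users=3, n_events=4)
print((U[1] @ V.T).argsort()[::-1])  # events ranked for user 1
```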

    Information Reliability on the Social Web - Models and Applications in Intelligent User Interfaces

    The Social Web is undergoing continued evolution, changing the paradigm of information production, processing, and sharing. Information sources have shifted from institutions to individual users, vastly increasing the amount of information available online. To overcome the information overload problem, modern filtering algorithms enable people to find relevant information efficiently. However, noisy, false, and otherwise useless information remains a problem. We believe that the concept of information reliability needs to be considered alongside information relevance to adapt filtering algorithms to today's Social Web. This approach helps to improve information search and discovery, and can also improve user experience by communicating aspects of information reliability. This thesis first presents the results of a cross-disciplinary study of perceived reliability, reporting on a novel user experiment. This is followed by a discussion of modeling, validating, and communicating information reliability, including its various definitions across disciplines. A selection of important reliability attributes, such as source credibility, competence, influence, and timeliness, is examined through different case studies. Results show that the perceived reliability of information can vary greatly across contexts. Finally, recent studies on visual analytics, including algorithm explanations and interactive interfaces, are discussed with respect to their impact on the perception of information reliability in a range of application domains.
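
    To illustrate the general idea of weighing reliability alongside relevance in a filtering algorithm, here is a toy Python sketch. The attribute names, weights, and scoring rule are invented for illustration and are not the thesis's validated model; as the studies above note, perceived reliability is context-dependent, so such weights would need to be tuned per context.

```python
# Toy sketch: re-rank filtered results by relevance * reliability, where
# reliability is a weighted mix of per-source attributes. Attributes and
# weights are illustrative assumptions, not the thesis's model.
from dataclasses import dataclass

@dataclass
class Item:
    text: str
    relevance: float    # from an existing filtering algorithm, in [0, 1]
    credibility: float  # source credibility, in [0, 1]
    competence: float   # source competence on the topic, in [0, 1]
    timeliness: float   # freshness of the information, in [0, 1]

# Context-dependent weights; values here are placeholders.
WEIGHTS = {"credibility": 0.5, "competence": 0.3, "timeliness": 0.2}

def reliability(item: Item) -> float:
    return (WEIGHTS["credibility"] * item.credibility
            + WEIGHTS["competence"] * item.competence
            + WEIGHTS["timeliness"] * item.timeliness)

def rank(items):
    """Order items by relevance discounted by estimated reliability."""
    return sorted(items, key=lambda it: it.relevance * reliability(it),
                  reverse=True)
```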