68 research outputs found

    Interpretable Network Representations

    Networks (or, interchangeably, graphs) are ubiquitous across the globe and within science and engineering: social networks, collaboration networks, protein-protein interaction networks, and infrastructure networks, among many others. Machine learning on graphs, especially network representation learning, has shown remarkable performance in network-based applications such as node/graph classification, graph clustering, and link prediction. Beyond performance, it is equally crucial for individuals to understand the behavior of machine learning models and to be able to explain how these models arrive at a certain decision. Such needs have motivated many studies on interpretability in machine learning. For example, in social network analysis, we may need to know why certain users (or groups) are classified or clustered together by the machine learning models, or why a friend recommendation system considers some users similar enough to recommend that they connect with each other. Therefore, an interpretable network representation is necessary, and it should carry the graph information to a level understandable by humans. Here, we first introduce our method for interpretable network representations: the network shape. It provides a framework to represent a network with a 3-dimensional shape, and one can customize network shapes for one's needs by choosing among various graph sampling methods, 3D network embedding methods and shape-fitting methods. In this thesis, we introduce two types of network shape: a Kronecker hull, which represents a network as a 3D convex polyhedron using stochastic Kronecker graphs as the network embedding method, and a Spectral Path, which represents a network as a 3D path connecting the spectral moments of the network and its subgraphs. We demonstrate that network shapes can capture various properties of not only the network, but also its subgraphs.
For instance, they can provide the distribution of subgraphs within a network, e.g., what proportion of subgraphs are structurally similar to the whole network? Network shapes are interpretable on different levels, so one can quickly understand the structural properties of a network and its subgraphs from its network shape. Using experiments on real-world networks, we demonstrate that network shapes can be used in various applications, including (1) network visualization, the most intuitive way for users to understand a graph; (2) network categorization (e.g., is this a social or a biological network?); and (3) computing similarity between two graphs. Moreover, we utilize network shapes to extend biometrics studies to network data by solving two problems: network identification (given an anonymized graph, can we identify the network from which it was collected? That is, answering questions such as "where is this anonymized graph sampled from, Twitter or Facebook?") and network authentication (if one claims the graph is sampled from a certain network, can we verify this claim?). The overall objective of the thesis is to provide a compact, interpretable, visualizable, comparable and efficient representation of networks.
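To make the idea of spectral moments more concrete, the following minimal sketch computes three moments of a graph's random-walk transition matrix, m_k = trace(P^k) / n, which can be read as one 3D point. The choice of matrix, the orders (2, 3, 4), and all function names are assumptions made for illustration; the abstract does not specify how the thesis defines or computes its moments.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix (list of lists) into P = D^-1 A."""
    matrix = []
    for row in adj:
        degree = sum(row)
        matrix.append([a / degree if degree else 0.0 for a in row])
    return matrix


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adj, orders=(2, 3, 4)):
    """Return trace(P^k) / n for each k in `orders` (assumed definition)."""
    n = len(adj)
    p = transition_matrix(adj)
    power = p  # holds P^k as k increases
    moments = {}
    for k in range(1, max(orders) + 1):
        if k > 1:
            power = matmul(power, p)
        if k in orders:
            moments[k] = sum(power[i][i] for i in range(n)) / n
    return [moments[k] for k in orders]


# A triangle graph: every node is connected to the other two.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
point = spectral_moments(triangle)
```

Under this assumed definition, computing one such point for the whole network and one for each sampled subgraph would give a set of 3D points that a path could connect.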

    Privacy-preserving social network analysis

    Data privacy in social networks is a growing concern that threatens to limit access to important information contained in these data structures. Analysis of the graph structure of social networks can provide valuable information for revenue generation and social science research, but unfortunately, ensuring this analysis does not violate individual privacy is difficult. Simply removing obvious identifiers from graphs or even releasing only aggregate results of analysis may not provide sufficient protection. Differential privacy is an alternative privacy model, popular in data mining over tabular data, that uses noise to obscure individuals' contributions to aggregate results and offers a strong mathematical guarantee that individuals' presence in the data set is hidden. Analyses that were previously vulnerable to identification of individuals and extraction of private data may be safely released under differential-privacy guarantees. However, existing adaptations of differential privacy to social network analysis are often complex and have considerable impact on the utility of the results, making it less likely that they will see widespread adoption in the social network analysis world. In fact, social scientists still often use the weakest form of privacy protection, simple anonymization, in their social network analysis publications. We review the existing work in graph privatization, including the two existing standards for adapting differential privacy to network data. We then propose contributor-privacy and partition-privacy, novel standards for differential privacy over network data, and introduce simple, powerful private algorithms using these standards for common network analysis techniques that were infeasible to privatize under previous differential privacy standards.
We also ensure that privatized social network analysis does not violate the level of rigor required in social science research, by proposing a method of determining statistical significance for paired samples under differential privacy using the Wilcoxon Signed-Rank Test, which is appropriate for non-normally distributed data. Finally, we return to formally consider the case where differential privacy is not applied to data. Naive, deterministic approaches to privacy protection, including anonymization and aggregation of data, are often used in real-world practice. De-anonymization research demonstrates that some naive approaches to privacy are highly vulnerable to reidentification attacks, and none of these approaches offer the robust guarantee of differential privacy. However, we propose that these methods fall across a range of protection: some are better than others. In cases where adding noise to data is especially problematic, or acceptance and adoption of differential privacy is especially slow, it is critical to have a formal understanding of the alternatives. We define De Facto Privacy, a metric for comparing the relative privacy protection provided by deterministic approaches.
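The way differential privacy "uses noise to obscure individuals' contributions to aggregate results" can be illustrated with the standard Laplace mechanism applied to a simple count. This is a generic textbook sketch, not the contributor-privacy or partition-privacy algorithms proposed in the thesis; the function names and the sensitivity of 1 (for example, an edge count when neighbouring graphs differ by one edge) are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    `sensitivity` is the most the count can change when one individual's
    contribution changes (assumed to be 1 here).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Example: a noisy release of an edge count of 100 with epsilon = 1.
rng = random.Random(0)
noisy = private_count(100, epsilon=1.0, rng=rng)
```

Smaller values of `epsilon` give stronger privacy but noisier answers, which is the utility cost the abstract refers to.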

    Echo Chambers in Parliamentary Twitter Networks: The Catalan Case

    Social media is transforming relations among members of parliaments, but are members taking advantage of these new media to broaden their party and ideological communication environment, or are they mainly communicating with other party members and ideologically aligned peers? This article tests whether parliamentarians' use of Twitter is opening communication flows or confining them to representatives of the same party or ideology. The study is based on a data set spanning the period January 1, 2013, to March 31, 2014, which covers all relations (4,516), retweets (6,045), and mentions (19,507) among Catalan parliamentarians. Our results indicate that communication flows are polarized along party and ideological lines. The degree of polarization of this network depends, however, on where the interactions occur: the relations network is the most polarized, while cross-party and cross-ideological interactions are greater in the retweet network and most present in the mention network.

    Diffusion and Supercritical Spreading Processes on Complex Networks

    The large number of datasets that have become available in recent years has made it possible to empirically study human-driven as well as biological complex systems to an unprecedented extent. In parallel, the prediction and control of epidemic outbreaks have become very important for public health. In this thesis, we investigate some important aspects of diffusion phenomena and spreading processes unfolding on networks. We study three different problems related to spreading processes in the supercritical regime. First, we study reaction-diffusion on ensembles of random networks characterized by the observed Levy-flight properties of human mobility. The second problem is the estimation of the arrival times of global pandemics. To this end, we derive and identify suitable hidden geometries of network-driven spreading processes, leveraging random-walk theory. Through the definition of network effective distances, the problem of complex spatiotemporal patterns is reduced to simple, homogeneous wave propagation patterns. Third, by embedding nodes in the hidden space defined by network effective distances, we introduce a novel network centrality, called ViralRank, which quantifies how close a node is, on average, to the other nodes. These three studies constitute a unified framework to characterize diffusion and spreading processes unfolding on complex networks in very general settings, and provide new approaches to challenging theoretical problems that can be used to benchmark future models.
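The notion of an effective distance can be illustrated with the widely used shortest-path form, in which a link from n to m has length 1 - ln(P), where P is the fraction of traffic leaving n that goes to m. The sketch below is only an illustration under that assumption: the thesis derives its distances from random-walk theory, which this simplified shortest-path version does not reproduce, and the function names and toy flux values are hypothetical.

```python
import heapq
import math


def effective_lengths(flux):
    """Turn flux[n][m] (traffic from n to m) into link lengths 1 - ln(P),
    where P is the share of n's outgoing traffic that goes to m."""
    lengths = {}
    for n, out in flux.items():
        total = sum(out.values())
        lengths[n] = {m: 1.0 - math.log(f / total)
                      for m, f in out.items() if f > 0}
    return lengths


def effective_distances(flux, source):
    """Shortest-path effective distance from `source` to every reachable
    node, computed with Dijkstra's algorithm (all link lengths are >= 1)."""
    lengths = effective_lengths(flux)
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, math.inf):
            continue  # stale heap entry
        for m, w in lengths.get(n, {}).items():
            nd = d + w
            if nd < dist.get(m, math.inf):
                dist[m] = nd
                heapq.heappush(heap, (nd, m))
    return dist


# Hypothetical traffic between three locations.
flux = {
    "A": {"B": 90, "C": 10},
    "B": {"A": 50, "C": 50},
    "C": {"A": 10, "B": 10},
}
dist = effective_distances(flux, "A")
```

In this toy network, C is effectively closer to A through B than through the weak direct link, which is the kind of hidden geometry that makes arrival times look like a simple wave front.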

    Dynamic OSINT System Sourcing from Social Networks

    In the past, it was humanly impossible to observe and extract large amounts of textual information from web platforms in short periods of time. In recent years, however, several systems for the surveillance, selection, and extraction of textual information have emerged, based on Open-Source Intelligence (OSINT). These platforms became popular among computer security professionals, allowing them to detect new threats and respond in a timely manner by locating, collecting and analysing information made available to the public through social networks, blogs, newspapers, television, etc. They have proved to be a great advantage for information gathering and a good help in preventing problems, especially in the area of information security. This dissertation focuses on the development of a platform based on OSINT, and has two main objectives. The first is to continue work previously developed in another technology, Hypertext Preprocessor (PHP), in which formulas and algorithms were developed to classify Twitter posts about information security. The second is to present a new platform (using Node JS technology) by applying the formulas from the previous work, evaluating the new platform with users, and improving the user experience (UX). During the development process, two versions were provided to the users and hosted on a virtual machine based on Microsoft Azure cloud services. The platform architecture is composed of three processes developed in Node JS: a web server that provides the page, one that collects the posts, and one that classifies each post. The posts are collected through an API provided by Twitter, and stored in a MySQL database managed through phpMyAdmin. User-Centered Design (UCD), a process focused on the user and the user's experience, was applied during development. The participation of users helped to define new features and to improve the layout. Users were included in the testing phase and were asked to fill in a form for each version. Based on the collected feedback, the following improvements were implemented: the possibility of searching for several topics at the same time, the possibility of having header monitors by ranges of time, and the possibility of applying filters, such as the number of minutes the posts remain on screen and the order in which they are presented.

    Platform pop: disentangling Spotify’s intermediary role in the music industry

    It has been widely recognized that platforms utilize their editorial capacity to transform the industries they intermediate. In this paper, we examine the intermediary role of the leading audio streaming platform, Spotify, in the recorded music industry. Spotify is often called the ‘new radio’ for the influence it has on breaking songs and artists, and for the role it plays in music discovery and consumption. Our purpose is to determine whether Spotify is leveling the playing field or entrenching hierarchies between major labels and independent labels. We attempt to answer this question through a longitudinal analysis of content owners (major labels or ‘indies’) and formats (albums, tracks, or playlists) promoted by Spotify through its global Twitter account: @Spotify. As a carefully curated venue for corporate speech, @Spotify provides a window into continuities and changes in Spotify’s corporate strategy. By using @Spotify as a proxy through which to track patterns of promotion between the years 2012 and 2018, this paper offers a novel empirical examination of how Spotify is shaping the consumption of music, and in turn the structure of the recording industry. In doing so, we provide evidence for speculating about the future of the recorded music industry in a platform era.

    Digital traces of human mobility and interaction: models and applications

    In the last decade, digital devices and services have permeated many aspects of everyday life. They generate massive amounts of data that provide insightful information about how people move across geographic areas and how they interact with others. By analysing this detailed information, it is possible to investigate aspects of human mobility and interaction. Therefore, the thesis of this dissertation is that the analysis of mobility and interaction traces generated by digital devices and services, at different timescales and spatial granularities, can be used to gain a better understanding of human behaviour, build new applications and improve existing services. To substantiate this statement, I develop analytical models and applications supported by three sources of mobility and interaction data: online social networks, mobile phone networks and GPS traces. First, I present three applications related to data gathered from online social networks, namely the analysis of a global rumour spreading on Twitter, the definition of spatial dissemination measures in a social graph and the analysis of collaboration between developers on GitHub. Then I describe two applications of the analysis of country-wide data from cellular phone networks: the modelling of epidemic containment strategies, with the goal of assessing their efficacy in curbing infectious diseases, and the definition of a mobility-based measure of individual risk, which can be used to identify who needs targeted treatment. Finally, I present two applications based on GPS traces: the estimation of trajectories from spatially coarse, temporally sparse location traces and the analysis of routing behaviour in urban settings.