243 research outputs found

    Mining and modeling graphs using patterns and priors

    FROM SMALL-WORLDS TO BIG DATA: TEMPORAL AND MULTIDIMENSIONAL ASPECTS OF HUMAN NETWORKS

    In this thesis we address the close interplay among mobility, offline relationships, and online interactions, and the related human networks at different dimensional scales and temporal granularities. Adopting a largely data-driven approach, we move from small datasets about physical interactions mediated by human-carried devices, describing small social realities, to large-scale graphs that evolve over time, and from human mobility trajectories to face-to-face contacts occurring in different geographical contexts. We explore in depth the relation between human mobility and the social structure induced by the overlap of different people's trajectories in GPS traces collected in urban and metropolitan areas. We define the notions of geo-location and geo-community, which describe both spatial and social aspects of human behavior in a single framework. Through the concept of geo-community we model human mobility as a bipartite graph. This graph representation lets us generate a social structure that is plausible with respect to the real interactions. More generally, the modeling approach has the merit of casting mobility in a graph-theoretic framework, making the study of the interplay between mobility and sociality more accessible and intuitive. Our modeling approach also yields a mobility model, Geo-CoMM, which relies on and exploits the idea of geo-community. The model is a particular instance of a general framework we provide, in which the social structure behind preferred-location-based mobility models emerges. We validate Geo-CoMM on spatial, temporal, pairwise connectivity, and social features, showing that it reproduces the main statistical properties observed in real traces. As concerns the offline/online interplay, we provide a complete overview of the close connection between online and offline sociality.
To reach our goal we gather data about the offline contacts and Facebook interactions of a group of students, and we propose a multidimensional network analysis that allows us to understand in depth how the characteristics of users in the distinct networks affect each other. Results show that offline and Facebook friends differ. We thus confirm and strengthen the general intuition that online social networks have shifted away from their original goal of mirroring the offline sociality of individuals. As for role and social importance, it becomes apparent that social features such as user popularity or community structure do not transfer across social dimensions, as confirmed by our correlation analysis of the network layers and by the comparison among the communities. In the last chapters we analyze the evolution of the online social network from a physical-time perspective, i.e., considering the graph evolution as a graph time series and not as a function of basic network properties (number of nodes or links). Taking a user-centric view of physical time, we investigate the bursty nature of the link creation process in online social networks. We not only prove that it is a highly inhomogeneous process but also identify patterns of burstiness common to all nodes. Then we focus on the dynamic formation of two fundamental network building components: dyads and triads. We propose two new metrics to aid temporal analysis in physical time: link creation delay and triangle closure delay. These two metrics enable us to study the dynamic creation of dyads and triads, and to highlight network behavior that would otherwise remain hidden. In our analysis, we find that link delays are generally very low in absolute time and are largely independent of the dates people join the network. To highlight the social nature of this metric, we introduce the term peerness to quantify how well the lifetimes of linked users overlap.
As for triangle closure delay, we first introduce an algorithm to extract temporal triangles, which enables us to monitor the triangle formation process and to detect sudden changes in triangle formation behavior, possibly related to external events. In particular, we show that the introduction of new service functionalities had a disruptive impact on the triangle creation process in the network.
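
The abstract does not define triangle closure delay, so the following Python sketch rests on a stated assumption: the delay of a triangle is the time between the creation of its second link and the creation of its third (closing) link. The function name and the `(u, v, t)` input format are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def triangle_closure_delays(timed_edges):
    """Map each triangle (sorted node triple) to its assumed closure delay.

    timed_edges: iterable of (u, v, t), where t is the creation time of the
    undirected link u-v. ASSUMPTION (not from the thesis): the closure delay
    of a triangle is the gap between its second and its third link.
    """
    # Keep the earliest creation time of every undirected link.
    first_seen = {}
    for u, v, t in timed_edges:
        if u == v:
            continue  # ignore self-loops
        key = frozenset((u, v))
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t

    # Build the adjacency sets of the final (aggregated) graph.
    neighbours = {}
    for key in first_seen:
        u, v = tuple(key)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # A triangle exists when two neighbours of a node are themselves linked.
    delays = {}
    for node, nbrs in neighbours.items():
        for a, b in combinations(sorted(nbrs), 2):
            if frozenset((a, b)) not in first_seen:
                continue
            tri = tuple(sorted((node, a, b)))
            if tri in delays:
                continue
            times = sorted(first_seen[frozenset(p)] for p in combinations(tri, 2))
            delays[tri] = times[2] - times[1]
    return delays


edges = [("a", "b", 1), ("b", "c", 4), ("a", "c", 9), ("c", "d", 10)]
print(triangle_closure_delays(edges))
```

Tracking how the distribution of these delays shifts over calendar time is one plausible way to look for the sudden changes in triangle formation that the abstract mentions.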

    Theoretical Tools for Network Analysis: Game Theory, Graph Centrality, and Statistical Inference.

    A computer-driven data explosion has made the difficulty of interpreting large data sets of interconnected entities ever more salient. My work focuses on theoretical tools for summarizing, analyzing, and understanding network data sets, or data sets of things and their pairwise connections. I address four network science issues, improving our ability to analyze networks from a variety of domains. I first show that the sophistication of game-theoretic agent decision making can crucially affect network cascades: differing decision-making assumptions can lead to dramatically different cascade outcomes. This highlights the importance of diligence when making assumptions about agent behavior on networks and in general. I next analytically demonstrate a significant irregularity in the popular eigenvector centrality, and propose a new spectral centrality measure, nonbacktracking centrality, showing that it avoids this irregularity. This tool contributes a more robust way of ranking nodes, as well as an additional mathematical understanding of the effects of network localization. I next give a new model for uncertain networks, networks in which one has no access to true network data but instead observes only probabilistic information about edge existence. I give a fast maximum-likelihood algorithm for recovering edges and communities in this model, and show that it outperforms a typical approach of thresholding to an unweighted network. This model gives a better tool for understanding and analyzing real-world uncertain networks such as those arising in the experimental sciences. Lastly, I give a new lens for understanding scientific literature, specifically as a hybrid coauthorship and citation network.
I use this for exploratory analysis of the Physical Review journals over a hundred-year period, and I make new observations about the interplay between these two networks and how this relationship has changed over time. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133463/1/travisbm_1.pd

    Dynamics and stability of small social networks

    The choices and behaviours of individuals in social systems combine in unpredictable ways to create complex, often surprising, social outcomes. The structure of these behaviours, or interactions between individuals, can be represented as a social network. These networks are not static but vary over time as connections are made and broken or change in intensity. Generally these changes are gradual, but in some cases individuals disagree and as a result "fall out" with each other, i.e., actively end their relationship by ceasing all contact. These "fallouts" have been shown to be capable of fragmenting the social network into disconnected parts. Fragmentation can impair the functioning of social networks and it is thus important to better understand the social processes that have such consequences. In this thesis we investigate the question of how networks fragment: what mechanism drives the changes that ultimately result in fragmentation? To do so, we also aim to understand the necessary conditions for fragmentation to be possible and identify the connections that are most important for the cohesion of the network. To answer these questions, we need a model of social network dynamics that is stable enough that fragmentation does not occur spontaneously, but is simultaneously dynamic enough to allow the system to react to perturbations (i.e., disagreements). We present such a model and show that it is able to grow and maintain networks exhibiting the characteristic properties of social networks, and does so using local behavioural rules inspired by sociological theory. We then provide a detailed investigation of fragmentation and confirm basic intuitions on the importance of bridges for network cohesion. Furthermore, we show that this topological feature alone does not explain which points of the network are most vulnerable to fragmentation.
Rather, we find that dependencies between edges are crucial for understanding subtle differences between stable and vulnerable bridges. This understanding of the vulnerability of different network components is likely to be valuable for preventing fragmentation and limiting the impact of social fallouts.
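
As a minimal illustration of the topological feature discussed above, the Python sketch below lists the bridges of a small undirected network, assuming "bridge" is meant in the standard graph-theoretic sense: an edge whose removal disconnects its two endpoints. The brute-force approach and the function names are illustrative choices, not the thesis's model, and the code assumes each undirected edge appears once in the input.

```python
from collections import deque


def reachable(adj, start, skip_edge=None):
    """Return the set of nodes reachable from start, ignoring skip_edge."""
    skipped = set(skip_edge) if skip_edge is not None else None
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if skipped is not None and {u, v} == skipped:
                continue  # pretend this edge has been removed
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def find_bridges(edges):
    """Return the edges whose removal disconnects their endpoints.

    Brute force: remove each edge in turn and run a breadth-first search.
    This costs O(E * (V + E)), which is acceptable for small networks.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return [(u, v) for u, v in edges
            if v not in reachable(adj, u, skip_edge=(u, v))]


# Two triangles joined by a single tie: only that tie is a bridge.
ties = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(find_bridges(ties))
```

A purely topological test like this treats all bridges alike; the abstract's point is that distinguishing stable from vulnerable bridges also requires looking at dependencies between edges.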

    Mining complex trees for hidden fruit : a graph–based computational solution to detect latent criminal networks : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Technology at Massey University, Albany, New Zealand.

    The detection of crime is a complex and difficult endeavour. Public and private organisations focusing on law enforcement, intelligence, and compliance commonly apply the rational isolated actor approach premised on observability and materiality. This is manifested largely as conducting entity-level risk management, sourcing ‘leads’ from reactive covert human intelligence sources and/or proactive sources by applying simple rules-based models. Focusing on discrete observable and material actors ignores that criminal activity exists within a complex system deriving its fundamental structural fabric from the complex interactions between actors, with the least observable actors likely to be both criminally proficient and influential. The graph-based computational solution developed to detect latent criminal networks is a response to the inadequacy of the rational isolated actor approach, which ignores the connectedness and complexity of criminality. The core computational solution, written in the R language, consists of novel entity resolution, link discovery, and knowledge discovery technology. Entity resolution enables the fusion of multiple datasets with high accuracy (mean F-measure of 0.986 versus competitors' 0.872), generating a graph-based expressive view of the problem. Link discovery comprises link prediction and link inference, enabling the high-performance detection (accuracy of ~0.8 versus ~0.45 for relevant published models) of unobserved relationships such as identity fraud. Knowledge discovery takes the fused graph and applies the “GraphExtract” algorithm to create a set of subgraphs representing latent functional criminal groups, and a mesoscopic graph representing how these criminal groups are interconnected. Latent knowledge is generated from a range of metrics including the “Super-broker” metric and attitude prediction.
The computational solution has been evaluated on a range of datasets that mimic an applied setting, demonstrating a scalable (tested on ~18 million node graphs) and performant (~33 hours runtime on a non-distributed platform) solution that successfully detects relevant latent functional criminal groups in around 90% of cases sampled and enables the contextual understanding of the broader criminal system through the mesoscopic graph and associated metadata. The augmented data assets generated provide a multi-perspective systems view of criminal activity that enables advanced informed decision making across the microscopic, mesoscopic, and macroscopic spectrum.

    Social closure in markets, families, and networks: explaining the emergence of intergroup inequality as a result of exclusionary action across contexts

    Cardona A. Social closure in markets, families, and networks: explaining the emergence of intergroup inequality as a result of exclusionary action across contexts. Bielefeld: Universität Bielefeld; 2015. This dissertation explores how social closure produces intergroup inequality in the context of markets, families, and personal networks. Understood as exclusionary action, closure encompasses all forms of preferential or discriminatory interactions and transactions among groups or categorically bounded individuals that accrue or secure benefits to one group or category by means of excluding others, both intentionally and unintentionally. The study investigates closure in different contexts using agent-based models (ABM) and exponential random graph models (ERGM). First, closure is studied as practiced by professional groups in labor markets. Second, closure is further explored when carried out by parents who follow different strategies to allocate resources among siblings, thereby producing skill inequality both within and across generations. Third, exclusionary action is analyzed in processes of friendship formation that lead to the segregation of personal networks into clusters of individuals sharing either positive or negative attributes.

    Inference And Learning: Computational Difficulty And Efficiency

    In this thesis, we mainly investigate two collections of problems: statistical network inference and model selection in regression. The common feature shared by these two types of problems is that they typically exhibit an interesting phenomenon in terms of computational difficulty and efficiency. For statistical network inference, our goal is to infer the network structure based on a noisy observation of the network. Statistically, we model the network as generated from the structural information in the presence of noise, for example, the planted submatrix model (for bipartite weighted graphs), the stochastic block model, and the Watts-Strogatz model. As the relative amount of "signal-to-noise" varies, the problems exhibit different stages of computational difficulty. On the theoretical side, we investigate these stages by characterizing the transition thresholds on the "signal-to-noise" ratio for the aforementioned models. On the methodological side, we provide new computationally efficient procedures to reconstruct the network structure for each model. For model selection in regression, our goal is to learn a "good" model based on a certain model class from the observed data sequences (feature and response pairs), when the model can be misspecified. More concretely, we study two model selection problems: to learn from general classes of functions based on i.i.d. data with minimal assumptions, and to select from the sparse linear model class based on possibly adversarially chosen data in a sequential fashion. We develop new theoretical and algorithmic tools beyond empirical risk minimization to study these problems from a learning theory point of view.
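
To make the "structure plus noise" view concrete, here is a minimal Python sketch of sampling from a two-parameter stochastic block model, one of the models named above. It assumes the common simplified form in which nodes in the same block are linked with probability `p_in` and nodes in different blocks with probability `p_out`. The function and parameter names are hypothetical, and the thesis's exact parameterization may differ.

```python
import random


def sample_sbm(block_of, p_in, p_out, seed=None):
    """Sample an undirected graph from a simple stochastic block model.

    block_of: dict mapping each node to its block label (the hidden structure).
    p_in / p_out: link probabilities within and across blocks (ASSUMED
    two-parameter form; general SBMs use one probability per block pair).
    Returns a set of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    nodes = sorted(block_of)
    edges = set()
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            p = p_in if block_of[u] == block_of[v] else p_out
            if rng.random() < p:
                edges.add((u, v))
    return edges


blocks = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
noisy_graph = sample_sbm(blocks, p_in=0.8, p_out=0.1, seed=7)
print(sorted(noisy_graph))
```

The closer `p_in` and `p_out` are to each other, the weaker the signal relative to the noise, and the harder it becomes to recover `blocks` from the sampled edges alone.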

    Empirical studies on the social structure of knowledge

    This thesis applies micro-econometric techniques to examine the effect of social structure on knowledge. Chapter 2 investigates the role of mass migration in the passage of compulsory schooling laws. It provides qualitative and quantitative evidence that compulsory schooling laws were used as a nation-building tool to homogenise the civic values held by the culturally diverse migrants who moved to America during the “Age of Mass Migration”. Our central finding is that the adoption of compulsory schooling by American-born median voters occurs significantly earlier in states that host many migrants who had lower exposure to civic values in their home countries and had lower demand for common schooling when in the US. Chapter 3 explores whether, and to what extent, a researcher's position in the coauthorship network of medical scientists matters for their productivity. I use sudden and unexpected deaths of star scientists as exogenous shocks to the network, thus providing a causal identification of the effect of losing a star on the productivity of a scientist. I characterise the heterogeneity in the impact of the death by exploiting the position of the deceased scientists. Following the death of a star, coauthors suffer on average an 8% decrease in annual publications, and this effect can differ by up to 31% depending on the network position. Chapter 4 examines knowledge spillovers by measuring the relative intensity of patent citations in two technological fields for which clean and dirty inventions can be clearly distinguished: energy production (renewables vs. fossil fuel energy generation) and automobiles (electric cars vs. internal combustion engines). We develop a new methodology based on Google’s PageRank algorithm to measure the social benefit of knowledge spillover. We find that clean technologies generate 40% higher spillovers than their dirty counterparts.
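
For readers unfamiliar with PageRank, the following Python sketch runs the standard power-iteration form of the algorithm on a toy citation graph. It is not the thesis's spillover methodology, which the abstract only describes as "based on" PageRank. The damping factor of 0.85, the uniform treatment of nodes that cite nothing, and all names are conventional or hypothetical choices.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    links: dict mapping each citing node to the list of nodes it cites.
    Nodes that cite nothing spread their rank uniformly over all nodes.
    Returns a dict of scores that sum to 1.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing citations.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each citing node passes its rank on to the nodes it cites.
        for u, targets in links.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        change = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Toy example: patents A and B both cite patent C.
citations = {"A": ["C"], "B": ["C"], "C": []}
scores = pagerank(citations)
print(max(scores, key=scores.get))
```

Unlike a raw citation count, the score a patent receives depends on the scores of the patents citing it, which is why PageRank-style measures are attractive for capturing indirect spillovers.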