7 research outputs found

    Alignment and Assembly: Inferring Networks from Noisy Observations

    In recent years, many large network datasets have become available, giving rise to novel and valuable applications of data mining and machine learning techniques. These datasets include social networks, the structure of the Internet, and protein-interaction networks, to name just a few. Graph mining exploits information hidden in these data to shed light on such problems as finding relevant pages on the web, or identifying communities of strongly connected individuals. Clearly, to address such problems, we first need the complete and reliable network graph. In many real-world scenarios, the full graph is not available for free. For example, data-collection processes may be noisy and unreliable, or node identifiers may be hidden for privacy protection. Therefore, we cannot rely on the node labels to infer the full graph. In this thesis, we address fundamental and practical questions of inferring a true full network from multiple ambiguous observations. We formulate two variations of this problem: network alignment and network assembly. In each variant, we address two types of questions: first, we characterize how graph features impact the fundamental feasibility of reconstruction; second, we seek efficient algorithms that can scale to very large networks. In the first part of this thesis, we consider network alignment. We assume two large, noisy observations of the true network that are not labeled. Network alignment refers to the problem of aligning the vertices of the two networks using only structural cues, and it can be viewed as a generalization of the classic graph-isomorphism problem. We make the following contributions. First, we introduce a random bigraph model with parameters p, t and s that generates two correlated graphs. We characterize conditions on p, t and s for the feasibility of alignment of two graphs. Second, we create an algorithm named percolation graph-matching (PGM) that builds an alignment from a small set of pre-matched nodes S.
    We prove conditions on the parameters p, t, s and r for which PGM succeeds, and we establish a phase transition in |S|. In the second part of this thesis, we consider network assembly. We assume many small, noisy observations of the true network, called patches. The node labels are either absent or not unique. The network assembly problem consists in reconstructing the true graph from these patches. We make the following contributions. First, we introduce a novel random-graph model with parameters p and q that generates a network with high clustering. We characterize conditions on p and q for feasibility of assembly. Second, we propose a heuristic assembly algorithm to reconstruct the true graph from arbitrary patches with label ambiguity.

    Mitigating Epidemics through Mobile Micro-measures

    Epidemics of infectious diseases are among the largest threats to the quality of life and the economic and social well-being of developing countries. The arsenal of measures against such epidemics is well-established, but costly and insufficient to mitigate their impact. In this paper, we argue that mobile technology adds a powerful weapon to this arsenal, because (a) mobile devices endow us with the unprecedented ability to measure and model the detailed behavioral patterns of the affected population, and (b) they enable the delivery of personalized behavioral recommendations to individuals in real time. We combine these two ideas and propose several strategies to generate such recommendations from mobility patterns. The goal of each strategy is a large reduction in infections, with a small impact on the normal course of daily life. We evaluate these strategies over the Orange D4D dataset and show the benefit of mobile micro-measures, even if only a fraction of the population participates. These preliminary results demonstrate the potential of mobile technology to complement other measures like vaccination and quarantines against disease epidemics. Comment: Presented at NetMob 2013, Bosto

    When Can Two Unlabeled Networks Be Aligned Under Partial Overlap?

    Network alignment refers to the problem of matching the vertex sets of two unlabeled graphs, which can be viewed as a generalization of the classic graph isomorphism problem. Network alignment has applications in several fields, including social network analysis, privacy, pattern recognition, computer vision, and computational biology. A number of heuristic algorithms have been proposed in these fields. Recent progress in the analysis of network alignment over stochastic models sheds light on the interplay between network parameters and matchability. In this paper, we consider the alignment problem when the two networks overlap only partially, i.e., there exist vertices in one network that have no counterpart in the other. We define a random bigraph model that generates two correlated graphs $G_{1,2}$; it is parameterized by the expected node overlap $t^2$ and by the expected edge overlap $s^2$. We define a cost function for structural mismatch under a particular alignment, and we identify a threshold for perfect matchability: if the average node degrees of $G_{1,2}$ grow as $\omega\left(s^{-2} t^{-1} \log(n)\right)$, then minimization of the proposed cost function results in an alignment which (i) is over exactly the set of shared nodes between $G_1$ and $G_2$, and (ii) agrees with the true matching between these shared nodes. Our result shows that network alignment is fundamentally robust to partial edge and node overlaps.
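    The bigraph model described above can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the construction, that both observations are drawn from a common G(n, p) generator graph by keeping each node independently with probability t and each surviving edge independently with probability s, which yields expected overlaps of t^2 and s^2. The function name `sample_bigraph` and its return layout are hypothetical.

```python
import random
from itertools import combinations


def sample_bigraph(n, p, t, s, seed=None):
    """Sample a generator graph and two correlated observations of it.

    Assumption (not stated in the abstract): each observation keeps every
    node with probability t and every edge between kept nodes with
    probability s, independently of the other observation.
    """
    rng = random.Random(seed)
    # Generator graph: Erdos-Renyi G(n, p) on nodes 0..n-1.
    generator = [e for e in combinations(range(n), 2) if rng.random() < p]

    def observe():
        # Node sampling, then edge sampling among the kept nodes.
        nodes = {v for v in range(n) if rng.random() < t}
        edges = {(u, w) for (u, w) in generator
                 if u in nodes and w in nodes and rng.random() < s}
        return nodes, edges

    first = observe()
    second = observe()
    return generator, first, second


if __name__ == "__main__":
    gen, (n1, e1), (n2, e2) = sample_bigraph(200, 0.1, 0.8, 0.9, seed=1)
    print("shared nodes:", len(n1 & n2), "shared edges:", len(e1 & e2))
```

    The sketch keeps the generator's node labels so that the true matching is known; an alignment experiment would discard or permute these labels before matching.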

    Assembling a Network out of Ambiguous Patches

    Many graph mining and network analysis problems rely on the availability of the full network over a set of nodes. But inferring a full network is sometimes non-trivial if the raw data is in the form of many small patches, or subgraphs, of the true network, and if there are ambiguities in the identities of nodes or edges in these patches. This may happen because of noise or because of the nature of data; for instance, in social networks, names are typically not unique. Graph assembly refers to the problem of reconstructing a graph from these many, possibly noisy, partial observations. Prior work suggests that graph assembly is essentially impossible in regimes of interest when the true graph is Erdős-Rényi. The purpose of the present paper is to show that a modest amount of clustering is sufficient to assemble even very large graphs. We introduce the $G(n,p;q)$ random graph model, which is the random closure over all open triangles of a $G(n,p)$ Erdős-Rényi graph, and show that this model exhibits higher clustering than an equivalent Erdős-Rényi graph. We focus on an extreme case of graph assembly: the patches are small ($1$-hop egonets) and are unlabeled. We show that in realistic regimes, graph assembly is fundamentally feasible, because we can identify, for every edge $e$, a subgraph induced by its neighbors that is unique and present in every patch containing $e$. Using this result, we build a practical algorithm that uses canonical labeling to reconstruct the original graph from noiseless patches. We also provide an achievability result for noisy patches, which are obtained by edge-sampling the original egonets.
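    The random-closure construction can be sketched as follows. This is one plausible reading of the abstract, not the paper's exact definition: it assumes that every open triangle (a path u-v-w whose endpoints are not adjacent in the base graph) is closed independently with probability q, and that closures are evaluated once against the base graph rather than recursively. The function name `sample_gnpq` is hypothetical.

```python
import random
from itertools import combinations


def sample_gnpq(n, p, q, seed=None):
    """Return (base_edges, closed_edges) for a G(n, p; q)-style sample.

    Assumption: each open triangle u-v-w of the base G(n, p) graph adds
    the edge {u, w} independently with probability q.
    """
    rng = random.Random(seed)
    base = {frozenset(e) for e in combinations(range(n), 2)
            if rng.random() < p}

    # Adjacency lists of the base graph.
    adj = {v: set() for v in range(n)}
    for edge in base:
        u, w = tuple(edge)
        adj[u].add(w)
        adj[w].add(u)

    closed = set(base)
    for v in range(n):
        # Every pair of neighbours of v forms a triangle through v,
        # open if the pair is not already joined in the base graph.
        for u, w in combinations(sorted(adj[v]), 2):
            pair = frozenset((u, w))
            if pair not in base and rng.random() < q:
                closed.add(pair)
    return base, closed


if __name__ == "__main__":
    base, closed = sample_gnpq(100, 0.05, 0.5, seed=7)
    print("base edges:", len(base), "edges after closure:", len(closed))
```

    Setting q = 0 returns the base graph unchanged, and q = 1 closes every open triangle of the base graph, which makes the two extremes easy to check.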

    On the Performance of Percolation Graph Matching

    Graph matching is a generalization of the classic graph isomorphism problem. Using only the structure of two similar graphs, a graph-matching algorithm finds a map between their vertex sets. This has applications in the deanonymization of social and information networks and, more generally, in the merging of structural data from different domains. One class of graph-matching algorithms starts with a known seed set of matched node pairs. Despite the success of these algorithms in practical applications, their performance has been observed to be very sensitive to the size of the seed set. The lack of a rigorous understanding of parameters and performance makes it difficult to design systems and predict their behavior. In this paper, we propose and analyze a very simple percolation-based graph matching algorithm that incrementally maps every pair of nodes (i, j) with at least r neighboring mapped pairs. The simplicity of this algorithm makes possible a rigorous analysis that relies on recent advances in bootstrap percolation theory for the G(n, p) random graph. We prove conditions on the model parameters under which percolation graph matching succeeds, and we establish a phase transition in the size of the seed set. We also confirm through experiments that the performance of percolation graph matching is surprisingly good, both for synthetic graphs and real social-network data.
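    The percolation rule described above (map a pair (i, j) once at least r of its neighboring pairs are mapped) can be sketched as below. This is an illustrative sketch, not the authors' implementation: the adjacency-dictionary input format, the queue-based processing order, the tie-breaking, and the function name `percolation_graph_matching` are all assumptions.

```python
from collections import defaultdict, deque


def percolation_graph_matching(adj1, adj2, seeds, r=2):
    """Grow a matching from seed pairs using a mark-counting rule.

    adj1, adj2: dict mapping each node to the set of its neighbours.
    seeds: iterable of (i, j) pairs assumed to be correctly pre-matched.
    Returns a dict from nodes of the first graph to nodes of the second.
    """
    match12, match21 = {}, {}
    marks = defaultdict(int)  # (u, v) -> number of matched neighbouring pairs
    queue = deque()
    for i, j in seeds:
        match12[i] = j
        match21[j] = i
        queue.append((i, j))

    while queue:
        i, j = queue.popleft()
        # The matched pair (i, j) gives one mark to every pair (u, v)
        # with u adjacent to i and v adjacent to j, both still unmatched.
        for u in adj1[i]:
            if u in match12:
                continue
            for v in adj2[j]:
                if v in match21:
                    continue
                marks[(u, v)] += 1
                if marks[(u, v)] >= r:
                    match12[u] = v
                    match21[v] = u
                    queue.append((u, v))
                    break  # u is now matched; move to the next u
    return match12


if __name__ == "__main__":
    g1 = {0: {2}, 1: {2, 3}, 2: {0, 1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
    g2 = {"a": {"c"}, "b": {"c", "d"}, "c": {"a", "b", "d", "e"},
          "d": {"b", "c", "e"}, "e": {"c", "d"}}
    print(percolation_graph_matching(g1, g2, [(0, "a"), (1, "b")], r=2))
```

    In the small example, the two seed pairs are enough for the matching to spread to all five nodes; with r larger than the number of seeds, no pair ever collects enough marks and only the seeds are returned.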