
    SLIM: Scalable Linkage of Mobility Data

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications, such as linking user identities for security, understanding the privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces the number of candidate pairs for matching. To demonstrate the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings a speedup of two to four orders of magnitude.
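The candidate-reduction idea in this abstract can be illustrated with a generic MinHash banding scheme. This is only a sketch under stated assumptions: the abstract does not describe SLIM's actual signature or hashing scheme, so the token format (`(cell_id, time_bin)` tuples), the helper names, and the `bands`/`rows` parameters below are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature of a non-empty set of tokens."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(entities, bands=8, rows=2):
    """Return entity pairs that share at least one LSH band.

    entities: dict mapping an entity name to a non-empty set of
    (cell_id, time_bin) tokens (a hypothetical mobility representation).
    """
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        sig = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            # Entities whose signatures agree on a whole band share a bucket.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical traces: a1 and b1 visit the same cells at the same times.
entities = {
    "a1": {(1, 0), (2, 1), (3, 2)},
    "b1": {(1, 0), (2, 1), (3, 2)},
    "b2": {(7, 0), (8, 1), (9, 2)},
}
pairs = candidate_pairs(entities)
```

Only the pairs returned here would be passed to the more expensive similarity computation and matching steps, which is where the reduction in work comes from.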

    Statistically validated network of portfolio overlaps and systemic risk

    Common asset holding by financial institutions, namely portfolio overlap, is nowadays regarded as an important channel for financial contagion, with the potential to trigger fire sales and thus severe losses at the systemic level. In this paper we propose a method to assess the statistical significance of the overlap between pairs of heterogeneously diversified portfolios, which then allows us to build a validated network of financial institutions where links indicate potential contagion channels due to realized portfolio overlaps. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can in general be applied to any bipartite network where the presence of similar sets of neighbors is of interest. We find that the proportion of validated network links (i.e., of statistically significant overlaps) increased steadily before the 2007-2008 global financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire-sale liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013, reaching levels not seen since 2007. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about financial institutions that are about to suffer (enjoy) the most significant losses (gains).
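One way to test whether an overlap between two portfolios is statistically significant is a hypergeometric null model with a multiple-testing correction. The abstract does not name the null model or the correction, so both choices below (hypergeometric tail, Bonferroni threshold), as well as the function names and the `alpha` default, are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def overlap_p_value(n_assets, d_i, d_j, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(n_assets, d_i, d_j).

    d_i and d_j are the numbers of distinct assets held by the two
    portfolios; the null hypothesis is that holdings are picked at random.
    """
    total = comb(n_assets, d_j)
    tail = sum(
        comb(d_i, k) * comb(n_assets - d_i, d_j - k)
        for k in range(overlap, min(d_i, d_j) + 1)
    )
    return tail / total


def validated_links(holdings, n_assets, alpha=0.01):
    """Return pairs whose overlap is significant after Bonferroni correction.

    holdings: dict mapping an institution name to the set of assets it holds.
    """
    pairs = list(combinations(sorted(holdings), 2))
    if not pairs:
        return []
    threshold = alpha / len(pairs)  # Bonferroni: one test per pair
    links = []
    for a, b in pairs:
        shared = len(holdings[a] & holdings[b])
        p = overlap_p_value(n_assets, len(holdings[a]), len(holdings[b]), shared)
        if p < threshold:
            links.append((a, b))
    return links


# Hypothetical holdings over a universe of 100 assets.
holdings = {
    "A": set(range(10)),
    "B": set(range(10)),
    "C": set(range(90, 100)),
}
links = validated_links(holdings, n_assets=100)
```

Because the test conditions on each portfolio's size, a large, well-diversified portfolio is not linked to everything merely because it holds many assets.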

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership that is contained in the commit log of software projects, e.g. which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
    Comment: MSR 2019, 12 pages, 10 figures
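The network construction described in this abstract can be sketched in plain Python. This does not use or reproduce the git2net API; the input record format `(editor, original_author, timestamp, n_lines)` and the function name are hypothetical, and the sketch assumes the line-level attribution has already been extracted from the repository.

```python
from collections import defaultdict


def coediting_network(edits):
    """Build a directed, weighted, time-stamped co-editing network.

    edits: iterable of (editor, original_author, timestamp, n_lines) records,
    one per block of code that `editor` changed and `original_author` wrote.
    Returns (weights, timestamps), both keyed by (editor, original_author).
    """
    weights = defaultdict(int)
    timestamps = defaultdict(list)
    for editor, original_author, ts, n_lines in edits:
        if editor == original_author:
            continue  # editing one's own code is not a co-editing link
        link = (editor, original_author)
        weights[link] += n_lines
        timestamps[link].append(ts)
    return dict(weights), {link: sorted(ts) for link, ts in timestamps.items()}


# Hypothetical edit records with integer timestamps.
edits = [
    ("alice", "bob", 30, 4),
    ("alice", "bob", 10, 2),
    ("bob", "alice", 20, 1),
    ("bob", "bob", 25, 7),
]
weights, timestamps = coediting_network(edits)
```

Keeping the link direction (editor to original author) and the full list of timestamps preserves the information that file-level co-authorship networks discard.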

    Interdependent policy instrument preferences: a two-mode network approach

    In policymaking, actors are likely to take the preferences of others into account when strategically positioning themselves. However, there is a lack of research that conceives of policy preferences as an interdependent system. In order to analyse interdependencies, we link actors to their policy preferences in water protection, which results in an actor-instrument network. As actors exhibit multiple preferences, a complex two-mode network between actors and policies emerges. We analyse whether actors exhibit interdependent preference profiles given shared policy objectives or social interactions among them. By fitting an exponential random graph model to the actor-instrument network, we find considerable clustering, meaning that actors tend to exhibit preferences for multiple policy instruments in common. Actors tend to exhibit interdependent policy preferences when they are interconnected, that is, when they collaborate with each other. By contrast, actors are less likely to share policy preferences when a conflict line divides them.

    Equilibrium statistical mechanics on correlated random graphs

    Biological and social networks have recently attracted enormous attention among physicists. Two main aspects may be stressed: a nontrivial topology of the graph describing the mutual interactions between agents exists, and/or, typically, such interactions are essentially (weighted) imitative. Although these aspects are widely accepted and empirically confirmed, the schemes currently used to generate the expected topology are based on a priori assumptions and in most cases still implement constant intensities for links. Here we propose a simple shift in the definition of patterns in a Hopfield model to convert frustration into dilution. By varying the bias of the pattern distribution, the network topology (which is generated by the reciprocal affinities among agents) crosses various well-known regimes (fully connected, linearly diverging connectivity, extreme dilution scenario, no network), coupled with small-world properties, which, in this context, are emergent and no longer imposed a priori. The model is first investigated with a focus on these topological properties of the emergent network; its thermodynamics is then solved analytically (at a replica-symmetric level) by extending the double stochastic stability technique, and presented together with its fluctuation theory for a picture of criticality. At least at equilibrium, dilution simply decreases the strength of the coupling felt by the spins, but leaves the paramagnetic/ferromagnetic flavors unchanged. The main difference with respect to previous investigations and a naive picture is that within our approach replicas do not appear: instead of (multi)-overlaps as order parameters, we introduce a class of magnetizations on all the possible sub-graphs belonging to the main one investigated. As a consequence, a closure for a self-consistent relation is achieved for these objects.
    Comment: 30 pages, 4 figures
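The shift from frustration to dilution can be illustrated with a small numerical sketch. It assumes, beyond what the abstract states, that pattern entries take values in {0, 1} rather than {-1, +1}, that each entry equals 1 with probability (1 + bias) / 2, and that couplings follow the Hebbian rule J_ij = sum over patterns of xi_i * xi_j. The function names and parameters are hypothetical.

```python
import random


def hebbian_couplings(n_agents, n_patterns, bias, seed=0):
    """Hebbian couplings from {0, 1} patterns with a tunable bias.

    Each pattern entry is 1 with probability (1 + bias) / 2 (assumed form),
    with bias in [-1, 1]. Couplings are never negative, so no frustration
    arises; a zero coupling means the link is absent (dilution).
    """
    rng = random.Random(seed)
    p_one = (1 + bias) / 2
    J = [[0] * n_agents for _ in range(n_agents)]
    for _ in range(n_patterns):
        xi = [1 if rng.random() < p_one else 0 for _ in range(n_agents)]
        active = [i for i, value in enumerate(xi) if value]
        # Only agents that are both active in this pattern gain coupling.
        for a, i in enumerate(active):
            for j in active[a + 1:]:
                J[i][j] += 1
                J[j][i] += 1
    return J


def link_density(J):
    """Fraction of agent pairs joined by a non-zero coupling."""
    n = len(J)
    links = sum(1 for i in range(n) for j in range(i + 1, n) if J[i][j] > 0)
    return links / (n * (n - 1) / 2)


dense = link_density(hebbian_couplings(20, 5, bias=1.0))
empty = link_density(hebbian_couplings(20, 5, bias=-1.0))
sparse = link_density(hebbian_couplings(20, 5, bias=-0.8))
```

Sweeping `bias` from 1 down to -1 moves the graph from fully connected to empty, with weighted, diluted topologies in between, while every surviving coupling stays imitative.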