    Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models

    Motivated by a real-life problem of sharing social network data that contain sensitive personal information, we propose a novel approach to release and analyze synthetic graphs in order to protect the privacy of individual relationships captured by the social network while maintaining the validity of statistical results. A case study using a version of the Enron e-mail corpus dataset demonstrates the application and usefulness of the proposed techniques in solving the challenging problem of maintaining privacy and supporting open access to network data, both to ensure reproducibility of existing studies and to discover new scientific insights that can be obtained by analyzing such data. We use a simple yet effective randomized response mechanism to generate synthetic networks under ε-edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks. Comment: Updated, 39 page
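The randomized response step mentioned in this abstract can be illustrated with a minimal sketch. It assumes an undirected simple graph on nodes `0..n-1` and a single flip probability `1 / (1 + e^ε)` applied independently to every dyad, which is the standard randomized-response calibration for ε-edge differential privacy; the function names are hypothetical and the paper's own mechanism and settings may differ.

```python
import math
import random


def flip_probability(epsilon):
    """Per-dyad flip probability p with (1 - p) / p = exp(epsilon)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(edges, n, epsilon, rng=None):
    """Return a synthetic edge set by flipping each dyad independently.

    edges   : iterable of (i, j) pairs, treated as undirected (assumption)
    n       : number of nodes, labelled 0..n-1
    epsilon : privacy parameter; larger values mean fewer flips
    rng     : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    p = flip_probability(epsilon)
    true_edges = {frozenset(e) for e in edges}
    synthetic = set()
    for i in range(n):
        for j in range(i + 1, n):
            present = frozenset((i, j)) in true_edges
            # Flip the true state of this dyad with probability p.
            if rng.random() < p:
                present = not present
            if present:
                synthetic.add((i, j))
    return synthetic
```

Because every dyad is perturbed with a known probability, the released graph can be treated as a noisy observation of the true one, which is what makes the likelihood-based inference described in the abstract possible.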

    The glass ceiling in NLP

    Social dynamics in conferences: analyses of data from the Live Social Semantics application

    The popularity and spread of online social networking in recent years have given great momentum to the study of the dynamics and patterns of social interactions. However, these studies have often been confined to the online world, neglecting its interdependencies with the offline world. This is mainly due to the lack of real data that span this divide. The Live Social Semantics application is a novel platform that dissolves this divide by collecting and integrating data about people from (a) their online social networks and tagging activities on popular social networking sites, (b) their publications and co-authorship networks from semantic repositories, and (c) their real-world face-to-face contacts with other attendees, collected via a network of wearable active sensors. This paper investigates the data collected by this application during its deployment at three major conferences, where it was used by more than 400 people. Our analyses show the robustness of the patterns of contacts at various conferences, and the influence of various personal properties (e.g. seniority, conference attendance) on social networking patterns.

    Modeling social networks from sampled data

    Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors. Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g., recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data). In this paper we develop the conceptual and computational theory for inference based on sampled network information. We first review forms of network sampling designs used in practice. We consider inference from the likelihood framework, and develop a typology of network data that reflects their treatment within this frame. We then develop inference for social network models based on information from adaptive network designs. We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network. Comment: Published at http://dx.doi.org/10.1214/08-AOAS221 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
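To make the idea of a link-tracing design concrete, here is a small sketch of a wave-based (snowball-style) sample on an undirected graph. It is an illustration under stated assumptions, not the design analyzed in the paper: the function name `link_trace_sample` is hypothetical, every neighbor of a surveyed node is followed, and a dyad's state counts as observed whenever at least one of its endpoints was surveyed.

```python
from collections import defaultdict


def link_trace_sample(edges, seeds, waves):
    """Follow links outward from the seeds for a fixed number of waves.

    edges : iterable of (u, v) pairs, treated as undirected (assumption)
    seeds : initially surveyed nodes
    waves : number of tracing waves after the seed wave
    Returns (surveyed_nodes, observed_edges).
    """
    edges = list(edges)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    surveyed = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        # Nodes nominated by the current wave that were not yet surveyed.
        frontier = {v for u in frontier for v in neighbors[u]} - surveyed
        surveyed |= frontier

    # An edge is recorded if at least one endpoint reported its ties.
    observed = {(u, v) for u, v in edges if u in surveyed or v in surveyed}
    return surveyed, observed
```

On the path graph 0-1-2-3-4 with seed 0 and one wave, nodes 0 and 1 are surveyed and edges (0, 1) and (1, 2) are recorded. Every dyad with neither endpoint surveyed, including edges (2, 3) and (3, 4), stays unobserved, which is the kind of partial observation the likelihood framework in the abstract is meant to handle.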

    Gender Disparities in Science? Dropout, Productivity, Collaborations and Success of Male and Female Computer Scientists

    Scientific collaborations shape ideas as well as innovations and are both the substrate for, and the outcome of, academic careers. Recent studies show that gender inequality is still present in many scientific practices ranging from hiring to peer-review processes and grant applications. In this work, we investigate gender-specific differences in collaboration patterns of more than one million computer scientists over the course of 47 years. We explore how these patterns change over years and career ages and how they impact scientific success. Our results highlight that successful male and female scientists reveal the same collaboration patterns: compared to other scientists of the same career age, they tend to collaborate with more colleagues, seek innovations as brokers, and establish longer-lasting and more repetitive collaborations. However, women are on average less likely to adopt the collaboration patterns that are related to success and more likely to embed into ego networks devoid of structural holes, and they exhibit stronger gender homophily as well as a consistently higher dropout rate than men in all career ages.