
    Network Analysis on Incomplete Structures.

    Over the past decade, networks have become an increasingly popular abstraction for problems in the physical, life, social and information sciences. Network analysis can be used to extract insights into an underlying system from the structure of its network representation. One challenge in applying network analysis is that networks do not always have an observed and complete structure. This dissertation focuses on the problem of imputation and/or inference in the presence of incomplete network structures. I propose four novel systems, each of which contains a module that infers or imputes an incomplete network, a step that is necessary to complete the end task. I first propose EdgeBoost, a meta-algorithm and framework that repeatedly applies a non-deterministic link predictor to improve the efficacy of community detection algorithms on networks with missing edges. On average, EdgeBoost improves the performance of existing algorithms by 7% on artificial data and 17% on ego networks collected from Facebook. The second system, Butterworth, identifies a social network user's topic(s) of interest and automatically generates a set of social feed "rankers" that enable the user to see topic-specific sub-feeds. Butterworth uses link prediction to infer the missing semantics between members of a user's social network in order to detect topical clusters embedded in the network structure. For automatically generated topic lists, Butterworth achieves an average top-10 precision of 78%, as compared to a time-ordered baseline of 45%. Next, I propose Dobby, a system for constructing a knowledge graph of user-defined keyword tags. Leveraging a sparse set of labeled edges, Dobby trains a supervised learning algorithm to infer the hypernym relationships between keyword tags. Dobby was evaluated by constructing a knowledge graph of LinkedIn's skills dataset, achieving an average precision of 85% on a set of human-labeled hypernym edges between skills.
    Lastly, I propose Lobbyback, a system that automatically identifies clusters of documents that exhibit text reuse and generates "prototypes" that represent a canonical version of text shared between the documents. Lobbyback infers a network structure in a corpus of documents and uses community detection in order to extract the document clusters. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133443/1/mattburg_1.pd
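
    The abstract reports a top-10 precision figure without defining it. A minimal sketch of the usual precision-at-k computation follows; the function name `precision_at_k` and the toy data are illustrative assumptions, not taken from the dissertation, which may define the metric differently.

```python
def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k ranked items that are relevant.

    Assumed convention: divide by k even if fewer than k items are ranked.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_items)
    # Count how many of the first k ranked items are in the relevant set.
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k


# Toy example: 3 of the top 10 items are relevant.
ranking = list("abcdefghij")
score = precision_at_k(ranking, {"a", "c", "e"}, k=10)
```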

    Ego Group Partition: A Novel Framework for Improving Ego Experiments in Social Networks

    Estimating the average treatment effect in social networks is challenging because individuals influence each other. One approach to address this interference is ego cluster experiments, where each cluster consists of a central individual (ego) and its peers (alters). Clusters are randomized, and only the effects on egos are measured. In this work, we propose an improved framework for ego cluster experiments called ego group partition (EGP), which directly generates two groups and an ego sub-population instead of ego clusters. Under specific model assumptions, we propose two ego group partition algorithms. Compared to the original ego clustering algorithm, our algorithms produce more egos, yield smaller biases, and support parallel computation. The performance of our algorithms is validated through simulation and real-world case studies.
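
    The baseline ego cluster design described in this abstract can be sketched roughly as follows. This is an illustration only, under loudly stated assumptions: the greedy non-overlapping cluster construction, the coin-flip assignment, and all function names are hypothetical, and the sketch does not implement the paper's EGP algorithms.

```python
import random


def build_ego_clusters(adjacency):
    """Greedily pick egos whose cluster (ego plus alters) overlaps no earlier cluster.

    Assumption: a simple greedy pass in sorted node order, chosen for illustration.
    """
    used = set()
    clusters = {}
    for ego in sorted(adjacency):
        members = {ego} | set(adjacency[ego])
        if members & used:
            continue  # skip egos that share a node with an accepted cluster
        clusters[ego] = members
        used |= members
    return clusters


def assign_clusters(clusters, seed=0):
    """Randomize whole clusters to treatment (True) or control (False)."""
    rng = random.Random(seed)
    return {ego: rng.random() < 0.5 for ego in clusters}


def ego_effect_estimate(outcomes, assignment):
    """Difference in mean outcome between treated and control egos only."""
    treated = [outcomes[e] for e, t in assignment.items() if t]
    control = [outcomes[e] for e, t in assignment.items() if not t]
    if not treated or not control:
        raise ValueError("need at least one ego in each arm")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy graph: two star-shaped neighbourhoods plus a node (6) bridging them.
graph = {0: [1, 2], 1: [0], 2: [0], 3: [4, 5], 4: [3], 5: [3], 6: [2, 5]}
clusters = build_ego_clusters(graph)
```

    In this toy graph only nodes 0 and 3 end up as egos, which illustrates why a clustering-based design can yield few egos.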

    Enabling Social Applications via Decentralized Social Data Management

    The unprecedented wealth of information produced by online social networks, further augmented by location/collocation data, is currently fragmented across different proprietary services. Combined, it can accurately represent the social world and enable novel socially-aware applications. We present Prometheus, a socially-aware peer-to-peer service that collects social information from multiple sources into a multigraph managed in a decentralized fashion on user-contributed nodes, and exposes it through an interface implementing non-trivial social inferences while complying with user-defined access policies. Simulations and experiments on PlanetLab with emulated application workloads show that the system exhibits good end-to-end response time, low communication overhead and resilience to malicious attacks. Comment: 27 pages, single ACM column, 9 figures, accepted in Special Issue of Foundations of Social Computing, ACM Transactions on Internet Technology.

    Evolution of Ego-networks in Social Media with Link Recommendations

    Ego-networks are fundamental structures in social graphs, yet the process of their evolution is still largely unexplored. In an online context, a key question is how link recommender systems may skew the growth of these networks, possibly restraining diversity. To shed light on this matter, we analyze the complete temporal evolution of 170M ego-networks extracted from Flickr and Tumblr, comparing links that are created spontaneously with those that have been algorithmically recommended. We find that the evolution of ego-networks is bursty, community-driven, and characterized by subsequent phases of explosive diameter increase, slight shrinking, and stabilization. Recommendations favor popular and well-connected nodes, limiting the diameter expansion. With a matching experiment aimed at detecting causal relationships from observational data, we find that the bias introduced by the recommendations fosters global diversity in the process of neighbor selection. Last, with two link prediction experiments, we show how insights from our analysis can be used to improve the effectiveness of social recommender systems. Comment: Proceedings of the 10th ACM International Conference on Web Search and Data Mining (WSDM 2017), Cambridge, UK. 10 pages, 16 figures, 1 table.

    When 'friends' collide: social heterogeneity and user vulnerability on social network sites

    The present study examines how the use of social network sites (SNS) increases the potential of experiencing psychological, reputational and physical vulnerability online. From our theoretical perspective, concerns over the use of social network sites and online vulnerability stem from the ease with which users can amass large and diverse sets of online social connections and the associated maintenance costs. To date most studies of online vulnerability have relied on self-report measures, rarely combining such information with users' validated digital characteristics. Here, for a stratified sample of 177 UK-based Facebook users aged 13 to 77, digitally derived network data, coded for content and subjected to structural analysis, were integrated with self-report measures of social network heterogeneity and user vulnerability. Findings indicated a positive association between Facebook network size and online vulnerability mediated by both social diversity and structural features of the network. In particular, network clustering and the number of non-person contacts were predictive of vulnerability. Our findings support the notion that connecting to large networks of online 'friends' can lead to increasingly complex online socialising that is no longer controllable at a desirable level.

    Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology

    The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large-scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this paper we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians' awareness of these new research opportunities to increase collaboration between academia and the online industry.

    Protecting attributes and contents in online social networks

    With the extreme popularity of online social networks, security and privacy issues have become critical. In particular, it is important to protect users' privacy without preventing them from normal socialization. User privacy in the context of data publishing and structural re-identification attacks has been well studied. However, protection of attributes and data content has mostly been neglected in the research community. While social network data is rarely published, billions of messages are shared in various social networks on a daily basis. Therefore, it is more important to protect attributes and textual content in social networks. We first study the vulnerabilities of user attributes and contents, in particular, the identifiability of the users when the adversary learns a small piece of information about the target. We have presented two attribute-reidentification attacks that exploit information retrieval and web search techniques. We have shown that large portions of users with online presence are very identifiable, even with a small piece of seed information, and the seed information could be inaccurate. To protect user attributes and content, we adopt the social circle model derived from the concepts of "privacy as user perception" and "information boundary". Users will have different social circles, and share different information in different circles. We introduce a social circle discovery approach using multi-view clustering. We present our observations on the key features of social circles, including friendship links, content similarity and social interactions. We treat each feature as one view, and propose a one-side co-trained spectral clustering technique, which is tailored for the sparse nature of our data. We also propose two evaluation measurements. One is based on the quantitative measure of similarity ratio, while the other employs human evaluators to examine pairs of users, who are selected by the max-risk active evaluation approach.
    We evaluate our approach on ego networks of Twitter users, and present our clustering results. We also compare our proposed clustering technique with single-view clustering and original co-trained spectral clustering techniques. Our results show that multi-view clustering is more accurate for social circle detection, and that our proposed approach achieves a significantly higher similarity ratio than the original multi-view clustering approach. In addition, we build a proof-of-concept implementation of automatic circle detection and recommendation methods. For a user, the system returns the circle detection result from our proposed multi-view clustering technique, along with the keywords for each circle. Users can also enter a message they want to post, and the system will suggest which circle to disseminate the message to.