20,395 research outputs found

    Predicting Twitter user socioeconomic attributes with network and language information

    Get PDF
    Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary

    Organizational Chart Inference

    Full text link
    Nowadays, to facilitate the communication and cooperation among employees, a new family of online social networks has been adopted in many companies, which are called the "enterprise social networks" (ESNs). ESNs can provide employees with various professional services to help them deal with daily work issues. Meanwhile, employees in companies are usually organized into different hierarchies according to the relative ranks of their positions. The company internal management structure can be outlined with the organizational chart visually, which is normally confidential to the public out of the privacy and security concerns. In this paper, we want to study the IOC (Inference of Organizational Chart) problem to identify company internal organizational chart based on the heterogeneous online ESN launched in it. IOC is very challenging to address as, to guarantee smooth operations, the internal organizational charts of companies need to meet certain structural requirements (about its depth and width). To solve the IOC problem, a novel unsupervised method Create (ChArT REcovEr) is proposed in this paper, which consists of 3 steps: (1) social stratification of ESN users into different social classes, (2) supervision link inference from managers to subordinates, and (3) consecutive social classes matching to prune the redundant supervision links. Extensive experiments conducted on real-world online ESN dataset demonstrate that Create can perform very well in addressing the IOC problem.Comment: 10 pages, 9 figures, 1 table. The paper is accepted by KDD 201

    Online Privacy as a Collective Phenomenon

    Full text link
    The problem of online privacy is often reduced to individual decisions to hide or reveal personal information in online social networks (OSNs). However, with the increasing use of OSNs, it becomes more important to understand the role of the social network in disclosing personal information that a user has not revealed voluntarily: How much of our private information do our friends disclose about us, and how much of our privacy is lost simply because of online social interaction? Without strong technical effort, an OSN may be able to exploit the assortativity of human private features, this way constructing shadow profiles with information that users chose not to share. Furthermore, because many users share their phone and email contact lists, this allows an OSN to create full shadow profiles for people who do not even have an account for this OSN. We empirically test the feasibility of constructing shadow profiles of sexual orientation for users and non-users, using data from more than 3 Million accounts of a single OSN. We quantify a lower bound for the predictive power derived from the social network of a user, to demonstrate how the predictability of sexual orientation increases with the size of this network and the tendency to share personal information. This allows us to define a privacy leak factor that links individual privacy loss with the decision of other individuals to disclose information. Our statistical analysis reveals that some individuals are at a higher risk of privacy loss, as prediction accuracy increases for users with a larger and more homogeneous first- and second-order neighborhood of their social network. While we do not provide evidence that shadow profiles exist at all, our results show that disclosing of private information is not restricted to an individual choice, but becomes a collective decision that has implications for policy and privacy regulation

    Listening between the Lines: Learning Personal Attributes from Conversations

    Full text link
    Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc). Experiments with various conversational texts including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.Comment: published in WWW'1
    • …
    corecore