Predicting Twitter user socioeconomic attributes with network and language information
Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, information from users' social networks has not yet been exploited for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users from information extracted from their extended networks, by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms state-of-the-art methods in both tasks. We obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.
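As a rough illustration of combining graph embeddings with textual features, the sketch below concatenates a user's (assumed precomputed) graph embedding with a text feature vector and classifies the result with a nearest-centroid rule. The embeddings, features, labels and helper names (`fuse`, `train_centroids`, `predict`) are hypothetical. The paper's own embedding method and predictive models are not reproduced, and income prediction would call for a regressor rather than a classifier.

```python
from math import dist

# Hypothetical toy data (invented): each user has a precomputed graph embedding,
# e.g. from a random-walk embedding method, and a vector of textual features.
graph_emb = {"u1": [0.9, 0.1], "u2": [0.8, 0.2], "u3": [0.1, 0.9], "u4": [0.2, 0.8]}
text_feat = {"u1": [0.7, 0.3], "u2": [0.6, 0.4], "u3": [0.2, 0.8], "u4": [0.3, 0.7]}
labels = {"u1": 1, "u2": 1, "u3": 2, "u4": 2}  # e.g. occupational class


def fuse(user):
    """Early fusion: concatenate graph embedding and text features."""
    return graph_emb[user] + text_feat[user]


def train_centroids(users):
    """Average the fused vectors of the training users in each class."""
    sums, counts = {}, {}
    for u in users:
        vec, y = fuse(u), labels[u]
        acc = sums.setdefault(y, [0.0] * len(vec))
        sums[y] = [s + v for s, v in zip(acc, vec)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, vec):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: dist(centroids[y], vec))


centroids = train_centroids(labels)
print(predict(centroids, [0.85, 0.15, 0.65, 0.35]))  # → 1
```

Concatenation (early fusion) is the simplest way to let one model see both signals. A real system would learn the embeddings from the follower graph and use a stronger model than nearest centroid.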
Organizational Chart Inference
To facilitate communication and cooperation among employees, many companies
have adopted a new family of online social networks called "enterprise social
networks" (ESNs). ESNs provide employees with various professional services
that help them deal with daily work issues.
Meanwhile, employees are usually organized into hierarchies according to the
relative ranks of their positions. A company's internal management structure
can be outlined visually with an organizational chart, which is normally kept
confidential for privacy and security reasons. In this paper, we study the IOC
(Inference of Organizational Chart) problem: identifying a company's internal
organizational chart from the heterogeneous online ESN deployed within it. IOC
is very challenging to address because, to guarantee smooth operations, a
company's internal organizational chart needs to meet certain structural
requirements regarding its depth and width. To solve the IOC problem, we
propose a novel unsupervised method, Create (ChArT REcovEr), which consists of
three steps: (1) social stratification of ESN users into different social
classes, (2) supervision link inference from managers to subordinates, and (3)
consecutive social class matching to prune redundant supervision links.
Extensive experiments on a real-world online ESN dataset demonstrate that
Create performs very well on the IOC problem.
Comment: 10 pages, 9 figures, 1 table. The paper is accepted by KDD 201
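The three steps of the Create abstract can be mimicked on a toy graph to show how stratification, link inference and pruning fit together. This is not the Create algorithm. It assumes a hypothetical per-user seniority score and pairwise interaction counts, stratifies users with fixed thresholds, proposes links from every higher class, and then keeps, for each user, only the strongest link from the class directly above. It ignores the depth and width requirements the abstract mentions. All names, scores and thresholds are invented.

```python
# Hypothetical toy ESN (all names and numbers invented): a seniority score per
# user and symmetric interaction counts between pairs of users.
scores = {"ceo": 0.95, "m1": 0.7, "m2": 0.6, "e1": 0.3, "e2": 0.2, "e3": 0.1}
interactions = {
    ("ceo", "m1"): 5, ("ceo", "m2"): 4, ("ceo", "e1"): 1,
    ("m1", "e1"): 6, ("m1", "e2"): 2, ("m2", "e2"): 7, ("m2", "e3"): 3,
}


def weight(a, b):
    """Interaction count between a and b, in either direction (0 if none)."""
    return interactions.get((a, b), interactions.get((b, a), 0))


def stratify(scores, thresholds):
    """Step 1: level = number of (descending) thresholds the score is below."""
    return {u: sum(s < t for t in thresholds) for u, s in scores.items()}


def candidate_links(levels):
    """Step 2: propose manager -> subordinate links from any higher level."""
    return [(a, b) for a in levels for b in levels
            if levels[a] < levels[b] and weight(a, b) > 0]


def prune(levels, links):
    """Step 3: keep only consecutive-level links, one strongest manager each."""
    best = {}
    for a, b in links:
        if levels[b] != levels[a] + 1:
            continue  # skips non-consecutive links such as ceo -> e1
        if b not in best or weight(a, b) > weight(best[b], b):
            best[b] = a
    return best  # maps subordinate -> inferred manager


levels = stratify(scores, [0.8, 0.5])
chart = prune(levels, candidate_links(levels))
print(chart)
# → {'m1': 'ceo', 'm2': 'ceo', 'e1': 'm1', 'e2': 'm2', 'e3': 'm2'}
```

Because every non-root user keeps exactly one manager from the level above, the result is always a tree, which is the structural property an organizational chart needs.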
Online Privacy as a Collective Phenomenon
The problem of online privacy is often reduced to individual decisions to
hide or reveal personal information in online social networks (OSNs). However,
with the increasing use of OSNs, it becomes more important to understand the
role of the social network in disclosing personal information that a user has
not revealed voluntarily: How much of our private information do our friends
disclose about us, and how much of our privacy is lost simply because of online
social interaction? Without much technical effort, an OSN may be able to
exploit the assortativity of human private features, thereby constructing
shadow profiles with information that users chose not to share. Furthermore,
because many users share their phone and email contact lists, an OSN can
create full shadow profiles for people who do not even have an account on
this OSN.
We empirically test the feasibility of constructing shadow profiles of sexual
orientation for users and non-users, using data from more than 3 million
accounts of a single OSN. We quantify a lower bound for the predictive power
derived from the social network of a user, to demonstrate how the
predictability of sexual orientation increases with the size of this network
and the tendency to share personal information. This allows us to define a
privacy leak factor that links individual privacy loss with the decision of
other individuals to disclose information. Our statistical analysis reveals
that some individuals are at a higher risk of privacy loss, as prediction
accuracy increases for users with a larger and more homogeneous first- and
second-order neighborhood of their social network. While we do not provide
evidence that shadow profiles exist at all, our results show that the disclosure
of private information is not restricted to an individual choice but becomes a
collective decision that has implications for policy and privacy regulation.
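The shadow-profile argument rests on assortativity: if friends tend to share a private feature, the disclosures of a user's neighbors leak information about that user. A minimal sketch of this intuition, assuming a generic binary attribute and a simple neighbor vote, estimates the attribute of an undisclosed user (or of a non-user who appears only in uploaded contact lists) as the share of their disclosing neighbors who report it. This is not the paper's statistical model or its privacy leak factor; the graph, the disclosures and the helper names `share` and `second_order` are hypothetical.

```python
# Hypothetical toy OSN (invented). `friends` includes "x", a non-user known
# only from uploaded contact lists. `disclosed` holds the binary attribute for
# users who chose to reveal it; everyone else is absent.
friends = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "d": {"a", "x"}, "x": {"c", "d"},
}
disclosed = {"b": True, "c": True, "d": False}


def share(neighbors, disclosed):
    """Fraction of disclosing neighbors reporting the attribute (None if none)."""
    votes = [disclosed[v] for v in neighbors if v in disclosed]
    return sum(votes) / len(votes) if votes else None


def second_order(user, graph):
    """Friends of friends, excluding the user and their direct friends."""
    first = graph.get(user, set())
    return {w for v in first for w in graph.get(v, ())} - first - {user}


print(share(friends["a"], disclosed))  # undisclosed user "a": 2 of 3 neighbors
print(share(friends["x"], disclosed))  # non-user "x"
```

The estimate becomes more informative as more neighbors disclose and as the neighborhood becomes more homogeneous, which matches the qualitative trend the abstract reports for first- and second-order neighborhoods.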
Listening between the Lines: Learning Personal Attributes from Conversations
Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc.). Experiments with various conversational texts, including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues,
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines.
Comment: published in WWW'1
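To make the ranking idea concrete, the sketch below scores candidate object values for one subject-predicate pair with a single attention layer. A hypothetical per-predicate query vector weights the words of a conversation, the weighted average serves as a speaker representation, and candidate values are ranked by dot product. It is not the paper's Hidden Attribute Model; the tiny embeddings, the query vector and the function names are invented and untrained.

```python
import math

# Hypothetical, untrained 2-d embeddings (all values invented).
word_emb = {
    "patients": [0.9, 0.1], "emergency": [0.8, 0.3], "room": [0.4, 0.4],
    "the": [0.1, 0.1], "code": [0.1, 0.9],
}
value_emb = {"doctor": [1.0, 0.0], "nurse": [0.8, 0.2], "programmer": [0.0, 1.0]}
predicate_query = {"profession": [1.0, 1.0]}  # hypothetical per-predicate query


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rank_values(words, predicate):
    """Rank candidate values for one predicate from a speaker's words."""
    vecs = [word_emb[w] for w in words if w in word_emb]  # unknown words dropped
    if not vecs:
        raise ValueError("no known words in the conversation")
    weights = softmax([dot(v, predicate_query[predicate]) for v in vecs])
    # Attention-weighted average of the word vectors = speaker representation.
    speaker = [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(len(vecs[0]))]
    scores = {val: dot(speaker, emb) for val, emb in value_emb.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_values("the patients in the emergency room".split(), "profession"))
# → ['doctor', 'nurse', 'programmer']
```

Training on a per-predicate basis, as the abstract describes, would correspond here to learning a separate query vector (and value embeddings) for each predicate such as profession or age.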