1,417 research outputs found
How Polarized Have We Become? A Multimodal Classification of Trump Followers and Clinton Followers
Polarization in American politics has been extensively documented and
analyzed for decades, and the phenomenon became all the more apparent during
the 2016 presidential election, where Trump and Clinton depicted two radically
different pictures of America. Inspired by this gaping polarization and the
extensive utilization of Twitter during the 2016 presidential campaign, in this
paper we take the first step in measuring polarization in social media and we
attempt to predict individuals' Twitter following behavior through analyzing
ones' everyday tweets, profile images and posted pictures. As such, we treat
polarization as a classification problem and study to what extent Trump
followers and Clinton followers on Twitter can be distinguished, which in turn
serves as a metric of polarization in general. We apply LSTM to processing
tweet features and we extract visual features using the VGG neural network.
Integrating these two sets of features boosts the overall performance. We are
able to achieve an accuracy of 69%, suggesting that the high degree of
polarization recorded in the literature has started to manifest itself in
social media as well.Comment: 16 pages, SocInfo 2017, 9th International Conference on Social
Informatic
Co-training for Demographic Classification Using Deep Learning from Label Proportions
Deep learning algorithms have recently produced state-of-the-art accuracy in
many classification tasks, but this success is typically dependent on access to
many annotated training examples. For domains without such data, an attractive
alternative is to train models with light, or distant supervision. In this
paper, we introduce a deep neural network for the Learning from Label
Proportion (LLP) setting, in which the training data consist of bags of
unlabeled instances with associated label distributions for each bag. We
introduce a new regularization layer, Batch Averager, that can be appended to
the last layer of any deep neural network to convert it from supervised
learning to LLP. This layer can be implemented readily with existing deep
learning packages. To further support domains in which the data consist of two
conditionally independent feature views (e.g. image and text), we propose a
co-training algorithm that iteratively generates pseudo bags and refits the
deep LLP model to improve classification accuracy. We demonstrate our models on
demographic attribute classification (gender and race/ethnicity), which has
many applications in social media analysis, public health, and marketing. We
conduct experiments to predict demographics of Twitter users based on their
tweets and profile image, without requiring any user-level annotations for
training. We find that the deep LLP approach outperforms baselines for both
text and image features separately. Additionally, we find that co-training
algorithm improves image and text classification by 4% and 8% absolute F1,
respectively. Finally, an ensemble of text and image classifiers further
improves the absolute F1 measure by 4% on average
Listening between the Lines: Learning Personal Attributes from Conversations
Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc). Experiments with various conversational texts including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines.Comment: published in WWW'1
- …