Gender Detection on Social Networks using Ensemble Deep Learning
Analyzing the ever-increasing volume of posts on social media sites such as
Facebook and Twitter requires improved information processing methods for
profiling authorship. Document classification is central to this task, but the
performance of traditional supervised classifiers has degraded as the volume of
social media has increased. This paper addresses this problem in the context of
gender detection through ensemble classification, which employs multi-model deep
learning architectures to learn specialized representations from different
feature spaces.
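The abstract does not say how the member models' outputs are combined. A minimal sketch, assuming simple soft voting: each member model (hypothetical here), trained on its own feature space, returns a probability for the positive class, and the ensemble takes a weighted average of those probabilities. The function names, the equal default weights, and the 0.5 threshold are illustrative assumptions, not the paper's method.

```python
from typing import Optional, Sequence


def soft_vote(member_probs: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-model probabilities for the positive class.

    Assumption: soft voting is only one plausible fusion rule; the paper's
    actual combination strategy is not described in the abstract.
    """
    if not member_probs:
        raise ValueError("need at least one member prediction")
    if weights is None:
        weights = [1.0] * len(member_probs)  # equal weights by default
    if len(weights) != len(member_probs):
        raise ValueError("one weight per member model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, member_probs)) / total


def predict_label(member_probs: Sequence[float],
                  weights: Optional[Sequence[float]] = None,
                  threshold: float = 0.5) -> int:
    """Return 1 if the fused probability reaches the threshold, else 0."""
    return int(soft_vote(member_probs, weights) >= threshold)
```

For example, three hypothetical members that output 0.9, 0.4, and 0.7 fuse to about 0.667 and yield label 1; weighting lets a member trained on a more reliable feature space count for more.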
Co-training for Demographic Classification Using Deep Learning from Label Proportions
Deep learning algorithms have recently produced state-of-the-art accuracy in
many classification tasks, but this success is typically dependent on access to
many annotated training examples. For domains without such data, an attractive
alternative is to train models with light, or distant supervision. In this
paper, we introduce a deep neural network for the Learning from Label
Proportions (LLP) setting, in which the training data consist of bags of
unlabeled instances with associated label distributions for each bag. We
introduce a new regularization layer, Batch Averager, that can be appended to
the last layer of any deep neural network to convert it from supervised
learning to LLP. This layer can be implemented readily with existing deep
learning packages. To further support domains in which the data consist of two
conditionally independent feature views (e.g. image and text), we propose a
co-training algorithm that iteratively generates pseudo bags and refits the
deep LLP model to improve classification accuracy. We demonstrate our models on
demographic attribute classification (gender and race/ethnicity), which has
many applications in social media analysis, public health, and marketing. We
conduct experiments to predict demographics of Twitter users based on their
tweets and profile image, without requiring any user-level annotations for
training. We find that the deep LLP approach outperforms baselines for both
text and image features separately. Additionally, we find that the co-training
algorithm improves image and text classification by 4% and 8% absolute F1,
respectively. Finally, an ensemble of text and image classifiers further
improves absolute F1 by 4% on average.
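As a rough illustration of the Batch Averager idea, assuming it averages the instance-level softmax outputs over a bag and that training minimizes cross-entropy between that average and the bag's known label proportions (the abstract does not spell out the loss), the forward computation can be written in plain Python. The function names are hypothetical; in a deep learning package the same steps are a softmax, a mean over the bag axis, and a cross-entropy, with gradients supplied by automatic differentiation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one instance's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def batch_average(bag_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-instance class probabilities over all instances in a bag."""
    probs = [softmax(z) for z in bag_logits]
    num_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]


def llp_loss(bag_logits: Sequence[Sequence[float]],
             bag_proportions: Sequence[float],
             eps: float = 1e-12) -> float:
    """Cross-entropy between the known label proportions and the bag average.

    Only the bag-level distribution is supervised; no instance labels are used.
    """
    avg = batch_average(bag_logits)
    return -sum(t * math.log(max(a, eps)) for t, a in zip(bag_proportions, avg))
```

Because only the average is supervised, a bag with proportions [0.5, 0.5] is satisfied equally by two uncertain predictions or by two confident, opposite ones: the loss is ln 2 (about 0.693) in both cases. This is why bag composition matters in the LLP setting.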
Group-level Emotion Recognition using Transfer Learning from Face Identification
In this paper, we describe our algorithmic approach, which was used for
submissions in the fifth Emotion Recognition in the Wild (EmotiW 2017)
group-level emotion recognition sub-challenge. We extracted feature vectors of
detected faces using a convolutional neural network trained for the face
identification task, rather than one conventionally pre-trained on emotion
recognition problems. In the final pipeline, an ensemble of Random Forest
classifiers was trained on the available training set to predict emotion scores.
When no faces are detected, one member of our ensemble extracts features from
the whole image. In our experiments, the proposed approach achieved the lowest
error rate among the techniques we explored. In particular, we achieved 75.4%
accuracy on the validation data, which is 20% higher than the handcrafted
feature-based baseline. The source code, which uses the Keras framework, is
publicly available.
Comment: 5 pages, 3 figures, accepted for publication at ICMI17 (EmotiW Grand
Challenge)
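The abstract does not state how the descriptors of several detected faces are combined into one group-level input, nor how the ensemble members' scores are fused. A minimal sketch, assuming mean pooling for both and a whole-image fallback when no face is detected: the face embeddings would come from the face-identification CNN and the per-member scores from the Random Forests, neither of which is shown here. All function names, and the three-class label set, are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Assumed label set: three-class group-level valence.
CLASSES = ("Positive", "Neutral", "Negative")


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def group_descriptor(face_embeddings: Sequence[Vector],
                     whole_image_features: Callable[[], Vector]) -> Tuple[str, Vector]:
    """Pool face embeddings, or fall back to whole-image features if no faces were found."""
    if face_embeddings:
        return "faces", mean_pool(face_embeddings)
    # Fallback branch: stands in for the ensemble member that uses the whole image.
    return "whole_image", whole_image_features()


def fuse_scores(member_scores: Sequence[Vector]) -> str:
    """Average per-class scores from the ensemble members and return the top class."""
    avg = mean_pool(member_scores)
    best = max(range(len(CLASSES)), key=lambda k: avg[k])
    return CLASSES[best]
```

Passing the whole-image extractor as a callable means it only runs when the face detector returns nothing, which keeps the common path cheap.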
Adversarial Removal of Demographic Attributes from Text Data
Recent advances in Representation Learning and Adversarial Training seem to
succeed in removing unwanted features from the learned representation. We show
that demographic information of authors is encoded in -- and can be recovered
from -- the intermediate representations learned by text-based neural
classifiers. The implication is that decisions of classifiers trained on
textual data are not agnostic to -- and likely condition on -- demographic
attributes. When attempting to remove such demographic information using
adversarial training, we find that while the adversarial component achieves
chance-level development-set accuracy during training, a post-hoc classifier,
trained on the encoded sentences from the first part, still manages to reach
substantially higher classification accuracies on the same data. This behavior
is consistent across several tasks, demographic properties and datasets. We
explore several techniques to improve the effectiveness of the adversarial
component. Our main conclusion is a cautionary one: do not rely on
adversarial training to achieve representations that are invariant to sensitive
features …
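The post-hoc check described in this abstract can be sketched as: freeze the encoder, collect its representations, train a fresh classifier to predict the protected attribute, and compare its accuracy with the majority-class (chance) rate. The abstract does not specify the probe's architecture; plain logistic regression trained by batch gradient descent is used here as a stand-in, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(w: Sequence[float], b: float, x: Vector) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_probe(X: Sequence[Vector], y: Sequence[int],
                epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe on frozen encodings X to predict attribute y."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the linear score.
            err = _sigmoid(_score(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w: Sequence[float], b: float,
                   X: Sequence[Vector], y: Sequence[int]) -> float:
    """Fraction of examples whose attribute the probe recovers."""
    hits = sum(int(_score(w, b, xi) >= 0) == yi for xi, yi in zip(X, y))
    return hits / len(y)


def majority_rate(y: Sequence[int]) -> float:
    """Chance level: accuracy of always predicting the most frequent value."""
    ones = sum(y)
    return max(ones, len(y) - ones) / len(y)
```

If `probe_accuracy` clearly exceeds `majority_rate`, the attribute is still recoverable from the encodings even when the adversary itself reported chance-level accuracy during training.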