25 research outputs found
Learning from Noisy Label Distributions
In this paper, we consider a novel machine learning problem, that is,
learning a classifier from noisy label distributions. In this problem, each
instance with a feature vector belongs to at least one group. Then, instead of
the true label of each instance, we observe the label distribution of the
instances associated with a group, where the label distribution is distorted by
an unknown noise. Our goals are to (1) estimate the true label of each
instance, and (2) learn a classifier that predicts the true label of a new
instance. We propose a probabilistic model that considers true label
distributions of groups and parameters that represent the noise as hidden
variables. The model can be learned based on a variational Bayesian method. In
numerical experiments, we show that the proposed model outperforms existing
methods in terms of the estimation of the true labels of instances.Comment: Accepted in ICANN201
Predicting the age of social network users from user-generated texts with word embeddings
© 2016 FRUCT.Many web-based applications such as advertising or recommender systems often critically depend on the demographic information, which may be unavailable for new or anonymous users. We study the problem of predicting demographic information based on user-generated texts on a Russian-language dataset from a large social network. We evaluate the efficiency of age prediction algorithms based on word2vec word embeddings and conduct a comprehensive experimental evaluation, comparing these algorithms with each other and with classical baseline approaches