Deep Text Mining of Instagram Data Without Strong Supervision
With the advent of social media, our online feeds increasingly consist of
short, informal, and unstructured text. This textual data can be analyzed for
the purpose of improving user recommendations and detecting trends. Instagram
is one of the largest social media platforms, containing both text and images.
However, most of the prior research on text processing in social media is
focused on analyzing Twitter data, and little attention has been paid to text
mining of Instagram data. Moreover, many text mining methods rely on annotated
training data, which in practice is both difficult and expensive to obtain. In
this paper, we present methods for unsupervised mining of fashion attributes
from Instagram text, which can enable a new kind of user recommendation in the
fashion domain. In this context, we analyze a corpus of Instagram posts from
the fashion domain, introduce a system for extracting fashion attributes from
Instagram, and train a deep clothing classifier with weak supervision to
classify Instagram posts based on the associated text.
With our experiments, we confirm that word embeddings are a useful asset for
information extraction. Experimental results show that information extraction
using word embeddings outperforms a baseline that uses Levenshtein distance.
The results also show the benefit of combining weak supervision signals using
generative models instead of majority voting. Using weak supervision and
generative modeling, an F1 score of 0.61 is achieved on the task of classifying
the image contents of Instagram posts based solely on the associated text,
which is on par with human performance. Finally, our empirical study provides
one of the few available studies on Instagram text and shows that the text is
noisy, that the text distribution exhibits the long-tail phenomenon, and that
comment sections on Instagram are multi-lingual.
Comment: 8 pages, 5 figures. Pre-print for paper to appear in conference
proceedings for the Web Intelligence Conference
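The abstract above contrasts combining weak supervision signals by majority voting with combining them through a generative model. As a rough illustration of the majority-voting baseline only (not the authors' system, and not the generative approach they favor), the sketch below combines a few keyword-based labeling functions. The labeling functions, label names, and the tie-handling rule are all hypothetical choices made for this example.

```python
from collections import Counter

# Assumption: a labeling function returns ABSTAIN when it has no opinion.
ABSTAIN = None


def majority_vote(votes):
    """Return the most common non-abstain label, or ABSTAIN on no votes or a tie."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    # A tie between the two most frequent labels is treated as "no decision".
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Hypothetical keyword-based labeling functions (illustrative only).
def lf_dress_keyword(text):
    return "dress" if "dress" in text.lower() else ABSTAIN


def lf_shoes_keyword(text):
    lowered = text.lower()
    return "shoes" if "sneaker" in lowered or "shoes" in lowered else ABSTAIN


def lf_gown_keyword(text):
    return "dress" if "gown" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_dress_keyword, lf_shoes_keyword, lf_gown_keyword]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Apply every labeling function to the text and combine the votes."""
    return majority_vote([lf(text) for lf in labeling_functions])
```

Majority voting weights every labeling function equally, which is the limitation a generative label model is meant to address by estimating how reliable each signal is.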
Learning Representations of Social Media Users
User representations are routinely used in recommendation systems by platform
developers, targeted advertisements by marketers, and by public policy
researchers to gauge public opinion across demographic groups. Computer
scientists consider the problem of inferring user representations more
abstractly: how does one extract a stable user representation - effective for
many downstream tasks - from a medium as noisy and complicated as social media?
The quality of a user representation is ultimately task-dependent (e.g. does
it improve classifier performance, make more accurate recommendations in a
recommendation system) but there are proxies that are less sensitive to the
specific task. Is the representation predictive of latent properties such as a
person's demographic features, socioeconomic class, or mental health state? Is
it predictive of the user's future behavior?
In this thesis, we begin by showing how user representations can be learned
from multiple types of user behavior on social media. We apply several
extensions of generalized canonical correlation analysis to learn these
representations and evaluate them on three tasks: predicting future hashtag
mentions, friending behavior, and demographic features. We then show how user
features can be employed as distant supervision to improve topic model fit.
Finally, we show how user features can be integrated into and improve existing
classifiers in the multitask learning framework. We treat user representations
- ground truth gender and mental health features - as auxiliary tasks to
improve mental health state prediction. We also use distributed user
representations learned in the first chapter to improve tweet-level stance
classifiers, showing that distant user information can inform classification
tasks at the granularity of a single message.
Comment: PhD thesis
Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets
Sentiment analysis aims to extract people's emotions and opinions from their
comments on the web. It is widely used in business to detect sentiment in social
data, gauge brand reputation, and understand customers. Most articles in
this area have concentrated on the English language, whereas resources for the
Persian language are limited. In this review paper, articles on sentiment
analysis in the Persian language published between 2018 and 2022 are
collected, and their methods, approaches, and datasets are explained and
analyzed. Almost all of the methods used for sentiment analysis are based on
machine learning and deep learning. The purpose of this paper is to examine 40
different approaches to sentiment analysis in the Persian language, to analyze
the datasets along with the accuracy of the algorithms applied to them, and to
review the strengths and weaknesses of each. Among all the methods, transformers
such as BERT and recurrent neural networks (RNNs) such as LSTM and Bi-LSTM have
achieved higher accuracy in sentiment analysis. In addition to the methods and
approaches, the reviewed datasets from 2018 to 2022 are listed, and
details about each dataset are provided.