Multilingual Twitter Sentiment Classification: The Role of Human Annotators
What are the limits of automated Twitter sentiment classification? We analyze
a large set of manually labeled tweets in different languages, use them as
training data, and construct automated classification models. It turns out that
the quality of classification models depends much more on the quality and size
of training data than on the type of the model trained. Experimental results
indicate that there is no statistically significant difference between the
performance of the top classification models. We quantify the quality of
training data by applying various annotator agreement measures, and identify
the weakest points of different datasets. We show that the model performance
approaches the inter-annotator agreement when the size of the training set is
sufficiently large. However, it is crucial to regularly monitor the self- and
inter-annotator agreements since this improves the training datasets and
consequently the model performance. Finally, we show that there is strong
evidence that humans perceive the sentiment classes (negative, neutral, and
positive) as ordered.
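The inter-annotator agreement idea central to this abstract can be illustrated with Cohen's kappa, one common chance-corrected agreement statistic (the paper applies several such measures; the statistic choice and the labels below are illustrative, not taken from the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence of the two label distributions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    classes = set(labels_a) | set(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in classes) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment labels from two annotators
# (-1 = negative, 0 = neutral, +1 = positive).
a = [-1, 0, 1, 1, 0, -1, 0, 1]
b = [-1, 0, 1, 0, 0, -1, 1, 1]
print(round(cohens_kappa(a, b), 3))  # → 0.619
```

A model whose accuracy approaches this kappa-style ceiling is, as the abstract argues, about as good as the labels allow; for ordered classes a weighted variant that penalizes negative-vs-positive confusions more than neutral ones would be the natural refinement.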
Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong baselines, and can
also reveal code-switching.
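For contrast with the hierarchical character-word model, a minimal character n-gram baseline of the kind such models are typically compared against might look like this (the training samples and two-language setup are invented for illustration):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams with padding so word boundaries are captured."""
    text = f"  {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_profile(samples, n=3):
    """Relative n-gram frequencies over a language's training samples."""
    counts = Counter()
    for s in samples:
        counts.update(char_ngrams(s, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def score(text, profile, n=3):
    # Sum of profile frequencies of the text's n-grams (higher = better match).
    return sum(profile.get(g, 0.0) for g in char_ngrams(text, n))

# Hypothetical toy training data for two languages.
en = build_profile(["the quick brown fox", "see you there", "this is the way"])
es = build_profile(["el zorro rápido", "nos vemos allí", "este es el camino"])

msg = "the fox is here"
lang = "en" if score(msg, en) > score(msg, es) else "es"
print(lang)
```

Character n-grams cope with the unconventional spelling the abstract mentions because they need no word segmentation; the paper's hierarchical model goes further by composing such character evidence into contextualized word representations, which also lets it localize code-switching within a message.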
Predicting Rising Follower Counts on Twitter Using Profile Information
When evaluating the causes of one's popularity on Twitter, one factor is
commonly considered the main driver: many tweets. There is debate about the
kind of tweet one should publish, but little attention is paid to anything
beyond tweets. Of particular interest is the information provided on each
Twitter user's profile page. One of these features is the given name on the
profile. Studies in psychology and economics have identified correlations
between one's first name and, e.g., one's school marks or chances of getting
a job interview in the US. We are therefore interested in the influence of
this profile information on the follower count. We addressed this question by
analyzing the profiles of about six million Twitter users. All profiles are
separated into three groups: users whose name field contains a first name,
English words, or neither of the two. The assumption is that names and words
influence the discoverability of a user and consequently their follower
count. We propose a classifier that labels users who will increase their
follower count within a month, applying different models based on the user's
group. The classifiers are evaluated with the area under the receiver
operating characteristic curve and achieve a score above 0.800.
Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy,
NY, US
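The evaluation metric used here, area under the receiver operating characteristic curve, equals the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one. A minimal sketch of that rank-based computation, with hypothetical classifier scores (not the paper's data):

```python
def roc_auc(labels, scores):
    """AUC as P(random positive outranks random negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = follower count rose within a month) and scores.
y = [1, 0, 1, 1, 0, 0]
s = [0.9, 0.3, 0.8, 0.4, 0.5, 0.2]
print(roc_auc(y, s))  # ≈ 0.889
```

Because AUC is rank-based, it is insensitive to the choice of classification threshold and to class imbalance in the score scale, which makes the reported value above 0.800 comparable across the three profile groups.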