1,223 research outputs found

    Multilingual Twitter Sentiment Classification: The Role of Human Annotators

    Get PDF
    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered

    Hierarchical Character-Word Models for Language Identification

    Full text link
    Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong base- lines, and can also reveal code-switching

    Predicting Rising Follower Counts on Twitter Using Profile Information

    Full text link
    When evaluating the cause of one's popularity on Twitter, one thing is considered to be the main driver: Many tweets. There is debate about the kind of tweet one should publish, but little beyond tweets. Of particular interest is the information provided by each Twitter user's profile page. One of the features are the given names on those profiles. Studies on psychology and economics identified correlations of the first name to, e.g., one's school marks or chances of getting a job interview in the US. Therefore, we are interested in the influence of those profile information on the follower count. We addressed this question by analyzing the profiles of about 6 Million Twitter users. All profiles are separated into three groups: Users that have a first name, English words, or neither of both in their name field. The assumption is that names and words influence the discoverability of a user and subsequently his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operator curve score and achieves a score above 0.800.Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy, NY, US
    • …
    corecore