ConStance: Modeling Annotation Contexts to Improve Stance Classification
Manual annotations are a prerequisite for many applications of machine
learning. However, weaknesses in the annotation process itself are easy to
overlook. In particular, scholars often choose what information to give to
annotators without examining these decisions empirically. For subjective tasks
such as sentiment analysis, sarcasm, and stance detection, such choices can
impact results. Here, for the task of political stance detection on Twitter, we
show that providing too little context can result in noisy and uncertain
annotations, whereas providing too strong a context may cause it to outweigh
other signals. To characterize and reduce these biases, we develop ConStance, a
general model for reasoning about annotations across information conditions.
Given conflicting labels produced by multiple annotators seeing the same
instances with different contexts, ConStance simultaneously estimates gold
standard labels and also learns a classifier for new instances. We show that
the classifier learned by ConStance outperforms a variety of baselines at
predicting political stance, while the model's interpretable parameters shed
light on the effects of each context. Comment: To appear at EMNLP 201
Global Contagion of Non-Viral Information
Contagion in Online Social Networks (OSN) is typically measured by the tendency of users to re-post information or to adopt a new behavior after exposure to that information or behavior. Most contagion research is limited to modeling (i) only local neighbor-to-neighbor contagion and (ii) the spread of viral information. However, most contagion events are non-viral and can also occur globally, between non-neighbors, through, for example, exposure to information via exploratory browsing or content recommendation algorithms. This study is the first to address the phenomenon of both global and local contagion of non-viral information in a quantitative way. Analysis of Twitter networks reveals the prevalence of global contagion, the different temporal patterns of global and local contagion, and the ways contagion varies across topical categories. One notable finding is that users who retweeted due to global contagion have more followers than those who retweeted due to local contagion.
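The local/global distinction can be made concrete with a minimal sketch. It assumes (the abstract does not say so) that a retweet counts as local when the retweeter follows at least one account that posted the item earlier, and as global otherwise. The function name `classify_contagion` and the data layout are hypothetical.

```python
def classify_contagion(retweeter, earlier_posters, follows):
    """Label one retweet as 'local' or 'global'.

    follows maps each user to the set of accounts that user follows.
    Assumption: exposure through a followed account means local contagion;
    any other exposure (browsing, recommendations) is treated as global.
    """
    followed = follows.get(retweeter, set())
    if followed & set(earlier_posters):
        return "local"
    return "global"


# Toy follow graph, invented for illustration only.
follows = {"carol": {"alice"}, "dave": {"erin"}}
carol_label = classify_contagion("carol", ["alice", "bob"], follows)
dave_label = classify_contagion("dave", ["alice", "bob"], follows)
```

Here `carol` follows an earlier poster, so her retweet is labeled local, while `dave` follows none of them and is labeled global.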
Sharp power in social media: Patterns from datasets across electoral campaigns
Using Christopher Walker's and Jessica Ludwig's "sharp power" theoretical framework, and based on preliminary findings from the May 2019 European Parliament election and the two 2019 rounds of elections in Israel, this article describes a novel method for the automatic detection of political trolls and bots active on Twitter during the October 2019 federal election in Canada. The research identified thousands of accounts invested in Canadian politics that presented a unique activity pattern, significantly different from that of accounts in a control group. The large-scale cross-sectional approach enabled a distinctive perspective on foreign political meddling on Twitter during the recent federal election campaign. This foreign political meddling, we argue, aims at manipulating and poisoning the democratic process and can challenge democracies and their values, as well as their societal resilience.
Predicting Rising Follower Counts on Twitter Using Profile Information
When evaluating the cause of one's popularity on Twitter, one thing is
considered to be the main driver: many tweets. There is debate about the kind
of tweet one should publish, but little beyond tweets. Of particular interest
is the information provided by each Twitter user's profile page. One such
feature is the given name on a profile. Studies in psychology and
economics have identified correlations between first names and, e.g., one's school
marks or chances of getting a job interview in the US. Therefore, we are
interested in the influence of this profile information on the follower count.
We addressed this question by analyzing the profiles of about 6 million Twitter
users. All profiles are separated into three groups: users that have a first
name, English words, or neither in their name field. The assumption is
that names and words influence the discoverability of a user and subsequently
his/her follower count. We propose a classifier that labels users who will
increase their follower count within a month by applying different models based
on the user's group. The classifiers are evaluated with the area under the
receiver operating characteristic curve and achieve a score above 0.800. Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy,
NY, US
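For readers unfamiliar with the evaluation metric, the following is a minimal sketch of the area under the ROC curve, computed as the fraction of positive/negative pairs that the classifier ranks correctly. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    A positive/negative pair counts 1 when the positive example has the
    higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = follower count rose within a month, 0 = it did not.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # 3 of 4 pairs ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported value above 0.800 in context.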
Proceedings of the Second Workshop on Natural Language Processing and Computational Social Science
Negative Confidence-Aware Weakly Supervised Binary Classification for Effective Review Helpfulness Classification
The incompleteness of positive labels and the presence of many unlabelled instances are common problems in binary classification applications such as review helpfulness classification. Various studies from the classification literature treat all unlabelled instances as negative examples. However, a classification model that learns from incomplete positive labels while assuming all unlabelled data to be negative examples will often be biased. In this work, we propose a novel Negative Confidence-aware Weakly Supervised approach (NCWS), which customises a binary classification loss function by assigning different negative confidences to the unlabelled examples during the classifier's training. Once integrated into various binary classifiers from the literature, including SVM, CNN and BERT-based classifiers, NCWS allows positive and negative instances to be identified and separated effectively and without bias. We use review helpfulness classification as a test case for examining the effectiveness of our NCWS approach. We thoroughly evaluate NCWS using three different datasets, namely one from Yelp (venue reviews) and two from Amazon (Kindle and Electronics reviews). Our results show that NCWS outperforms strong baselines from the literature, including an existing SVM-based approach (i.e. SVM-P), the positive and unlabelled learning-based approach (i.e. C-PU) and the positive confidence-based approach (i.e. P-conf), in addressing the classifier's bias problem. Moreover, we further examine the effectiveness of NCWS by using its classified helpful reviews in a state-of-the-art review-based venue recommendation model (i.e. DeepCoNN) and demonstrate the benefits of using NCWS in enhancing venue recommendation effectiveness in comparison to the baselines.
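The general idea of weighting unlabelled examples by a negative confidence can be illustrated with a small sketch. This is not the NCWS loss from the paper, whose exact form the abstract does not give. It is a plain binary cross-entropy in which each unlabelled example's negative-class term is scaled by an assumed confidence in [0, 1]. The function name `confidence_weighted_loss` and all values are hypothetical.

```python
import math


def confidence_weighted_loss(probs, labels, neg_confidence):
    """Mean binary cross-entropy with down-weighted unlabelled examples.

    probs          -- predicted probability of the positive class
    labels         -- 1 for a labelled positive, 0 for an unlabelled example
    neg_confidence -- weight in [0, 1]; only used for unlabelled examples
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y, c in zip(probs, labels, neg_confidence):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -math.log(p)
        else:
            # An unlabelled example is penalised as a negative only to the
            # extent that we are confident it really is negative.
            total += -c * math.log(1.0 - p)
    return total / len(probs)


# Treating the unlabelled example as a certain negative vs. ignoring it.
strict = confidence_weighted_loss([0.5, 0.5], [1, 0], [1.0, 1.0])
lenient = confidence_weighted_loss([0.5, 0.5], [1, 0], [1.0, 0.0])
```

Setting every confidence to 1.0 recovers the common baseline that treats all unlabelled instances as negatives, so the weights interpolate between that baseline and ignoring unlabelled data entirely.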
What's in a Hashtag? Content based Prediction of the Spread of Ideas in Microblogging Communities
Current social media research mainly focuses on temporal trends of the information flow and on the topology of the social graph that facilitates the propagation of information. In this paper we study the effect of the content of an idea on its propagation. We present an efficient hybrid approach based on linear regression for predicting the spread of an idea in a given time frame. We show that a combination of content features with temporal and topological features minimizes prediction error. Our algorithm is evaluated on Twitter hashtags extracted from a dataset of more than 400 million tweets. We analyze the contribution and the limitations of the various feature types to the spread of information, demonstrating that content aspects can be used as strong predictors and thus should not be disregarded. We also study the dependencies between global features, such as graph topology, and content features.
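Combining feature types in a single linear regression can be sketched as follows. This is ordinary least squares on a toy feature matrix, not the hybrid approach evaluated in the paper. The three columns (one content, one temporal, one topological feature) and all numbers are invented for illustration.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A bias column of ones is prepended, so w[0] is the intercept.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; system is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_val = A[col][col]
        A[col] = [v / pivot_val for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(w, x):
    """Predicted spread for one feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Toy rows: [content feature, temporal feature, topological feature].
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 2, 3]]
# Targets generated exactly from 2 + 3*content - 1*temporal + 0.5*topological.
y = [2 + 3 * a - 1 * b + 0.5 * c for a, b, c in X]
w = fit_linear(X, y)
```

Because the toy targets are an exact linear function of the features, the fitted weights recover the generating coefficients. Dropping a column from `X` and refitting is a simple way to see how much each feature type contributes to prediction error.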