    ConStance: Modeling Annotation Contexts to Improve Stance Classification

    Manual annotations are a prerequisite for many applications of machine learning. However, weaknesses in the annotation process itself are easy to overlook. In particular, scholars often choose what information to give to annotators without examining these decisions empirically. For subjective tasks such as sentiment analysis, sarcasm detection, and stance detection, such choices can affect results. Here, for the task of political stance detection on Twitter, we show that providing too little context can result in noisy and uncertain annotations, whereas providing too strong a context may cause it to outweigh other signals. To characterize and reduce these biases, we develop ConStance, a general model for reasoning about annotations across information conditions. Given conflicting labels produced by multiple annotators who see the same instances with different contexts, ConStance simultaneously estimates gold-standard labels and learns a classifier for new instances. We show that the classifier learned by ConStance outperforms a variety of baselines at predicting political stance, while the model's interpretable parameters shed light on the effects of each context. Comment: To appear at EMNLP 201
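
    A minimal sketch of the general idea of jointly estimating gold labels and a per-context reliability from conflicting annotations follows. It is not the ConStance model: it swaps in a simpler one-coin, Dawid-Skene-style EM procedure for binary labels and omits the classifier that ConStance also learns. The function name `aggregate_by_context`, the input format of (item, context, label) triples, and the initial settings are hypothetical.

```python
from collections import defaultdict


def aggregate_by_context(labels, n_iter=50):
    """Estimate binary gold labels and a per-context accuracy via EM.

    labels: list of (item_id, context, label) triples, label in {0, 1}.
    Hypothetical one-coin Dawid-Skene-style sketch, NOT the ConStance model.
    Returns (posterior P(gold = 1) per item, accuracy per context).
    """
    items = sorted({i for i, _, _ in labels})
    contexts = sorted({c for _, c, _ in labels})
    by_item = defaultdict(list)
    for i, c, y in labels:
        by_item[i].append((c, y))

    # Initialise soft gold labels with each item's mean vote.
    post = {i: sum(y for _, y in by_item[i]) / len(by_item[i]) for i in items}
    acc = {c: 0.7 for c in contexts}  # placeholder, overwritten in the M-step

    for _ in range(n_iter):
        # M-step: class prior and per-context accuracy from the soft labels.
        prior = sum(post.values()) / len(items)
        agree = defaultdict(float)
        total = defaultdict(float)
        for i, c, y in labels:
            agree[c] += post[i] if y == 1 else 1.0 - post[i]
            total[c] += 1.0
        # Clamp accuracies away from 0 and 1 to avoid degenerate products.
        acc = {c: min(max(agree[c] / total[c], 1e-3), 1 - 1e-3) for c in contexts}

        # E-step: posterior probability that each item's gold label is 1.
        for i in items:
            p1, p0 = prior, 1.0 - prior
            for c, y in by_item[i]:
                p1 *= acc[c] if y == 1 else 1.0 - acc[c]
                p0 *= acc[c] if y == 0 else 1.0 - acc[c]
            post[i] = p1 / (p1 + p0)
    return post, acc
```

    In this sketch each context's estimated accuracy plays a role loosely analogous to per-context parameters: a context whose annotations systematically disagree with the inferred consensus receives a lower value.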

    Global Contagion of Non-Viral Information

    Contagion in Online Social Networks (OSNs) is typically measured by the tendency of users to re-post information or to adopt a new behavior after exposure to that information or behavior. Most contagion research is limited to modeling (i) only local, neighbor-to-neighbor contagion and (ii) the spread of viral information. However, most contagion events are non-viral and can also occur globally, among non-neighbors, for example through exposure to information during exploratory browsing or via content recommendation algorithms. This study is the first to quantitatively address both global and local contagion of non-viral information. Analysis of Twitter networks reveals the predominance of global contagion, the differing temporal patterns of global and local contagion, and how global contagion varies across topical categories. One notable finding is that users who retweeted due to global contagion have more followers than those who retweeted due to local contagion.
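
    To make the local/global distinction concrete, the sketch below labels each retweet of a single post under one simple, assumed rule: a retweet counts as local if the retweeting user follows the original poster or anyone who retweeted earlier, and as global otherwise. The rule, the function name `classify_contagion` and the input structures are hypothetical illustrations, not the study's exact operationalization.

```python
def classify_contagion(origin, retweets, follows):
    """Label each retweet of one post as 'local' or 'global' exposure.

    origin: the user who published the post.
    retweets: list of (user, timestamp) pairs for that post.
    follows: dict mapping each user to the set of accounts they follow.
    Assumed rule (hypothetical): a retweet is 'local' if the user follows
    anyone who had already shared the post, and 'global' otherwise.
    """
    labels = {}
    shared_before = {origin}  # accounts that have shared the post so far
    # Process retweets in time order; ties keep their input order.
    for user, _ts in sorted(retweets, key=lambda r: r[1]):
        followed = follows.get(user, set())
        labels[user] = "local" if followed & shared_before else "global"
        shared_before.add(user)
    return labels


# B follows the origin A, C follows B, and D follows no one who shared it.
follows = {"B": {"A"}, "C": {"B"}, "D": set()}
print(classify_contagion("A", [("C", 3), ("B", 1), ("D", 2)], follows))
# {'B': 'local', 'D': 'global', 'C': 'local'}
```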

    Sharp power in social media: Patterns from datasets across electoral campaigns

    Using Christopher Walker’s and Jessica Ludwig’s ‘sharp power’ theoretical framework, and based on preliminary findings from the May 2019 European Parliament election and the two 2019 rounds of elections in Israel, this article describes a novel method for automatically detecting political trolls and bots active on Twitter during the October 2019 federal election in Canada. The research identified thousands of accounts invested in Canadian politics that presented a unique activity pattern, significantly different from that of accounts in a control group. The large-scale cross-sectional approach enabled a distinctive perspective on foreign political meddling on Twitter during the recent federal election campaign. This foreign political meddling, we argue, aims at manipulating and poisoning the democratic process and can challenge democracies and their values, as well as their societal resilience.

    Predicting Rising Follower Counts on Twitter Using Profile Information

    When evaluating the causes of one's popularity on Twitter, one factor is considered the main driver: many tweets. There is debate about the kind of tweets one should publish, but little attention to factors beyond tweets. Of particular interest is the information provided on each Twitter user's profile page. One such feature is the name given on the profile. Studies in psychology and economics have identified correlations between first names and, e.g., school grades or the chances of getting a job interview in the US. We are therefore interested in the influence of this profile information on follower count. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users whose name field contains a first name, English words, or neither. The assumption is that names and words influence a user's discoverability and, consequently, his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying a different model to each user group. The classifiers are evaluated with the area under the receiver operating characteristic curve (ROC AUC) and achieve a score above 0.800. Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy, NY, US
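
    The abstract reports evaluation by the area under the ROC curve. As a reminder of what that score measures, the sketch below computes it with the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive example (here, a user whose follower count rises) is scored higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` is illustrative; in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons.

    labels: iterable of 0/1 ground-truth labels.
    scores: classifier scores, higher meaning "more likely positive".
    AUC = P(score of a random positive > score of a random negative),
    with tied pairs counted as 1/2. O(n_pos * n_neg), for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 1, 0, 1], [0.2, 0.4, 0.6, 0.8]))  # 0.75
```

    Under this reading, a score above 0.800 means that more than 80% of (rising, non-rising) user pairs are ordered correctly by the classifier, while 0.5 corresponds to random ranking.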

    Negative Confidence-Aware Weakly Supervised Binary Classification for Effective Review Helpfulness Classification

    The incompleteness of positive labels and the presence of many unlabelled instances are common problems in binary classification applications such as review helpfulness classification. Various studies in the classification literature consider all unlabelled instances to be negative examples. However, a classification model that learns from incomplete positive labels while assuming all unlabelled data to be negative will often produce a biased classifier. In this work, we propose a novel Negative Confidence-aware Weakly Supervised approach (NCWS), which customises a binary classification loss function by distinguishing unlabelled examples according to their differing negative confidences during the classifier's training. Once integrated into various binary classifiers from the literature, including SVM, CNN and BERT-based classifiers, NCWS enables them to identify and separate positive and negative instances effectively and without bias. We use review helpfulness classification as a test case for examining the effectiveness of our NCWS approach. We thoroughly evaluate NCWS using three datasets: one from Yelp (venue reviews) and two from Amazon (Kindle and Electronics reviews). Our results show that NCWS outperforms strong baselines from the literature, including an existing SVM-based approach (i.e. SVM-P), the positive and unlabelled learning-based approach (i.e. C-PU) and the positive confidence-based approach (i.e. P-conf), in addressing the classifier's bias problem. We further examine the effectiveness of NCWS by using its classified helpful reviews in a state-of-the-art review-based venue recommendation model (i.e. DeepCoNN), and demonstrate the benefits of NCWS in enhancing venue recommendation effectiveness compared with the baselines.
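
    The abstract does not give the NCWS loss itself, so the following is only an illustration of the general idea of a confidence-aware loss for positive and unlabelled data: labelled positives receive the usual cross-entropy term, while each unlabelled example is treated as a negative whose term is scaled by an assumed negative-confidence weight in [0, 1]. The function name `weighted_pu_loss`, the weighting scheme and the normalization are hypothetical and may differ from the paper.

```python
import math


def weighted_pu_loss(probs, is_positive, neg_conf, eps=1e-12):
    """Cross-entropy with confidence-weighted negatives (illustrative sketch).

    probs: predicted P(positive) for each example.
    is_positive: True for labelled positives, False for unlabelled examples.
    neg_conf: assumed confidence in [0, 1] that each unlabelled example is
              truly negative (ignored for labelled positives).
    Hypothetical weighting, not the NCWS loss from the paper.
    """
    total = 0.0
    for p, pos, c in zip(probs, is_positive, neg_conf):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if pos:
            total += -math.log(p)  # standard positive term
        else:
            total += -c * math.log(1.0 - p)  # negative term scaled by confidence
    return total / len(probs)
```

    Setting every weight to 1 recovers the "treat all unlabelled data as negative" assumption that the abstract describes as a source of bias, while a weight of 0 removes an unlabelled example's influence entirely.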

    What’s in a Hashtag? Content based Prediction of the Spread of Ideas in Microblogging Communities

    Current social media research mainly focuses on temporal trends in the information flow and on the topology of the social graph that facilitates the propagation of information. In this paper we study the effect of an idea's content on information propagation. We present an efficient hybrid approach based on linear regression for predicting the spread of an idea within a given time frame. We show that combining content features with temporal and topological features minimizes prediction error. Our algorithm is evaluated on Twitter hashtags extracted from a dataset of more than 400 million tweets. We analyze the contributions and limitations of the various feature types in predicting the spread of information, demonstrating that content aspects are strong predictors and thus should not be disregarded. We also study the dependencies between global features, such as graph topology, and content features.
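
    As a rough illustration of a linear regression over combined feature groups, the sketch below fits an ordinary least-squares model on content, temporal and topological features concatenated into one design matrix, using NumPy's `np.linalg.lstsq`. The feature groups, function names and synthetic data are hypothetical placeholders; the paper's actual features and its hybrid approach are not reproduced here.

```python
import numpy as np


def design_matrix(content, temporal, topological):
    """Stack an intercept column with the three (hypothetical) feature groups.

    Each argument is an (n_hashtags, k_i) array.
    """
    n = content.shape[0]
    return np.hstack([np.ones((n, 1)), content, temporal, topological])


def fit_spread_model(content, temporal, topological, spread):
    """Ordinary least squares: coefficients minimizing ||X @ coef - spread||^2."""
    X = design_matrix(content, temporal, topological)
    coef, *_ = np.linalg.lstsq(X, spread, rcond=None)
    return coef


def predict_spread(coef, content, temporal, topological):
    """Predicted spread for new hashtags given fitted coefficients."""
    return design_matrix(content, temporal, topological) @ coef


# Synthetic example: spread is an exact linear function of the features.
rng = np.random.default_rng(0)
content = rng.normal(size=(50, 2))
temporal = rng.normal(size=(50, 1))
topological = rng.normal(size=(50, 1))
true_coef = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
spread = design_matrix(content, temporal, topological) @ true_coef
coef = fit_spread_model(content, temporal, topological, spread)
print(np.round(coef, 3))
```

    Adding feature columns can never increase the in-sample least-squares error, so comparisons between feature groups are best made on held-out data.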