Search CORE

6,269 research outputs found

Stance Classification on PTT Comments

Authors: Chuang Ju-han; Hsieh Shukai
Publication date: 01/01/2015

With the development of social media and online forums, users have grown accustomed to expressing their agreement and disagreement via short texts. Elements that reveal the user’s stance or subjectivity thus become an important resource in identifying the user’s position on a given topic. In the current study, we observe comments on an online bulletin board in Taiwan to see how people express their stance when responding to other people’s posts in Chinese. A lexicon is built based on linguistic analysis and annotation of the data. We performed a binary classification task using these linguistic features and were able to reach an average accuracy of 71 percent. A linguistic analysis of the confusion that arose in the classification task is provided to guide future work on improving accuracy for this task.

CiteSeerX

Waseda University Repository

Rumor Stance Classification in Online Social Networks: A Survey on the State-of-the-Art, Prospects, and Future Challenges

Authors: Dadlani Aresh; Jami Sarina; Maham Behrouz; Sabermahani Mohammad M.; Sahebi Iman; Shariatpanahi Seyed P.
Publication date: 02/08/2022

The emergence of the Internet as a ubiquitous technology has facilitated the rapid evolution of social media as the leading virtual platform for communication, content sharing, and information dissemination. Despite revolutionizing the way news is delivered to people, this technology has also brought inevitable drawbacks. One such drawback is the spread of rumors facilitated by social media platforms, which may provoke doubt and fear among people. Therefore, the need to debunk rumors before they spread widely has become all the more essential. Over the years, many studies have been conducted to develop effective rumor verification systems. One aspect of such studies focuses on rumor stance classification, which concerns the task of utilizing users' viewpoints about a rumorous post to better predict the veracity of a rumor. Relying on users' stances in the rumor verification task has gained great importance, for it has shown significant improvements in model performance. In this paper, we conduct a comprehensive literature review on rumor stance classification in complex social networks. In particular, we present a thorough description of the approaches and mark the top performances. Moreover, we introduce multiple datasets available for this purpose and highlight their limitations. Finally, some challenges and future directions are discussed to stimulate further relevant research efforts. Comment: 13 pages, 2 figures, journa

arXiv.org e-Print Archive

Perspective Identification in Informal Text

Author: Elfardy Hebatallah
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

This dissertation studies the problem of identifying the ideological perspective of people as expressed in their written text. One's perspective is often expressed in his/her stance towards polarizing topics. We are interested in studying how nuanced linguistic cues can be used to identify the perspective of a person in informal genres. Moreover, we are interested in exploring the problem from a multilingual perspective, comparing and contrasting the linguistic devices used in English informal-genre datasets discussing American ideological issues and in Arabic discussion fora posts related to Egyptian politics. Our first and foremost goal is building computational systems that can successfully identify the perspective from which a given informal text is written while studying what linguistic cues work best for each language and drawing insights into the similarities and differences between the notion of perspective in both studied languages. We build computational systems that can successfully identify the stance of a person in English informal text that deals with different topics that are determined by one's perspective, such as legalization of abortion, the feminist movement, and gay and gun rights; additionally, we are able to identify a more general notion of perspective, namely the 2012 choice of presidential candidate, as well as build systems for automatically identifying different elements of a person's perspective given an Egyptian discussion forum comment. The systems utilize several lexical and semantic features for both languages.
Specifically, for English we explore the use of word sense disambiguation, opinion features, and latent and frame semantics, as well as Linguistic Inquiry and Word Count features; in Arabic, however, in addition to using sentiment and latent semantics, we study whether linguistic code-switching (LCS) between the standard and dialectal forms of the language can help as a cue for uncovering the perspective from which a comment was written. This leads us to the challenge of devising computational systems that can handle LCS in Arabic. The Arabic language has a diglossic nature where the standard form of the language (MSA) coexists with the regional dialects (DA) corresponding to the native mother tongue of Arabic speakers in different parts of the Arab world. DA is ubiquitous in written informal genres and in most cases it is code-switched with MSA. The presence of code-switching degrades the performance of almost any MSA-only trained Natural Language Processing tool when applied to DA or to code-switched MSA-DA content. In order to solve this challenge, we build a state-of-the-art system, AIDA, to computationally handle token and sentence-level code-switching. On a conceptual level, for handling and processing Egyptian ideological perspectives, we note the lack of a taxonomy for the most common perspectives among Egyptians and the lack of corresponding annotated corpora. In solving this challenge, we develop a taxonomy for the most common community perspectives among Egyptians and use an iterative feedback-loop process to devise guidelines on how to successfully annotate a given online discussion forum post with different elements of a person's perspective. Using the proposed taxonomy and annotation guidelines, we annotate a large set of Egyptian discussion fora posts to identify a comment's perspective as conveyed in the priority expressed by the comment, as well as the stance on major political entities.

Columbia University Academic Commons

Twitter Stance Detection with Textual, Sentiment, and Target-specific Models

Author: Xu Yuxuan
Publication date: 01/01/2020

Today more and more users express their opinions and stances on social media platforms such as Twitter. In this paper, I proposed different approaches to automatically detect the stance of a single tweet. I investigated whether including additional sentiment polarity information and the target information would be beneficial for the stance detection task. Moreover, I also researched whether target-specific features could be generalized to other datasets with different targets for the stance detection task. Master of Science in Information Science

Carolina Digital Repository

Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization

Author: Faulkner Adam Robert
Publication venue: CUNY Academic Works
Publication date: 03/06/2014

This study describes a set of document- and sentence-level classification models designed to automate the task of determining the argument stance (for or against) of a student argumentative essay and the task of identifying any arguments in the essay that provide reasons in support of that stance. A suggested application utilizing these models is presented which involves the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which the sentence was extracted and provides a representative argument in support of that stance. A novel set of document-level stance classification features motivated by linguistic research involving stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text. We also describe the construction of a corpus of essay sentences annotated for supporting argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model is designed to classify a given sentence as a supporting argument or as not a supporting argument, while the second model is designed to classify a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above their respective baseline models. 
An application illustrating a use case for the models presented in this dissertation is described. This application incorporates all three classification models to extract a single sentence summarizing both the overall stance of a given text and a convincing reason in support of that stance.

City University of New York

Cross-Lingual and Low-Resource Sentiment Analysis

Author: Farra Noura
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages. This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language. Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, and these include our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis. 
To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments. The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language. In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment.

Columbia University Academic Commons

Fine-grained Subjectivity and Sentiment Analysis: Recognizing the intensity, polarity, and attitudes of private states

Author: Wilson Theresa Ann
Publication date: 16/06/2008

Private states (mental and emotional states) are part of the information that is conveyed in many forms of discourse. News articles often report emotional responses to news stories; editorials, reviews, and weblogs convey opinions and beliefs. This dissertation investigates the manual and automatic identification of linguistic expressions of private states in a corpus of news documents from the world press. A term for the linguistic expression of private states is subjectivity. The conceptual representation of private states used in this dissertation is that of Wiebe et al. (2005). As part of this research, annotators are trained to identify expressions of private states and their properties, such as the source and the intensity of the private state. This dissertation then extends the conceptual representation of private states to better model the attitudes and targets of private states. The inter-annotator agreement studies conducted for this dissertation show that the various concepts in the original and extended representation of private states can be reliably annotated. Exploring the automatic recognition of various types of private states is also a large part of this dissertation. Experiments are conducted that focus on three types of fine-grained subjectivity analysis: recognizing the intensity of clauses and sentences, recognizing the contextual polarity of words and phrases, and recognizing the attribution levels where sentiment and arguing attitudes are expressed. Various supervised machine learning algorithms are used to train automatic systems to perform each of these tasks. These experiments result in automatic systems for performing fine-grained subjectivity analysis that significantly outperform baseline systems.

D-Scholarship@Pitt

Book Reviews

Publication venue: 'Walter de Gruyter GmbH'
Publication date: 18/12/2007

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)

A corpus-driven study of features of Chinese students' undergraduate writing in UK universities

Author: Leedham Maria Elizabeth
Publication date: 01/01/2011

Chinese people now comprise the ‘largest single overseas student group in the UK’ with more than 85,000 Chinese students registered at UK institutions in 2009 (British Council, 2010a). While there have been many studies carried out on short argumentative essays from this group (e.g. Chen, 2009), and on postgraduate theses (e.g. Hyland, 2008b), there has been comparatively little research conducted on the high-stakes genre of undergraduate assignments. This study examines assessed writing from Chinese and British undergraduates studying in UK universities between 2000 and 2008; these are investigated using corpus linguistic procedures, supported by qualitative reading. A particular focus is the use of lexical chunks, or recurring strings of words. Findings from the literature on Chinese students’ written English indicate high use of informal chunks, connecting chunks, and those containing first person pronouns (e.g. Milton, 1999). This study found that while the Chinese students make greater use of particular connectors and the first person plural, both student groups make (limited) use of informal language. These areas of difference are more apparent in year 1/2 assignments than those from year 3, suggesting that students gradually conform to the academy’s expectations. Unexpected findings which have not been previously identified in the literature include Chinese students’ significantly higher use of tables, figures (or ‘visuals’) and lists, compared to the British students’ writing. Detailed exploration of writing within Biology, Economics and Engineering suggests that using visuals and lists are different, yet equally acceptable, ways of writing assignments. Since the writing of both student groups has been judged by discipline specialists to be of a high standard, it is argued that the difference in use of visuals and lists illustrates the range of acceptability at undergraduate level. 
The thesis proposes that scholars therefore need to consider expanding the notion of what constitutes ‘good’ student writing.

CiteSeerX

Open Research Online (The Open University)

The impact of social bots on public COVID-19 perceptions during the 2020 U.S. presidential election

Author: Imouza Anne
Publication date: 01/07/2022

Increasing evidence suggests that a growing amount of disruptive and harmful content online is generated by rogue actors known as malicious social bots. They are autonomous or semi-autonomous entities that can share, like, or post messages for detrimental purposes. Several authors have highlighted one strategy employed by those automated actors, the conflict framing of issues, which is employed throughout this paper. In this work, I present a framework to depict their characteristics and potential role in online discussions related to COVID-19 topics during the highly polarized period around the 2020 U.S. presidential election.
I leverage different computational methods to look into their online characteristics (strategies and behaviors) and potential impact on the users’ COVID-19 perception using Twitter data during the 2020 U.S. presidential election. The results of this study show that conservative bot users send more conspiracy tweets than their liberal counterparts, but human and bot users both talk positively about COVID-19. Social bots do not send more negative tweets or retweets over time than human users. Additionally, no evidence suggests that the negativity of bots’ content, or their online proportion, causes a change in users’ COVID-19 perception.

Dépôt Institutionnel Numérique