6,269 research outputs found
Stance Classification on PTT Comments
With the development of social media and online forums, users have grown accustomed to expressing their agreement and disagreement via short texts. Elements that reveal the user's stance or subjectivity thus become an important resource in identifying the user's position on a given topic. In the current study, we observe comments on an online bulletin board in Taiwan to see how people express their stance when responding to other people's posts in Chinese. A lexicon is built based on linguistic analysis and annotation of the data. We performed a binary classification task using these linguistic features and were able to reach an average accuracy of 71 percent. A linguistic analysis of the confusion arising in the classification task is provided to support future work on improving accuracy for this task.
Rumor Stance Classification in Online Social Networks: A Survey on the State-of-the-Art, Prospects, and Future Challenges
The emergence of the Internet as a ubiquitous technology has facilitated the
rapid evolution of social media as the leading virtual platform for
communication, content sharing, and information dissemination. In spite of
revolutionizing the way news used to be delivered to people, this technology
has also brought with it inevitable drawbacks. One such drawback is
the spread of rumors facilitated by social media platforms, which may provoke
doubt and fear among people. Therefore, the need to debunk rumors before they
spread widely has become all the more essential. Over the years, many studies
have been conducted to develop effective rumor verification systems. One aspect
of such studies focuses on rumor stance classification, which concerns the task
of utilizing users' viewpoints about a rumorous post to better predict the
veracity of a rumor. Relying on users' stances in the rumor verification task
has gained great importance, for it has yielded significant improvements in
model performance. In this paper, we conduct a comprehensive literature review on
rumor stance classification in complex social networks. In particular, we
present a thorough description of the approaches and mark the top performances.
Moreover, we introduce multiple datasets available for this purpose and
highlight their limitations. Finally, some challenges and future directions are
discussed to stimulate further relevant research efforts. Comment: 13 pages, 2 figures, journa
Perspective Identification in Informal Text
This dissertation studies the problem of identifying the ideological perspective of people as expressed in their written text. One's perspective is often expressed in his/her stance towards polarizing topics. We are interested in studying how nuanced linguistic cues can be used to identify the perspective of a person in informal genres. Moreover, we are interested in exploring the problem from a multilingual perspective, comparing and contrasting the linguistic devices used in English informal-genre datasets discussing American ideological issues and in Arabic discussion fora posts related to Egyptian politics.
Our first and utmost goal is building computational systems that can successfully identify the perspective from which a given informal text is written while studying what linguistic cues work best for each language and drawing insights into the similarities and differences between the notion of perspective in both studied languages. We build computational systems that can successfully identify the stance of a person in English informal text that deals with different topics that are determined by one's perspective, such as legalization of abortion, feminist movement, gay and gun rights; additionally, we are able to identify a more general notion of perspective, namely the 2012 choice of presidential candidate, as well as build systems for automatically identifying different elements of a person's perspective given an Egyptian discussion forum comment. The systems utilize several lexical and semantic features for both languages. Specifically, for English we explore the use of word sense disambiguation, opinion features, and latent and frame semantics, as well as Linguistic Inquiry and Word Count features; in Arabic, however, in addition to using sentiment and latent semantics, we study whether linguistic code-switching (LCS) between the standard and dialectal forms of the language can help as a cue for uncovering the perspective from which a comment was written.
This leads us to the challenge of devising computational systems that can handle LCS in Arabic. The Arabic language has a diglossic nature where the standard form of the language (MSA) coexists with the regional dialects (DA) corresponding to the native mother tongue of Arabic speakers in different parts of the Arab world. DA is ubiquitously prevalent in written informal genres and in most cases it is code-switched with MSA. The presence of code-switching degrades the performance of almost any MSA-only trained Natural Language Processing tool when applied to DA or to code-switched MSA-DA content. In order to solve this challenge, we build a state-of-the-art system, AIDA, to computationally handle token and sentence-level code-switching.
On a conceptual level, for handling and processing Egyptian ideological perspectives, we note the lack of a taxonomy for the most common perspectives among Egyptians and the lack of corresponding annotated corpora. In solving this challenge, we develop a taxonomy for the most common community perspectives among Egyptians and use an iterative feedback-loop process to devise guidelines on how to successfully annotate a given online discussion forum post with different elements of a person's perspective. Using the proposed taxonomy and annotation guidelines, we annotate a large set of Egyptian discussion fora posts to identify a comment's perspective as conveyed in the priority expressed by the comment, as well as the stance on major political entities
Twitter Stance Detection with Textual, Sentiment, and Target-specific Models
Today more and more users express their opinions and stances on social media platforms such as Twitter. In this paper, I proposed different approaches to automatically detect the stance of a single tweet. I investigated whether including additional sentiment polarity information and the target information would be beneficial for the stance detection task. Moreover, I also researched whether target-specific features could be generalized to other datasets with different targets for the stance detection task. Master of Science in Information Scienc
Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization
This study describes a set of document- and sentence-level classification models designed to automate the task of determining the argument stance (for or against) of a student argumentative essay and the task of identifying any arguments in the essay that provide reasons in support of that stance. A suggested application utilizing these models is presented which involves the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which the sentence was extracted and provides a representative argument in support of that stance.
A novel set of document-level stance classification features motivated by linguistic research involving stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text.
We also describe the construction of a corpus of essay sentences annotated for supporting argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model is designed to classify a given sentence as a supporting argument or as not a supporting argument, while the second model is designed to classify a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above their respective baseline models.
An application illustrating an interesting use-case for the models presented in this dissertation is described. This application incorporates all three classification models to extract a single sentence summarizing both the overall stance of a given text along with a convincing reason in support of that stance
Cross-Lingual and Low-Resource Sentiment Analysis
Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages.
This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language.
Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, and these include our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis.
To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments.
The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language.
In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment
Fine-grained Subjectivity and Sentiment Analysis: Recognizing the intensity, polarity, and attitudes of private states
Private states (mental and emotional states) are part of the information that is conveyed in many forms of discourse. News articles often report emotional responses to news stories; editorials, reviews, and weblogs convey opinions and beliefs. This dissertation investigates the manual and automatic identification of linguistic expressions of private states in a corpus of news documents from the world press. A term for the linguistic expression of private states is subjectivity. The conceptual representation of private states used in this dissertation is that of Wiebe et al. (2005). As part of this research, annotators are trained to identify expressions of private states and their properties, such as the source and the intensity of the private state. This dissertation then extends the conceptual representation of private states to better model the attitudes and targets of private states. The inter-annotator agreement studies conducted for this dissertation show that the various concepts in the original and extended representation of private states can be reliably annotated. Exploring the automatic recognition of various types of private states is also a large part of this dissertation. Experiments are conducted that focus on three types of fine-grained subjectivity analysis: recognizing the intensity of clauses and sentences, recognizing the contextual polarity of words and phrases, and recognizing the attribution levels where sentiment and arguing attitudes are expressed. Various supervised machine learning algorithms are used to train automatic systems to perform each of these tasks. These experiments result in automatic systems for performing fine-grained subjectivity analysis that significantly outperform baseline systems
A corpus-driven study of features of Chinese students' undergraduate writing in UK universities
Chinese people now comprise the "largest single overseas student group in the UK" with more than 85,000 Chinese students registered at UK institutions in 2009 (British Council, 2010a). While there have been many studies carried out on short argumentative essays from this group (e.g. Chen, 2009), and on postgraduate theses (e.g. Hyland, 2008b), there has been comparatively little research conducted on the high-stakes genre of undergraduate assignments. This study examines assessed writing from Chinese and British undergraduates studying in UK universities between 2000 and 2008; these are investigated using corpus linguistic procedures, supported by qualitative reading.
A particular focus is the use of lexical chunks, or recurring strings of words. Findings from the literature on Chinese students' written English indicate high use of informal chunks, connecting chunks, and those containing first person pronouns (e.g. Milton, 1999). This study found that while the Chinese students make greater use of particular connectors and the first person plural, both student groups make (limited) use of informal language. These areas of difference are more apparent in year 1/2 assignments than those from year 3, suggesting that students gradually conform to the academy's expectations. Unexpected findings which have not been previously identified in the literature include Chinese students' significantly higher use of tables, figures (or "visuals") and lists, compared to the British students' writing. Detailed exploration of writing within Biology, Economics and Engineering suggests that using visuals and lists are different, yet equally acceptable, ways of writing assignments.
Since the writing of both student groups has been judged by discipline specialists to be of a high standard, it is argued that the difference in use of visuals and lists illustrates the range of acceptability at undergraduate level. The thesis proposes that scholars therefore need to consider expanding the notion of what constitutes "good" student writing
The impact of social bots on public COVID-19 perceptions during the 2020 U.S. presidential election
Increasing evidence suggests that a growing amount of disruptive and harmful content is generated by rogue actors known as malicious social bots. They are autonomous or semi-autonomous entities that can share, like, or post messages for detrimental purposes. Several authors have highlighted one strategy employed by those automated actors: the conflict framing of issues. In this work, I present a framework to depict their potential role in online discussions related to COVID-19 topics around the 2020 U.S. presidential election.
I leverage different computational methods to look into their online characteristics and potential impact on users' COVID-19 perception using Twitter data during the 2020 U.S. presidential election. The results of this study show that conservative bot users send more conspiracy tweets than their liberal counterparts, but human and bot users both talk positively about COVID-19. Social bots do not send more negative tweets or retweets over time than human users. Additionally, no evidence suggests that the negativity of bots' content, or their online proportion, causes a change in users' COVID-19 perception