175 research outputs found

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Author: Jia Chenyan
Liu Ruibo
Ma Weicheng
Vosoughi Soroush
Wang Lili
Xu Guangxuan
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency. Comment: In proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Onlin

arXiv.org e-Print Archive

Cyberbullying Classification based on Social Network Analysis

Author: Wang Anqi
Publication venue: SJSU ScholarWorks
Publication date: 25/05/2021

With the popularity of social media platforms such as Facebook, Twitter, and Instagram, people widely share their opinions and comments over the Internet. Extensive use of social media has also caused many problems. A representative problem is cyberbullying, a serious social problem that occurs mostly among teenagers. Cyberbullying occurs when a social media user posts aggressive words or phrases to harass other users, which negatively affects their mental and social well-being. Additionally, it may ruin the reputation of the platform. We consider the problem of detecting aggressive posts. Moreover, we try to detect cyberbullies. In this research, we study cyberbullying as a classification problem by combining text mining techniques with the graph of social network relationships, based on a dataset from Twitter. We create a new dataset that has more information for each tweet (post). We improve the classification accuracy by considering additional social network features based on the user’s follower list and retweet information.
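
The idea of combining text features with social network features can be sketched as a simple feature concatenation. This is a minimal illustration only, not the thesis's pipeline: the vocabulary `AGGRESSIVE_TERMS`, the two network features, and all function names are hypothetical assumptions chosen for the example.

```python
# Hypothetical vocabulary of aggressive terms (assumption, not from the thesis).
AGGRESSIVE_TERMS = ["idiot", "loser", "hate"]


def text_features(text):
    """Count occurrences of each vocabulary term in the lowercased tweet."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in AGGRESSIVE_TERMS]


def network_features(user, followers, retweets):
    """Derive two illustrative graph features for the tweet's author.

    followers: dict mapping user -> set of follower ids
    retweets:  dict mapping user -> number of times the user was retweeted
    """
    return [len(followers.get(user, set())), retweets.get(user, 0)]


def combined_features(text, user, followers, retweets):
    """Concatenate text and network features into one classifier input vector."""
    return text_features(text) + network_features(user, followers, retweets)


# Toy data, invented for the example.
followers = {"alice": {"bob", "carol"}, "bob": {"alice"}}
retweets = {"alice": 5}
vector = combined_features("You are such a loser loser", "bob", followers, retweets)
print(vector)
```

The resulting vector could be passed to any standard classifier; which network features actually help is an empirical question that the sketch does not address.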

SJSU ScholarWorks

A review on Natural Language Processing Models for COVID-19 research

Author: Chang Victor
Hall Karl
Jayne Chrisina
Publication date: 19/07/2022

This survey paper reviews Natural Language Processing models and their use in COVID-19 research in two main areas. Firstly, a range of transformer-based biomedical pretrained language models are evaluated using the BLURB benchmark. Secondly, models used in sentiment analysis surrounding COVID-19 vaccination are evaluated. We filtered literature curated from various repositories such as PubMed and Scopus and reviewed 27 papers. When evaluated using the BLURB benchmark, the novel T-BPLM BioLinkBERT gives groundbreaking results by incorporating document link knowledge and hyperlinking into its pretraining. Sentiment analysis of COVID-19 vaccination through various Twitter API tools has shown the public’s sentiment towards vaccination to be mostly positive. Finally, we outline some limitations and potential solutions to drive the research community to improve the models used for NLP tasks.

Teesside University's Research Repository

The role of sentiment analysis in forecasting successful ICOs

Author: Falde Ettore
Publication venue: Universitat Politècnica de Catalunya
Publication date: 18/05/2023

I explored the potential of Sentiment Analysis (SA) in forecasting successful initial coin offerings (ICOs). The aim is to determine whether SA and Twitter data, alone and in combination with TORD, a publicly available database (Paul P. 2021), can evaluate the success of ICOs. To that end, I provided background information on the ICO market and cryptocurrencies, followed by a thorough literature review on SA and the main success factors of ICOs. Finally, I presented the research project results, including the use of SA methodologies, data cleaning, and graphical and predictive analysis, along with conclusions and personal insights on the results.

Semantic Sentiment Analysis of Microblogs

Author: Saif Hassan
Publication date: 22/06/2015

Microblogs and social media platforms are now considered among the most popular forms of online communication. Through a platform like Twitter, much information reflecting people's opinions and attitudes is published and shared among users on a daily basis. This has recently brought great opportunities to companies interested in tracking and monitoring the reputation of their brands and businesses, and to policy makers and politicians to support their assessment of public opinions about their policies or political issues. A wide range of approaches to sentiment analysis on Twitter, and other similar microblogging platforms, have been recently built. Most of these approaches rely mainly on the presence of affect words or syntactic structures that explicitly and unambiguously reflect sentiment (e.g., "great", "terrible"). However, these approaches are semantically weak, that is, they do not account for the semantics of words when detecting their sentiment in text. This is problematic since the sentiment of words, in many cases, is associated with their semantics, either through the context in which they occur (e.g., "great" is negative in the context "pain") or the conceptual meaning associated with the words (e.g., "Ebola" is negative when its associated semantic concept is "Virus"). This thesis investigates the role of words' semantics in sentiment analysis of microblogs, aiming mainly at addressing the above problem. In particular, Twitter is used as a case study of microblogging platforms to investigate whether capturing the sentiment of words with respect to their semantics leads to more accurate sentiment analysis models on Twitter. To this end, several approaches are proposed in this thesis for extracting and incorporating two types of word semantics for sentiment analysis: contextual semantics (i.e., semantics captured from words' co-occurrences) and conceptual semantics (i.e., semantics extracted from external knowledge sources).
Experiments are conducted with both types of semantics by assessing their impact in three popular sentiment analysis tasks on Twitter: entity-level sentiment analysis, tweet-level sentiment analysis, and context-sensitive sentiment lexicon adaptation. Evaluation under each sentiment analysis task includes several sentiment lexicons and up to 9 Twitter datasets of different characteristics, as well as comparisons against several state-of-the-art sentiment analysis approaches widely used in the literature. The findings from this body of work demonstrate the value of using semantics in sentiment analysis on Twitter. The proposed approaches, which consider words' semantics for sentiment analysis at both the entity and tweet levels, surpass non-semantic approaches in most datasets.
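
The contrast between context-free affect-word scoring and contextual semantics can be illustrated with a toy example. The sketch below is not the method proposed in the thesis: the lexicon, the three-tweet corpus, and the function names are hypothetical, and the co-occurrence rule (average the polarity of affect words appearing in the same tweet) is a deliberately simple stand-in chosen for illustration.

```python
from collections import defaultdict

# Hypothetical prior-polarity lexicon (assumption); real lexicons are far larger.
LEXICON = {"great": 1.0, "love": 1.0, "terrible": -1.0, "scary": -1.0}


def lexicon_score(tokens):
    """Context-free baseline: sum the prior polarity of known affect words."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def contextual_polarity(corpus):
    """Estimate a polarity for out-of-lexicon words from the affect words
    they co-occur with in the same tweet."""
    totals, counts = defaultdict(float), defaultdict(int)
    for tokens in corpus:
        affect = [LEXICON[t] for t in tokens if t in LEXICON]
        if not affect:
            continue  # no affect words, so nothing to learn from this tweet
        mean_affect = sum(affect) / len(affect)
        for t in set(tokens):
            if t not in LEXICON:
                totals[t] += mean_affect
                counts[t] += 1
    return {w: totals[w] / counts[w] for w in totals}


def contextual_score(tokens, learned):
    """Use prior polarity where available, else the learned contextual one."""
    return sum(LEXICON.get(t, learned.get(t, 0.0)) for t in tokens)


# Toy corpus, invented for the example.
corpus = [
    ["outage", "again", "terrible"],
    ["scary", "outage", "news"],
    ["love", "this", "great", "phone"],
]
learned = contextual_polarity(corpus)
tweet = ["another", "outage", "today"]
print(lexicon_score(tweet), contextual_score(tweet, learned))
```

In this toy setting the baseline sees no affect words in the final tweet and treats it as neutral, while the co-occurrence estimate marks "outage" as negative because it only ever appears alongside negative affect words.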

Using Tsetlin Machine to discover interpretable rules in natural language processing applications

Author: Goodwin Morten
Granmo Ole-Christoffer
Saha Rupsa
Publication venue: Wiley
Publication date: 01/01/2021

Agder University Research Archive