
    Classification of the Stance in Online Debates Using the Dependency Relations Feature

    Online discussion forums offer Internet users a medium for discussing current political debates. A debate is a system of claims shaped by interactivity and representation: users make claims in an online discussion and back their positions with strong content, where factual accuracy and emotional appeal are critical for convincing readers. A key challenge in debate forums is to identify the participants' stances, which are interdependent and interconnected. This research constructs a classifier that takes the linguistic features of posts as input and outputs a predicted stance label for each post. Three types of features are used to detect the stance of a post: lexical, dependency, and morphological. Lexical features such as cue words are employed as surface features, while dependency and morphology features serve as deep features. A Multinomial Naïve Bayes classifier is used to build the stance classification model, and the Chi-Square method is used to select an informative feature set. The performance of the stance classification system is evaluated in terms of accuracy. The proposed system labels each post's stance as for or against by analyzing the surface and deep features that capture the content of the post.
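    The abstract above names a concrete pipeline: lexical surface features, Chi-Square feature selection, and a Multinomial Naïve Bayes classifier. The following is a minimal sketch of that setup using scikit-learn; the toy posts and labels are hypothetical placeholders, and the dependency and morphology features are not reproduced here.

```python
# Sketch: lexical n-gram counts -> chi-square feature selection -> Multinomial Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for debate-forum posts and stance labels.
posts = [
    "I fully support this proposal because the evidence is clear",
    "I agree, the facts strongly back this position",
    "This claim is wrong and the argument does not hold",
    "I disagree, the reasoning here is deeply flawed",
]
labels = ["for", "for", "against", "against"]

pipeline = Pipeline([
    ("counts", CountVectorizer(ngram_range=(1, 2))),  # surface (lexical) features, incl. cue words
    ("select", SelectKBest(chi2, k="all")),           # chi-square selection; k would be tuned on real data
    ("nb", MultinomialNB()),                          # Multinomial Naive Bayes stance model
])
pipeline.fit(posts, labels)
print(pipeline.predict(["the evidence clearly supports this"]))  # -> ['for']
```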

    Argument Mining on Italian News Blogs

    The goal of argument mining is to extract structured information, namely the arguments and their relations, from unstructured text. In this paper, we propose an approach to argument relation prediction based on supervised learning of linguistic and semantic features of the text. We test our method on the CorEA corpus of user comments on online newspaper articles, evaluating our system's performance in assigning the correct relation, i.e., support or attack, to pairs of arguments. We obtain results consistently better than a sentiment-analysis-based baseline (over two out of three pairs classified correctly), and we observe that sentiment and lexical semantics are the most informative features for the relation prediction task.
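    As a rough illustration of the pairwise relation prediction described above, the sketch below maps each pair of comments to simple sentiment and lexical-overlap features and trains a supervised classifier to label the pair as support or attack. The toy sentiment lexicon, the feature set, and the example pairs are hypothetical placeholders and do not reproduce the CorEA setup.

```python
# Sketch: each (argument, argument) pair -> sentiment/overlap features -> support vs. attack.
from sklearn.linear_model import LogisticRegression

# Hypothetical miniature sentiment lexicon.
SENT = {"good": 1, "great": 1, "right": 1, "wrong": -1, "bad": -1, "terrible": -1}

def sentiment(text):
    # Sum of lexicon scores for the tokens of a comment.
    return sum(SENT.get(tok, 0) for tok in text.lower().split())

def pair_features(a, b):
    # Sentiment of each comment, their agreement (product), and word overlap.
    overlap = len(set(a.lower().split()) & set(b.lower().split()))
    return [sentiment(a), sentiment(b), sentiment(a) * sentiment(b), overlap]

pairs = [
    ("This reform is a good idea", "Exactly, it is the right move", "support"),
    ("This reform is a good idea", "No, it is a terrible and wrong plan", "attack"),
    ("The article is wrong about taxes", "Agreed, the figures are bad", "support"),
    ("The article is wrong about taxes", "The figures are right and well sourced", "attack"),
]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

clf = LogisticRegression().fit(X, y)
print(clf.predict([pair_features("The plan is good", "Yes, a great decision")]))  # expected: ['support']
```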

    Representation and learning schemes for argument stance mining.

    Argumentation is a key part of human interaction. Used introspectively, it searches for the truth by laying down arguments for and against positions. As a mediation tool, it can be used to search for compromise between multiple human agents. For this purpose, theories of argumentation have been in development since the Ancient Greeks, in order to formalise the process and remove human imprecision from it. From this practice the process of argument mining has emerged. As human interaction has moved from the small scale of one-to-one (or few-to-few) debates to large-scale discussions where tens of thousands of participants can express their opinion in real time, the importance of argument mining has grown while its feasibility in a manual annotation setting has diminished, leaving the field to rely mainly on human-defined heuristics to process the data. This underlines the importance of a new generation of computational tools that can automate the process on a larger scale. In this thesis we study argument stance detection, one of the steps in the argument mining workflow. We demonstrate how data of varying reliability can be used to mine argument stance in social media data. We investigate a spectrum of techniques: completely unsupervised classification of stance using a sentiment lexicon, automated computation of a regularised stance lexicon, automated computation of a lexicon with modifiers, and the use of a lexicon with modifiers as a temporal feature model for more complex classification algorithms. We find that the addition of contextual information enhances unsupervised stance classification, within reason, and that multi-strategy algorithms that combine multiple heuristics by ordering them from the most precise to the most general tend to outperform other approaches by a large margin. Focusing then on building a stance lexicon, we find that optimising such lexicons within an empirical risk minimisation framework allows us to regularise them to a higher degree than competing probabilistic techniques, which helps us learn better lexicons from noisy data. We also conclude that adding local context (neighbouring words) during the learning phase of the lexicons tends to produce more accurate results at the cost of robustness, since part of the weight is redistributed from words with a class valence to the contextual words. Finally, when investigating the use of lexicons to build feature models for traditional machine learning techniques, simple lexicons (without context) seem to perform about as well overall as more complex ones, and better than purely semantic representations. We also find that word-level feature models tend to outperform sentence- and instance-level representations, but that they do not benefit as much from being augmented with lexicon knowledge. This research programme was carried out in collaboration with the University of Glasgow, Department of Computer Science.
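    As a loose illustration of the lexicon-learning idea above, the sketch below treats a stance lexicon as the weight vector of an L2-regularised logistic model fitted by empirical risk minimisation over word counts, then reads the learned per-word weights off as stance scores. This is a generic stand-in, not the thesis's framework: the data, the labels, and the regularisation strength C are illustrative assumptions.

```python
# Sketch: learn a regularised stance lexicon as the weights of an L2-penalised logistic model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical noisy stance-labelled texts (1 = pro, 0 = anti).
texts = [
    "totally support the motion great idea",
    "strongly agree this is the right call",
    "oppose the motion terrible idea",
    "strongly disagree this is the wrong call",
]
stances = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)
# C controls regularisation strength; smaller C regularises the lexicon more heavily.
model = LogisticRegression(penalty="l2", C=1.0).fit(X, stances)

# Read the learned weights off as per-word stance scores.
lexicon = dict(zip(vec.get_feature_names_out(), model.coef_[0]))
print(sorted(lexicon.items(), key=lambda kv: kv[1], reverse=True)[:5])
```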

    Developing natural language processing instruments to study sociotechnical systems

    Identifying temporal linguistic patterns and tracing social amplification across communities has always been vital to understanding modern sociotechnical systems. Now, well into the age of information technology, the growing digitization of text archives powered by machine learning systems has enabled an enormous number of interdisciplinary studies to examine the coevolution of language and culture. However, most research in that domain investigates formal textual records, such as books and newspapers. In this work, I argue that the study of conversational text derived from social media is just as important. I present four case studies to identify and investigate societal developments in longitudinal social media streams with high temporal resolution spanning over 100 languages. These case studies show how everyday conversations on social media encode a unique perspective that is often complementary to observations derived from more formal texts. This unique perspective improves our understanding of modern sociotechnical systems and enables future research in computational linguistics, social science, and behavioral science

    Social Media Analysis for Social Good

    Data on social media is abundant and offers valuable information that can be utilised for a range of purposes. Users share their experiences and opinions on various topics, ranging from their personal lives to the community and the world, in real time. In comparison to conventional data sources, social media data is cost-effective to obtain, up to date, and reaches a larger audience. Analysing this rich data source can contribute to solving societal issues and promoting social impact in an equitable manner. In this thesis, I present my research on innovative applications that use natural language processing (NLP) and machine learning to identify patterns and extract actionable insights from social media data, with the ultimate aim of making a positive impact on society. First, I evaluate, using social media data, the impact of an intervention program aimed at promoting inclusive and equitable learning opportunities for underrepresented communities. Second, I develop EmoBERT, an emotion-based variant of the BERT model, for detecting fine-grained emotions to gauge the well-being of a population during significant disease outbreaks. Third, to improve public health surveillance on social media, I demonstrate how emotions expressed in social media posts can be incorporated into health mention classification using intermediate-task fine-tuning and a multi-feature fusion approach. I also propose a multi-task learning framework to model the literal meanings of disease and symptom words to enhance the classification of health mentions. Fourth, I create a new health mention dataset to address the imbalance in health data availability between developing and developed countries, providing a benchmark alternative to the traditional standards used in digital health research. Finally, I leverage the power of pretrained language models to analyse religious activities, recognised as social determinants of health, during disease outbreaks.
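    As a generic illustration of the transformer fine-tuning used throughout the work described above, the sketch below fine-tunes a pretrained BERT model for binary health mention classification with the Hugging Face transformers library. The model name, toy tweets, labels, and training settings are placeholder assumptions; EmoBERT, the multi-feature fusion, and the multi-task components are not reproduced.

```python
# Sketch: fine-tune a pretrained transformer to separate genuine health mentions from figurative uses.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical toy posts: 1 = genuine health mention, 0 = figurative use.
texts = ["my heart attack recovery is going well", "that movie gave me a heart attack, so funny"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few toy gradient steps; real training iterates over a full labelled corpus
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())
```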

    Expressions of psychological stress on Twitter: detection and characterisation

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Long-term psychological stress is a significant predictive factor for individual mental health, and short-term stress is a useful indicator of an immediate problem. Traditional psychology studies have relied on surveys to understand reasons for stress in general and in specific contexts. The popularity and ubiquity of social media make it a potential data source for identifying and characterising aspects of stress. Previous studies of stress in social media have focused on users responding to stressful personal life events; prior social media research has not explored expressions of stress in other important domains, however, including travel and politics. This thesis detects and analyses expressions of psychological stress in social media. So far, TensiStrength is the only existing lexicon for stress and relaxation scores in social media. Using a word-vector-based word sense disambiguation method, the TensiStrength lexicon was modified to include the stress scores of the different senses of the same word. On a dataset of 1,000 tweets containing ambiguous stress-related words, the accuracy of the modified TensiStrength increased by 4.3%. This thesis also reports the characteristics of a multiple-domain stress dataset of 12,000 tweets, 3,000 each for airlines, personal events, UK politics, and London traffic. A two-step method for identifying stressors in tweets was implemented. The first step used LDA topic modelling and k-means clustering to find a set of stressor types (e.g., delay, accident). Second, three word-vector-based methods, maximum-word similarity, context-vector similarity, and cluster-vector similarity, were used to detect the stressors in each tweet. The cluster-vector similarity method was found to identify the stressors in tweets in all four domains better than machine learning classifiers, based on accuracy, precision, recall, and F-measure. Swearing and sarcasm were also analysed in high-stress and no-stress datasets from the four domains, using a Convolutional Neural Network and a Multilayer Perceptron, respectively. The presence of swearing and sarcasm was higher in high-stress tweets than in no-stress tweets in all the domains, and the stressors in each domain with higher percentages of swearing or sarcasm were identified. Furthermore, the distribution of temporal classes (past, present, future, and atemporal) in high-stress tweets was found using an ensemble classifier; the distribution depended on the domain and the stressors. This study contributes a modified and improved lexicon for the identification of stress scores in social media texts. The two-step method for identifying stressors follows a general framework that can be used for domains other than those studied here. The presence of swearing, sarcasm, and the temporal classes of high-stress tweets belonging to different domains are found and compared to findings from traditional psychology for the first time. The algorithms and knowledge may be useful for travel, political, and personal life systems that need to identify stressful events in order to take appropriate action. This research was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 636160-2, the Optimum project (www.optimumproject.eu).
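    To make the cluster-vector similarity step above concrete, the sketch below represents each stressor type by the mean vector of its keywords and each tweet by the mean vector of its words, then assigns the tweet to the most similar stressor. The tiny embedding table and the keyword clusters are hypothetical stand-ins for real word vectors and for the LDA/k-means output, not the thesis's actual resources.

```python
# Sketch: assign each tweet the stressor whose cluster vector is most similar to the tweet vector.
import numpy as np

# Hypothetical 3-dimensional word vectors (real systems use pretrained embeddings).
EMB = {
    "delay": np.array([0.9, 0.1, 0.0]), "late": np.array([0.8, 0.2, 0.1]),
    "cancelled": np.array([0.7, 0.3, 0.0]), "crash": np.array([0.1, 0.9, 0.1]),
    "accident": np.array([0.2, 0.8, 0.2]), "traffic": np.array([0.1, 0.7, 0.3]),
    "flight": np.array([0.6, 0.2, 0.2]), "stuck": np.array([0.3, 0.6, 0.4]),
}
# Hypothetical stressor clusters standing in for the LDA/k-means output.
CLUSTERS = {"delay": ["delay", "late", "cancelled"], "accident": ["crash", "accident", "traffic"]}

def mean_vector(words):
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def detect_stressor(tweet):
    tweet_vec = mean_vector(tweet.lower().split())
    return max(CLUSTERS, key=lambda c: cosine(tweet_vec, mean_vector(CLUSTERS[c])))

print(detect_stressor("my flight is late again and now cancelled"))  # -> delay
print(detect_stressor("stuck behind a crash on the motorway"))       # -> accident
```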