
    Social Emotion Mining: An Insight

    Emotions are an indispensable component of the wide variety of texts present on online social media services. A great deal of research has been done to detect and analyse the emotions present in text, but most of it has been conducted from the author's perspective. This paper provides an in-depth survey of the work done in Social Emotion Mining (SEM) from the reader's perspective. It is a first attempt at categorizing the existing literature into emotion mining levels. It also highlights the different models and techniques used by various authors in this area. Major limitations and challenges in this area of emotion detection and analysis are also presented.

    Fake news detection and analysis

    The evolution of technology has led to the development of environments that allow instantaneous communication and dissemination of information. As a result, false news, article manipulation, lack of trust in the media, and information bubbles have become high-impact issues. In this context, the need for automatic tools that can classify content as reliable or not, and thereby create a trustworthy environment, is continually increasing. Current solutions do not entirely solve this problem, as the task is difficult and depends on factors such as the type of language, the type of news, and subject volatility. The main objective of this thesis is to explore this crucial Natural Language Processing problem, namely false content detection, and to show how it can be solved as a classification problem with machine learning. A linguistic approach is taken, experimenting with different types of features and models to build accurate fake news detectors. The experiments are structured in three main steps: text pre-processing, feature extraction, and classification itself. In addition, they are conducted on a real-world dataset, LIAR, to offer a good overview of which model best handles day-to-day situations. Two approaches are chosen: multi-class and binary classification. In both cases, we show that, across all experiments, a simple feed-forward network combined with fine-tuned DistilBERT embeddings reports the highest accuracy: 27.30% on 6-label classification and 63.61% on 2-label classification. These results emphasize that transfer learning brings important improvements to this task. In addition, we demonstrate that classic machine learning algorithms like Decision Tree, Naïve Bayes, and Support Vector Machine perform similarly to state-of-the-art solutions, even outperforming some recurrent neural networks such as LSTM or BiLSTM. This confirms that more complex solutions do not guarantee higher performance. Regarding features, we confirm that there is a connection between the degree of veracity of a text and the frequency of its terms, one stronger than any connection with their position or order. Yet context proves to be the most powerful aspect of the feature extraction process. Also, indices that describe the author's style must be carefully selected to provide relevant information.
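
    As a rough illustration of the pipeline described above, the sketch below pairs DistilBERT embeddings with a small feed-forward head. It is a minimal sketch, not the thesis's implementation: the encoder is kept frozen here for brevity (the thesis fine-tunes it), and the pooling choice, layer sizes, and label count are illustrative assumptions.

        # Hedged sketch: feed-forward classifier over DistilBERT embeddings.
        # The encoder is frozen here for brevity; pooling, layer sizes, and
        # hyperparameters are illustrative assumptions.
        import torch
        import torch.nn as nn
        from transformers import DistilBertTokenizerFast, DistilBertModel

        tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
        encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        encoder.eval()  # only the feed-forward head is trained in this sketch

        class FeedForwardHead(nn.Module):
            def __init__(self, hidden=256, num_labels=6):  # 6 labels for LIAR, 2 for binary
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(768, hidden), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(hidden, num_labels),
                )
            def forward(self, x):
                return self.net(x)

        def embed(texts):
            # Mean-pool the last hidden state into one 768-d vector per statement.
            batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():
                hidden = encoder(**batch).last_hidden_state        # (B, T, 768)
            mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
            return (hidden * mask).sum(1) / mask.sum(1)            # (B, 768)

        head = FeedForwardHead()
        logits = head(embed(["Example LIAR statement to classify."]))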

    A novel Auto-ML Framework for Sarcasm Detection

    Many domains feature sarcasm or verbal irony in the text of reviews, tweets, comments, and dialog discussions. The purpose of this research is to classify sarcasm across multiple domains using a deep learning based AutoML framework. The proposed AutoML framework has five models in its model search pipeline; these models are hybrid combinations of convolutional neural network (CNN), Long Short-Term Memory (LSTM), deep neural network (DNN), and Bidirectional Long Short-Term Memory (BiLSTM) layers, namely CNN-LSTM-DNN, LSTM-DNN, BiLSTM-DNN, and CNN-BiLSTM-DNN. This work proposes algorithms that contrast polarities between terms and phrases, which are categorized into implicit and explicit incongruity categories. The incongruity and pragmatic features, such as punctuation and exclamation marks, are integrated into the AutoML DeepConcat framework models; this integration takes place when the DeepConcat AutoML framework initiates a model search pipeline over the five models to achieve better performance. Conceptually, DeepConcat means that the model is integrated with generalized features. The pretrained BiLSTM model achieved a better performance of 0.98 F1 when compared with the performance of the other models. Similarly, the AutoML based BiLSTM-DNN model achieved the best performance of 0.98 F1, which is better than core approaches and the existing state of the art on the Twitter tweet dataset, Amazon reviews, and dialog discussion comments. The proposed AutoML framework compared the performance metrics F1 and AUC and found F1 to be the more informative of the two. The integration of all feature categories achieved better performance than the individual pragmatic and incongruity feature categories. This research also evaluated the dropout-layer hyperparameter tuned by AutoML based Bayesian optimization, which achieved better performance than a fixed dropout percentage such as 10%. The proposed AutoML framework DeepConcat used the best pretrained models, BiLSTM-DNN and CNN-CNN-DNN, to transfer knowledge across domains such as Amazon reviews and dialog discussion comments (text) using the last-layer, full-layer, and our fade-out freezing strategies. In transfer learning, the fade-out strategy outperformed the existing state-of-the-art model BiLSTM-DNN, with performance of 0.98 F1 on tweets, 0.85 F1 on Amazon reviews, and 0.87 F1 on the dialog discussion SCV2-Gen dataset. Further, all strategies across various domains can be compared for best model selection.
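
    For concreteness, the hedged Keras sketch below shows one hybrid of the kind listed above: a BiLSTM-DNN whose learned text representation is concatenated with a vector of pragmatic/incongruity features, in the spirit of the DeepConcat idea. The vocabulary size, sequence length, feature count, layer sizes, and dropout rate are assumptions, not the framework's actual search space.

        # Hedged sketch: a BiLSTM-DNN that concatenates learned text features with
        # hand-crafted pragmatic/incongruity features (DeepConcat-style). All sizes
        # and hyperparameters are illustrative assumptions.
        import tensorflow as tf
        from tensorflow.keras import layers, Model

        VOCAB, MAXLEN, N_PRAGMATIC = 20000, 60, 12     # assumed dimensions

        text_in = layers.Input(shape=(MAXLEN,), dtype="int32", name="tokens")
        prag_in = layers.Input(shape=(N_PRAGMATIC,), name="pragmatic_features")

        x = layers.Embedding(VOCAB, 128)(text_in)
        x = layers.Bidirectional(layers.LSTM(64))(x)   # BiLSTM text encoder
        x = layers.concatenate([x, prag_in])           # the "concat" step
        x = layers.Dense(64, activation="relu")(x)     # DNN head
        x = layers.Dropout(0.3)(x)                     # dropout rate treated as tunable
        out = layers.Dense(1, activation="sigmoid")(x) # sarcastic vs. not

        model = Model([text_in, prag_in], out)
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])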

    Mapping the COVID-19 pandemic - The influence of map design choices by media outlets on people’s perception of the state of the pandemic

    Following the outbreak of the COVID-19 pandemic in 2019, numerous online newspapers took it upon themselves to inform the public on a daily basis about the spread of the virus. Given their visually appealing, informative, and comprehensive nature, maps were often used for this purpose. The Swiss media have also resorted to this form of data journalism. Numerous studies in recent years have demonstrated that maps can affect people's perceptions and emotions, and the colours used in maps in particular can contribute to these effects. In this thesis, it was therefore of interest to find out what emotional, perceptual, and behavioural impact these many COVID-19 maps had on readers. The study shows that the COVID-19 topic itself had a significant influence on people's emotions and that warm colour scales evoked more concern and led to more cautious behaviour than cold ones. Maps are therefore powerful tools, which is why it is important to make thoughtful decisions when designing them. This is especially important when maps reach a wide public and provide information about important events. This study therefore seeks to raise awareness and, in the best case, contribute to more considered and improved map design. The thesis is divided into two parts. Firstly, it examines what the COVID-19 case rate maps published by the Swiss media looked like and how they were created. Secondly, it analyses which emotions, perceptions, and behaviours they generated.

    A Comparison of Aesthetic and Efferent Reading Strategies of College Students

    This study was designed to investigate, compare, and document the use of efferent and aesthetic reading strategies by both undergraduate and graduate college students enrolled in a course in developmental reading instruction. Twenty students were individually interviewed, each attending two separate interviews: one focused on efferent reading and the other on aesthetic reading. The students' responses to the question "What is efferent/aesthetic reading comprehension?", their comments made as they "thought out loud" while reading, and their identifications of efferent and aesthetic reading comprehension strategies were analyzed and categorized according to similarities in the items included in the students' responses. The findings of this study indicate: 1) that reading is both an active and a transactive process; 2) that students have at their disposal a wide variety of reading comprehension strategies to help them understand a text; 3) that students are not necessarily aware of the strategies they are employing while reading; and 4) that a reader's purpose does play a role in determining which aspects of the text are brought into awareness by the reader. The data suggest that reading is a complex and individual process and therefore support the concepts of student-centered education. Additionally, because this study indicated that a reader's purpose plays an important role in reading comprehension, the data also support those reading programs in which students are guided to discover the different purposes for reading different types of texts. Implications for future research include conducting similar studies with readers from a variety of age and population groups. They also include the development of a less verbal procedure to gain insight into the thought processes that occur while reading in children who have not yet reached the stage of cognitive development needed to participate successfully in the Thinking-Out-Loud procedure used in this study.

    False textual information detection, a deep learning approach

    Many approaches exist for fact checking in fake news identification, which is the focus of this thesis. Current approaches still perform poorly at scale due to a lack of authority, insufficient evidence, or, in certain cases, reliance on a single piece of evidence. To address the lack of evidence and the inability of models to generalise across domains, we propose a style-aware model for detecting false information that improves on existing performance. We found that our model was effective at detecting false information when we evaluated its generalisation ability on news articles and Twitter corpora. We then propose to improve fact checking performance by incorporating warrants. We developed a highly efficient prediction model based on the results and demonstrated that incorporating warrants is beneficial for fact checking. Due to a lack of external warrant data, we develop a novel model for generating warrants that aid in determining the credibility of a claim. The results indicate that when a pre-trained language model is combined with a multi-agent model, high-quality, diverse warrants are generated that contribute to task performance improvement. To counter biased opinions and support rational judgments, we propose a model that can generate multiple perspectives on a claim. Experiments confirm that our Perspectives Generation model produces diverse perspectives with higher quality and diversity than any baseline model. Additionally, we propose to improve the model's detection capability by generating an explainable alternative factual claim that assists the reader in identifying the subtle issues that result in factual errors. The examination demonstrates that this does indeed increase the veracity of the claim. Finally, since current research has treated stance detection and fact checking separately, we propose a unified model that integrates both tasks. Classification results demonstrate that our proposed model outperforms state-of-the-art methods.
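
    The abstract does not spell out the unified architecture, so the sketch below is only a generic illustration of how stance detection and veracity classification can share a single text encoder with two task heads; the encoder name, label sets, pooling strategy, and class names are all assumptions rather than the thesis's model.

        # Hedged sketch: a shared encoder with two task heads (stance + veracity),
        # illustrating the general idea of unifying the two tasks. Every detail here
        # is an assumption, not the thesis's published architecture.
        import torch
        import torch.nn as nn
        from transformers import AutoModel, AutoTokenizer

        class UnifiedFactChecker(nn.Module):
            def __init__(self, name="bert-base-uncased", n_stance=4, n_veracity=3):
                super().__init__()
                self.encoder = AutoModel.from_pretrained(name)
                dim = self.encoder.config.hidden_size
                self.stance_head = nn.Linear(dim, n_stance)      # e.g. agree/disagree/discuss/unrelated
                self.veracity_head = nn.Linear(dim, n_veracity)  # e.g. true/false/not enough info

            def forward(self, **enc):
                pooled = self.encoder(**enc).last_hidden_state[:, 0]  # [CLS] representation
                return self.stance_head(pooled), self.veracity_head(pooled)

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = UnifiedFactChecker()
        batch = tok(["claim text"], ["evidence text"], return_tensors="pt", truncation=True)
        stance_logits, veracity_logits = model(**batch)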

    Predictive modeling of human placement decisions in an English Writing Placement Test

    Writing is an important component of standardized tests that are used for admission decisions, class placement, and academic or professional development. Placement results of the EPT Writing Test at the undergraduate level are used to determine whether international students meet English requirements for writing skills (i.e., Pass) and to direct students to appropriate ESL writing classes (i.e., 101B or 101C). Practical constraints on the evaluation process in the English Writing Placement Test (the EPT Writing Test) at Iowa State University, such as rater disagreement, rater turnover, and heavy administrative workload, have demonstrated the need to develop valid scoring models for an automated writing evaluation tool. Statistical algorithms for the scoring engines are essential for predicting human raters' quality judgments of future EPT essays. Furthermore, in measuring L2 writing performance, previous research has focused heavily on writer-oriented text features of students' writing performance rather than on the reader-oriented linguistic features that influence human raters' quality judgments. To address the practical concerns of the EPT Writing Test and this gap in the literature, the current project aimed to develop a predictive model that best captures human placement decisions in the EPT Writing Test. A two-phase, multistage mixed-methods design was adopted, with interconnected model-specification and model-construction phases. In the model-specification phase, results of a Multifaceted Rasch Measurement (MFRM) analysis allowed for the selection of five EPT expert raters representing a range of rating severity levels. Concurrent think-aloud protocols provided by the five participants while evaluating EPT sample essays were analyzed qualitatively to identify the text features to which raters attended. Based on the qualitative findings, 52 evaluative variables and metrics were generated, of which 36 were chosen for analysis in the whole EPT essay corpus. After that, in the model-construction phase, a corpus-based analysis of 297 EPT essays in terms of 37 text features was conducted to obtain quantitative data on the 36 variables. Principal Component Analysis (PCA) helped extract seven principal components (PCs). Results of MANOVA and one-way ANOVA tests revealed 17 original variables and six PCs that significantly differentiated the three EPT placement levels (i.e., 101B, 101C, and Pass). A profile analysis suggested that the lowest level (101B) and the highest level (Pass) have distinct profiles in terms of text features, while test takers placed in 101C classes were characterized as an average group. Like 101B students, 101C students appeared to have some linguistic problems; however, students in 101C classes and those who passed the test demonstrated a similar ability to develop an essay. In the model-construction phase, random forests (Breiman, 2001) were deployed as a data mining technique to define predictive models of human raters' placement decisions for different task types. Results of the random forests indicated that fragments, part-of-speech-related errors, and PC2 (clear organization but limited paragraph development) were significant predictors of the 101B level, and PC6 (academic word use) of the Pass level.
The generic classifier built on the 17 original variables was seemingly the best model: it perfectly predicted the training set (0% error) and successfully forecast the test set (8% error). Differences in prediction performance between the generic and task-specific models were negligible. The results of this project provided little evidence of the generalizability of the predictive models in classifying new EPT essays. However, within-class examinations showed that the best classifier could recognize the highest- and lowest-level essays, although crossover cases existed at adjacent levels. Implications of the project for placement assessment purposes, pedagogical practices in ESL writing courses, and automated essay scoring (AES) development for the EPT Writing Test are discussed.
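
    A minimal scikit-learn sketch of the kind of modelling reported above, assuming placeholder data: principal components extracted from standardized text-feature measurements, then a random forest over the three placement levels. For brevity the two steps are folded into one pipeline, which is not exactly the study's two-phase procedure, and the component count, tree settings, and data are illustrative.

        # Hedged sketch: PCA over text-feature measurements followed by a random
        # forest over the three placement levels (101B, 101C, Pass). Placeholder
        # data; settings are illustrative assumptions.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        X = np.random.rand(297, 17)   # 297 essays x 17 retained variables (placeholder data)
        y = np.random.choice(["101B", "101C", "Pass"], size=297)

        X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

        clf = make_pipeline(
            StandardScaler(),
            PCA(n_components=7),                           # seven components, as in the study
            RandomForestClassifier(n_estimators=500, random_state=0),
        )
        clf.fit(X_train, y_train)
        print("held-out accuracy:", clf.score(X_test, y_test))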

    On the Recognition of Emotion from Physiological Data

    This work encompasses several objectives, but is primarily concerned with an experiment in which 33 participants were shown 32 slides in order to create 'weakly induced emotions'. Recordings of the participants' physiological state were taken, as well as a self-report of their emotional state. We then used an assortment of classifiers to predict emotional state from the recorded physiological signals, a process known as Physiological Pattern Recognition (PPR). We investigated techniques for recording, processing, and extracting features from six different physiological signals: Electrocardiogram (ECG), Blood Volume Pulse (BVP), Galvanic Skin Response (GSR), Electromyography (EMG) of the corrugator muscle, skin temperature of the finger, and respiratory rate. Improvements to the state of PPR emotion detection were made by allowing nine different weakly induced emotional states to be detected at nearly 65% accuracy, an improvement in the number of states readily detectable. The work presents many investigations into numerical feature extraction from physiological signals and includes a chapter dedicated to collating and trialing facial electromyography techniques. We also created a hardware device to collect participants' self-reported emotional states, which led to several improvements in experimental procedure.
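
    A minimal sketch of a PPR pipeline of the kind described above, assuming simple per-window summary statistics and an off-the-shelf classifier; the window length, feature set, classifier choice, and placeholder data are illustrative and are not the signal processing actually used in the thesis.

        # Hedged sketch: summary-statistic features from windowed physiological
        # signals (ECG, BVP, GSR, EMG, skin temperature, respiration) fed to an
        # off-the-shelf classifier over self-reported emotion labels.
        import numpy as np
        from scipy.stats import skew
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        def window_features(signal):
            # Collapse one signal window into a few summary statistics.
            return [signal.mean(), signal.std(), signal.min(), signal.max(), skew(signal)]

        def featurize(windows):
            # windows: dict of signal name -> (n_trials, n_samples) array.
            per_signal = [np.apply_along_axis(window_features, 1, w) for w in windows.values()]
            return np.hstack(per_signal)                   # (n_trials, 5 * n_signals)

        rng = np.random.default_rng(0)
        signals = {name: rng.normal(size=(32, 2048))       # 32 slides per participant (placeholder data)
                   for name in ["ecg", "bvp", "gsr", "emg", "temp", "resp"]}
        X = featurize(signals)
        y = rng.integers(0, 9, size=32)                    # 9 weakly induced emotion labels

        clf = make_pipeline(StandardScaler(), SVC())
        clf.fit(X, y)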