8,167 research outputs found
Multilingual Cross-domain Perspectives on Online Hate Speech
In this report, we present a study of eight corpora of online hate speech, by
demonstrating the NLP techniques that we used to collect and analyze the
jihadist, extremist, racist, and sexist content. Analysis of the multilingual
corpora shows that the different contexts share certain characteristics in
their hateful rhetoric. To expose the main features, we have focused on text
classification, text profiling, keyword and collocation extraction, along with
manual annotation and qualitative study.Comment: 24 page
Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis
This paper presents a novel scheme for the annotation of hate speech in
corpora of Web 2.0 commentary. The proposed scheme is motivated by the critical
analysis of posts made in reaction to news reports on the Mediterranean
migration crisis and LGBTIQ+ matters in Malta, which was conducted under the
auspices of the EU-funded C.O.N.T.A.C.T. project. Based on the realization that
hate speech is not a clear-cut category to begin with, appears to belong to a
continuum of discriminatory discourse and is often realized through the use of
indirect linguistic means, it is argued that annotation schemes for its
detection should refrain from directly including the label 'hate speech,' as
different annotators might have different thresholds as to what constitutes
hate speech and what not. In view of this, we suggest a multi-layer annotation
scheme, which is pilot-tested against a binary +/- hate speech classification
and appears to yield higher inter-annotator agreement. Motivating the
postulation of our scheme, we then present the MaNeCo corpus on which it will
eventually be used; a substantial corpus of on-line newspaper comments spanning
10 years.Comment: 10 pages, 1 table. Appears in Proceedings of the 12th edition of the
Language Resources and Evaluation Conference (LREC'20
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
A document management methodology based on similarity contents
The advent of the WWW and distributed information systems have made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, right and most importantly the consistency of documents. It is important that the people involved in the documents management process have access to the most up-to-date version of documents, retrieve the correct documents and should be able to update the documents repository in such a way that his or her document are known to others. In this paper we propose a method for organising, storing and retrieving documents based on similarity contents. The method uses techniques based on information retrieval, document indexation and term extraction and indexing. This methodology is developed for the E-Cognos project which aims at developing tools for the management and sharing of documents in the construction domain
Logging Stress and Anxiety Using a Gamified Mobile-based EMA Application, and Emotion Recognition Using a Personalized Machine Learning Approach
According to American Psychological Association (APA) more than 9 in 10 (94 percent) adults believe that stress can contribute to the development of major health problems, such as heart disease, depression, and obesity. Due to the subjective nature of stress, and anxiety, it has been demanding to measure these psychological issues accurately by only relying on objective means. In recent years, researchers have increasingly utilized computer vision techniques and machine learning algorithms to develop scalable and accessible solutions for remote mental health monitoring via web and mobile applications. To further enhance accuracy in the field of digital health and precision diagnostics, there is a need for personalized machine-learning approaches that focus on recognizing mental states based on individual characteristics, rather than relying solely on general-purpose solutions.
This thesis focuses on conducting experiments aimed at recognizing and assessing levels of stress and anxiety in participants. In the initial phase of the study, a mobile application with broad applicability (compatible with both Android and iPhone platforms) is introduced (we called it STAND). This application serves the purpose of Ecological Momentary Assessment (EMA). Participants receive daily notifications through this smartphone-based app, which redirects them to a screen consisting of three components. These components include a question that prompts participants to indicate their current levels of stress and anxiety, a rating scale ranging from 1 to 10 for quantifying their response, and the ability to capture a selfie. The responses to the stress and anxiety questions, along with the corresponding selfie photographs, are then analyzed on an individual basis. This analysis focuses on exploring the relationships between self-reported stress and anxiety levels and potential facial expressions indicative of stress and anxiety, eye features such as pupil size variation and eye closure, and specific action units (AUs) observed in the frames over time. In addition to its primary functions, the mobile app also gathers sensor data, including accelerometer and gyroscope readings, on a daily basis. This data holds potential for further analysis related to stress and anxiety. Furthermore, apart from capturing selfie photographs, participants have the option to upload video recordings of themselves while engaging in two neuropsychological games. These recorded videos are then subjected to analysis in order to extract pertinent features that can be utilized for binary classification of stress and anxiety (i.e., stress and anxiety recognition). The participants that will be selected for this phase are students aged between 18 and 38, who have received recent clinical diagnoses indicating specific stress and anxiety levels. In order to enhance user engagement in the intervention, gamified elements - an emerging trend to influence user behavior and lifestyle - has been utilized. Incorporating gamified elements into non-game contexts (e.g., health-related) has gained overwhelming popularity during the last few years which has made the interventions more delightful, engaging, and motivating.
In the subsequent phase of this research, we conducted an AI experiment employing a personalized machine learning approach to perform emotion recognition on an established dataset called Emognition. This experiment served as a simulation of the future analysis that will be conducted as part of a more comprehensive study focusing on stress and anxiety recognition. The outcomes of the emotion recognition experiment in this study highlight the effectiveness of personalized machine learning techniques and bear significance for the development of future diagnostic endeavors. For training purposes, we selected three models, namely KNN, Random Forest, and MLP. The preliminary performance accuracy results for the experiment were 93%, 95%, and 87% respectively for these models
Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine
The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far.
Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the base of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews.
Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor they are limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level.
In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data.
The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. Differently from most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work, will, eventually, pave the way for development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience
Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine
The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far.
Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the base of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews.
Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor they are limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level.
In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data.
The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. Differently from most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work, will, eventually, pave the way for development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience
Recommended from our members
Semantic Sentiment Analysis of Microblogs
Microblogs and social media platforms are now considered among the most popular forms of online communication. Through a platform like Twitter, much information reflecting people's opinions and attitudes is published and shared among users on a daily basis. This has recently brought great opportunities to companies interested in tracking and monitoring the reputation of their brands and businesses, and to policy makers and politicians to support their assessment of public opinions about their policies or political issues.
A wide range of approaches to sentiment analysis on Twitter, and other similar microblogging platforms, have been recently built. Most of these approaches rely mainly on the presence of affect words or syntactic structures that explicitly and unambiguously reflect sentiment (e.g., "great'', "terrible''). However, these approaches are semantically weak, that is, they do not account for the semantics of words when detecting their sentiment in text. This is problematic since the sentiment of words, in many cases, is associated with their semantics, either along the context they occur within (e.g., "great'' is negative in the context "pain'') or the conceptual meaning associated with the words (e.g., "Ebola" is negative when its associated semantic concept is "Virus").
This thesis investigates the role of words' semantics in sentiment analysis of microblogs, aiming mainly at addressing the above problem. In particular, Twitter is used as a case study of microblogging platforms to investigate whether capturing the sentiment of words with respect to their semantics leads to more accurate sentiment analysis models on Twitter. To this end, several approaches are proposed in this thesis for extracting and incorporating two types of word semantics for sentiment analysis: contextual semantics (i.e., semantics captured from words' co-occurrences) and conceptual semantics (i.e., semantics extracted from external knowledge sources).
Experiments are conducted with both types of semantics by assessing their impact in three popular sentiment analysis tasks on Twitter; entity-level sentiment analysis, tweet-level sentiment analysis and context-sensitive sentiment lexicon adaptation. Evaluation under each sentiment analysis task includes several sentiment lexicons, and up to 9 Twitter datasets of different characteristics, as well as comparing against several state-of-the-art sentiment analysis approaches widely used in the literature.
The findings from this body of work demonstrate the value of using semantics in sentiment analysis on Twitter. The proposed approaches, which consider words' semantics for sentiment analysis at both, entity and tweet levels, surpass non-semantic approaches in most datasets
- …