Search CORE

169 research outputs found

Extending the EmotiNet Knowledge Base to Improve the Automatic Detection of Implicitly Expressed Emotions from Text

Author: BALAHUR DOBRESCU ALEXANDRA
Publication venue: European Language Resources Association
Publication date: 01/02/2012

Sentiment analysis is one of the recent, highly dynamic fields in Natural Language Processing. Most existing approaches are based on word-level analysis of texts and are mostly able to detect only explicit expressions of sentiment. However, in many cases, emotions are not expressed by using words with an affective meaning (e.g. happy), but by describing real-life situations, which readers (based on their commonsense knowledge) detect as being related to a specific emotion. Given the challenges of detecting emotions from contexts in which no lexical clue is present, in this article we present a comparative analysis between the performance of well-established methods for emotion detection (supervised and lexical knowledge-based) and a method we propose and extend, which is based on commonsense knowledge stored in the EmotiNet knowledge base. Our extensive evaluations show that, in the context of this task, the approach based on EmotiNet is the most appropriate. JRC.G.2-Global security and crisis management

JRC Publications Repository

Comprehensive comparative content analysis of the speeches of women leaders (2009–2013)

Author: Балагур О. (O. Balahur)
Publication venue: Видавництво Національного університету «Острозька академія»
Publication date: 01/01/2017

The theses present a comprehensive content analysis of the political discourse of US Secretary of State Hillary Rodham Clinton, German Chancellor Angela Merkel and Australian Prime Minister Julia Eileen Gillard (2009–2013). The terminology used in their speeches is classified and analysed, and the stylistic features of the political discourse are highlighted.

Цифровий архів Острозької академії (Digital Repository of Ostroh Academy)

IEST: WASSA-2018 Implicit Emotions Shared Task

Author: Balahur Alexandra
De Clercq Orphée
Klinger Roman
Mohammad Saif M.
Publication date: 01/01/2018

Past shared tasks on emotions use data with both overt expressions of emotions (I am so happy to see you!) as well as subtle expressions where the emotions have to be inferred, for instance from event descriptions. Further, most datasets do not focus on the cause or the stimulus of the emotion. Here, for the first time, we propose a shared task where systems have to predict the emotions in a large automatically labeled dataset of tweets without access to words denoting emotions. Based on this intention, we call this the Implicit Emotion Shared Task (IEST) because the systems have to infer the emotion mostly from the context. Every tweet has an occurrence of an explicit emotion word that is masked. The tweets are collected in a manner such that they are likely to include a description of the cause of the emotion - the stimulus. Altogether, 30 teams submitted results which range from macro F1 scores of 21 % to 71 %. The baseline (MaxEnt bag of words and bigrams) obtains an F1 score of 60 %, which was available to the participants during the development phase. A study with human annotators suggests that automatic methods outperform human predictions, possibly by homing in on subtle textual clues not used by humans. Corpora, resources, and results are available at the shared task website at http://implicitemotions.wassa2018.com. Comment: Accepted at Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Detecting Event-Related Links and Sentiments from Social Media Texts

Author: BALAHUR DOBRESCU ALEXANDRA
TANEV Hristo
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 22/05/2013

Nowadays, the importance of Social Media is constantly growing, as people often use such platforms to share mainstream media news and comment on the events that they relate to. As such, people no longer remain mere spectators to the events that happen in the world, but become part of them, commenting on their developments and the entities involved, sharing their opinions and distributing related content. This paper describes a system that links the main events detected from clusters of newspaper articles to tweets related to them, detects complementary information sources from the links they contain and subsequently applies sentiment analysis to classify them into positive, negative and neutral. In this manner, readers can follow the main events happening in the world, both from the perspective of mainstream and of social media, as well as the public's perception of them. This system is part of a media monitoring framework working live and it will be demonstrated using Google Earth. JRC.G.2-Global security and crisis management

JRC Publications Repository

Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation

Author: Balahur Alexandra
Barriere Valentin
Publication date: 01/01/2020

Tweets are a specific kind of text data when compared to general text. Although sentiment analysis over tweets has become very popular in the last decade for English, it is still difficult to find huge annotated corpora for non-English languages. The recent rise of transformer models in Natural Language Processing makes it possible to achieve unparalleled performance in many tasks, but these models need a substantial quantity of text to adapt to the tweet domain. We propose the use of a multilingual transformer model that we pre-train over English tweets, and we apply data augmentation using automatic translation to adapt the model to non-English languages. Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language. Comment: Accepted to COLING2020

arXiv.org e-Print Archive

Crossref

Definition of culture-associated emotion triggers and their application to valence and emotion classification in texts

Author: Balahur Dobrescu Alexandra
Montoyo Andres
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2008

This paper presents a method to automatically spot and classify the valence and emotions present in written text, based on a concept we introduce: emotion triggers. The first step consists of incrementally building a culture-dependent lexical database of emotion triggers, drawing on the theory of relevance from pragmatics, Maslow's theory of human needs from psychology and Neef's theory of human needs from economics. We start from a core of terms and expand them using lexical resources such as WordNet, complemented by NomLex, with sense numbers disambiguated using the Relevant Domains concept. The mapping among languages is accomplished using EuroWordNet, and the completion and projection to different cultures is done through language-specific commonsense knowledge bases. Subsequently, we show how the constructed database can be used to mine texts for valence (polarity) and affective meaning. An evaluation is performed on the SemEval Task No. 14: Affective Text test data and their corresponding translation to Spanish. The results and improvements are presented together with a discussion of the strong and weak points of the method and the directions for future work.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Going beyond traditional QA systems: challenges and keys in opinion question answering

Author: Balahur Dobrescu Alexandra
Boldrini Ester
Martínez-Barco Patricio
Montoyo Andres
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2010

The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective question answering, namely Opinion Question Answering (OQA). We evaluate the performance of an OQA system with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes with the purpose of improving the system performance on two distinct blog datasets. The improvements obtained for the different combinations of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy to deal with opinionated questions. This paper has been partially supported by Ministerio de Ciencia e Innovación - Spanish Government (grant no. TIN2009-13391-C04-01), and Conselleria d'Educación - Generalitat Valenciana (grant no. PROMETEO/2009/119 and ACOMP/2010/286).

Repositorio Institucional de la Universidad de Alicante

CiteSeerX

Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Author: Balahur Alexandra
Strapparava Carlo
De Clercq Orphée
Hoste Veronique
Klinger Roman
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2019

Ghent University Academic Bibliography

Identifying subjective statements in news titles using a personal sense annotation framework

Author: Panicheva P.
Cardiff J.
Rosso P.
Publication venue: Wiley
Publication date: 01/01/2013

This is the accepted version of the following article: Panicheva, P.; Cardiff, J.; Rosso, P. (2013). Identifying subjective statements in news titles using a personal sense annotation framework. Journal of the American Society for Information Science and Technology. 64(7):1411-1422, which has been published in final form at http://dx.doi.org/10.1002/asi.22841. Subjective language contains information about private states. The goal of subjective language identification is to determine that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, "Personal Sense," has clear potential in the field of subjective language identification, as it reflects a meaning of words in terms of unique personal experience and carries personal characteristics. In this paper we investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. In the process, we develop a new Personal Sense annotation framework for annotating and classifying subjectivity, polarity, and emotion. The Personal Sense framework yields high performance in a fine-grained subsentence subjectivity classification. Our experiments demonstrate lexico-syntactic features to be useful for the identification of subjectivity indicators and the targets that receive the subjective Personal Sense. The work of Paolo Rosso was done within the EC WIQEI IRSES project (grant no. 269180) FP 7 Marie Curie People Framework, the MICINN Text-Enterprise 2.0 project (TIN2009-13391-C04-03) Plan I+D+I, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. We are grateful to the anonymous reviewers for helpful comments.

Mapping Nanomedicine Terminology in the Regulatory Landscape

Author: BALAHUR-DOBRESCU ALEXANDRA
BREMER SUSANNE
GOTTARDO STEFANIA
JOANNY GERALDINE
QUIROS PESUDO LAIA
RASMUSSEN KIRSTEN
WAGNER GERHARD
Publication venue: Publications Office of the European Union
Publication date: 20/06/2018

A common terminology is essential in any field of science and technology for a mutual understanding among different communities of experts and regulators, harmonisation of policy actions, standardisation of quality procedures and experimental testing, and communication to the general public. It also allows effective revision of information for policy making and optimises research fund allocation. In particular, in emerging scientific fields with a high innovation potential, new terms, descriptions and definitions are quickly generated, which are then ambiguously used by stakeholders having diverse interests, coming from different scientific disciplines and/or from various regions. The application of nanotechnology in health, often called nanomedicine, is considered one such emerging and multidisciplinary field, with growing interest from various communities. In order to support a better understanding of terms used in the regulatory domain, the Nanomedicines Working Group of the International Pharmaceutical Regulators Forum (IPRF) has prioritised the need to map, compile and discuss the terminology currently used by regulatory scientists coming from different geographic areas. The JRC has taken the lead in identifying and compiling frequently used terms in the field by using web crawling and text mining tools as well as the manual extraction of terms. Websites of 13 regulatory authorities and clinical trial registries globally involved in regulating nanomedicines have been crawled. The compilation and analysis of extracted terms demonstrated sectorial and geographical differences in the frequency and type of nanomedicine-related terms used in a regulatory context. Finally, the 31 most relevant and most frequently used terms deriving from various agencies have been compiled, discussed and analysed for their similarities and differences. These descriptions will support the development of harmonised use of terminology in the future. The report provides necessary background information to advance the discussion among stakeholders. It will strengthen activities aiming to develop harmonised standards in the field of nanomedicine, which is an essential factor to stimulate innovation and industrial competitiveness. JRC.F.2-Consumer Products Safety

JRC Publications Repository