231 research outputs found
IEST: WASSA-2018 Implicit Emotions Shared Task
Past shared tasks on emotions use data with both overt expressions of
emotions (I am so happy to see you!) as well as subtle expressions where the
emotions have to be inferred, for instance from event descriptions. Further,
most datasets do not focus on the cause or the stimulus of the emotion. Here,
for the first time, we propose a shared task where systems have to predict the
emotions in a large automatically labeled dataset of tweets without access to
words denoting emotions. Based on this intention, we call this the Implicit
Emotion Shared Task (IEST) because the systems have to infer the emotion mostly
from the context. Every tweet has an occurrence of an explicit emotion word
that is masked. The tweets are collected in a manner such that they are likely
to include a description of the cause of the emotion - the stimulus.
Altogether, 30 teams submitted results which range from macro F1 scores of 21 %
to 71 %. The baseline (MaxEnt with bag-of-words and bigram features), which was
available to the participants during the development phase, obtains an F1 score
of 60 %. A
study with human annotators suggests that automatic methods outperform human
predictions, possibly by homing in on subtle textual clues not used by humans.
Corpora, resources, and results are available at the shared task website at
http://implicitemotions.wassa2018.com.
Comment: Accepted at Proceedings of the 9th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis
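To make the evaluation metric in the abstract above concrete: macro F1 is the unweighted mean of the per-class F1 scores, so every emotion class counts equally regardless of how many tweets it has. The following is a minimal sketch in Python, not the shared task's official scorer; the function name `macro_f1` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter

def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data (hypothetical, not taken from the IEST corpus)
gold = ["joy", "joy", "anger", "fear"]
pred = ["joy", "anger", "anger", "fear"]
score = macro_f1(gold, pred)
```

Because the mean is unweighted, a system that ignores a rare emotion class is penalised as heavily as one that ignores a frequent class.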
Comprehensive comparative content analysis of the speeches of female leaders (2009–2013)
This work presents a comprehensive content analysis of the political discourse of US Secretary of State Hillary Rodham Clinton, German Chancellor Angela Merkel and Australian Prime Minister Julia Eileen Gillard (2009–2013). It classifies and analyses the terminology of their speeches and highlights the stylistic features of their political discourse.
Extending the EmotiNet Knowledge Base to Improve the Automatic Detection of Implicitly Expressed Emotions from Text
Sentiment analysis is one of the recent, highly dynamic fields in Natural
Language Processing. Most existing approaches are based on word-level
analysis of texts and are mostly able to detect only explicit expressions of
sentiment. However, in many cases, emotions are not expressed by using
words with an affective meaning (e.g. happy), but by describing real-life
situations, which readers (based on their commonsense knowledge) detect
as being related to a specific emotion. Given the challenges of detecting
emotions from contexts in which no lexical clue is present, in this article we
present a comparative analysis between the performance of well-established
methods for emotion detection (supervised and lexical knowledge-based) and
a method we propose and extend, which is based on commonsense knowledge
stored in the EmotiNet knowledge base. Our extensive evaluations show
that, in the context of this task, the approach based on EmotiNet is the
most appropriate.
JRC.G.2-Global security and crisis management
Detecting Event-Related Links and Sentiments from Social Media Texts
Nowadays, the importance of Social Media is constantly growing, as people often use such platforms to share mainstream media news and comment on the events that they relate to. As such, people no longer remain mere spectators to the events that happen in the world, but become part of them, commenting on their developments and the entities involved, sharing their opinions and distributing related content. This paper describes a system that links the main events detected from clusters of newspaper articles to tweets related to them, detects complementary information sources from the links they contain and subsequently applies sentiment analysis to classify them as positive, negative or neutral. In this manner, readers can follow the main events happening in the world from the perspective of both mainstream and social media, together with the public's perception of them. This system is part of a media monitoring framework working live and it will be demonstrated using Google Earth.
JRC.G.2-Global security and crisis management
Definition of culture-dependent emotion triggers and their application to the classification of valence and emotion in texts
This paper presents a method to automatically spot and classify the valence and
emotions present in written text, based on a concept we introduce: emotion triggers. The
first step consists of incrementally building a culture-dependent lexical database of emotion
triggers, drawing on the theory of relevance from pragmatics, Maslow's theory of human
needs from psychology and Neef's theory of human needs from economics. We start from a core
of terms and expand them using lexical resources such as WordNet, completed by NomLex,
with sense numbers disambiguated using the Relevant Domains concept. The mapping among
languages is accomplished using EuroWordNet, and the completion and projection to different
cultures is done through language-specific commonsense knowledge bases. Subsequently, we
show how the constructed database can be used to mine texts for valence
(polarity) and affective meaning. An evaluation is performed on the SemEval Task No. 14:
Affective Text test data and their corresponding translation to Spanish. The results and
improvements are presented together with a discussion of the strong and weak points of the
method and the directions for future work.
Sentiment Analysis in Social Media Texts
This paper presents a method for sentiment
analysis specifically designed to work with
Twitter data (tweets), taking into account their
structure, length and specific language. The
approach employed makes it easily extendible
to other languages and makes it able to process
tweets in near real time. The main contributions
of this work are: a) the pre-processing
of tweets to normalize the language and generalize
the vocabulary employed to express sentiment;
b) the use of minimal linguistic processing,
which makes the approach easily portable
to other languages; c) the inclusion of higher
order n-grams to spot modifications in the polarity
of the sentiment expressed; d) the use of
simple heuristics to select features to be employed;
e) the application of supervised learning
using a simple Support Vector Machines
linear classifier on a set of realistic data. We
show that using the training models generated
with the method described we can improve
the sentiment classification performance, irrespective
of the domain and distribution of the
test sets.
JRC.G.2-Global security and crisis management
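Contributions (a) and (c) in the abstract above, normalizing tweet language and adding higher-order n-grams, can be sketched as follows. The abstract does not state the actual normalization rules, so everything here is an assumption for illustration: the `URL` and `USER` placeholders, the collapsing of repeated characters, and the function names `normalize_tweet` and `ngram_features` are hypothetical and are not the authors' implementation.

```python
import re

def normalize_tweet(text):
    """Lowercase a tweet, generalize URLs and mentions, and tokenize it (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "URL", text)   # any link -> one placeholder
    text = re.sub(r"@\w+", "USER", text)          # any mention -> one placeholder
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # "happyyyy" -> "happyy"
    return re.findall(r"[A-Za-z']+|[!?]+", text)  # words and punctuation runs

def ngram_features(tokens, max_n=3):
    """Return all n-grams for n = 1..max_n as space-joined strings."""
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features

tokens = normalize_tweet("@bob I am not happyyyy!!! http://t.co/x")
features = ngram_features(tokens)
```

Higher-order n-grams such as `not happyy` keep a negation next to the word it modifies, which is how a bag-of-n-grams representation can capture changes in polarity. The resulting features could then be passed to a linear classifier, which is not shown here.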
Going beyond traditional QA systems: challenges and keys in opinion question answering
The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective question answering, that is, Opinion Question Answering (OQA). We evaluate the performance of an OQA system with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes with the purpose of improving the system performance on two distinct blog datasets. The improvements obtained for the different combinations of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy to deal with opinionated questions.
This paper has been partially supported by Ministerio de Ciencia e Innovación - Spanish Government (grant no. TIN2009-13391-C04-01), and Conselleria d'Educación - Generalitat Valenciana (grant no. PROMETEO/2009/119 and ACOMP/2010/286).
Mapping Nanomedicine Terminology in the Regulatory Landscape
A common terminology is essential in any field of science and technology for a mutual understanding among different communities of experts and regulators, harmonisation of policy actions, standardisation of quality procedures and experimental testing, and the communication to the general public. It also allows effective revision of information for policy making and optimises research fund allocation.
In particular, in emerging scientific fields with a high innovation potential, new terms, descriptions and definitions are quickly generated, which are then used ambiguously by stakeholders who have diverse interests and come from different scientific disciplines and/or regions. The application of nanotechnology in health, often called nanomedicine, is considered such an emerging, multidisciplinary field, attracting growing interest from various communities.
In order to support a better understanding of terms used in the regulatory domain, the Nanomedicines Working Group of the International Pharmaceutical Regulators Forum (IPRF) has prioritised the need to map, compile and discuss the currently used terminology of regulatory scientists coming from different geographic areas. The JRC has taken the lead to identify and compile frequently used terms in the field by using web crawling and text mining tools as well as the manual extraction of terms. Websites of 13 regulatory authorities and clinical trial registries globally involved in regulating nanomedicines have been crawled. The compilation and analysis of extracted terms demonstrated sectorial and geographical differences in the frequency and type of nanomedicine related terms used in a regulatory context. Finally 31 relevant and most frequently used terms deriving from various agencies have been compiled, discussed and analysed for their similarities and differences. These descriptions will support the development of harmonised use of terminology in the future.
The report provides necessary background information to advance the discussion among stakeholders. It will strengthen activities aiming to develop harmonised standards in the field of nanomedicine, which is an essential factor to stimulate innovation and industrial competitiveness.
JRC.F.2-Consumer Products Safety
Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts
Sentiment analysis (SA) regards the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioral sciences) and also to many tasks in Natural Language Processing (NLP) – information extraction, question answering, textual entailment, to name just a few. The importance of this field has been proven by the high number of approaches proposed in research, as well as by the interest that it raised from other disciplines and the applications that were created using its technology.
In our case, the primary focus is to use sentiment analysis in the context of media monitoring, to enable tracking of global reactions to events. The main challenge that we face is that tweets are written in different languages and an unbiased system should be able to deal with all of them, in order to process all (possible) available data.
Unfortunately, although many linguistic resources exist for processing texts written in English, for many other languages data and tools are scarce. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study of the feasibility of a multilingual system, aiming a) to classify sentiment expressed in tweets in various languages using training data obtained through machine translation; b) to verify the extent to which the quality of the translations influences the sentiment classification performance, in this case, of highly informal texts; and c) to improve multilingual sentiment classification using small amounts of data annotated in the target language. To this aim, varying sizes of target language data are tested. The languages we explore are: Arabic, Turkish, Russian, Italian, Spanish, German and French.
JRC.G.2-Global security and crisis management
Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)
This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory
2015) held in conjunction with the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP
2015) at the China National Convention Center in Beijing, on July 31st 2015.
Narratives are at the heart of information sharing. Ever since people began to share their experiences,
they have connected them to form narratives. The study of storytelling and the field of literary theory
called narratology have developed complex frameworks and models related to various aspects of
narrative such as plot structures, narrative embeddings, characters’ perspectives, reader response, point
of view, narrative voice, narrative goals, and many others. These notions from narratology have been
applied mainly in Artificial Intelligence and to model formal semantic approaches to narratives (e.g.
Plot Units developed by Lehnert (1981)). In recent years, computational narratology has qualified as an
autonomous field of study and research. Narrative has been the focus of a number of workshops and
conferences (AAAI Symposia, Interactive Storytelling Conference (ICIDS), Computational Models of
Narrative). Furthermore, reference annotation schemes for narratives have been proposed (NarrativeML
by Mani (2013)).
The workshop aimed at bringing together researchers from different communities working on
representing and extracting narrative structures in news, a text genre which is highly used in NLP
but which has received little attention with respect to narrative structure, representation and analysis.
Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic
extraction of events from single documents and work towards extracting story structures from multiple
documents, while these documents are published over time as news streams. Policy makers, NGOs,
information specialists (such as journalists and librarians) and others are increasingly in need of tools
that support them in finding salient stories in large amounts of information to more effectively implement
policies, monitor actions of “big players” in the society and check facts. Their tasks often revolve around
reconstructing cases either with respect to specific entities (e.g. persons or organizations) or events (e.g.
hurricane Katrina). Storylines represent explanatory schemas that enable us to make better selections
of relevant information but also projections to the future. They form a valuable potential for exploiting
news data in an innovative way.
JRC.G.2-Global security and crisis management
