794 research outputs found
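Definition of a culture-associated emotion trigger and its application to the classification of valence and emotion in texts (Definición de disparador de emoción asociado a la cultura y aplicación a la clasificación de la valencia y la emoción en textos)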
This paper presents a method to automatically spot and classify the valence and
emotions present in written text, based on a concept we introduce: emotion triggers. The
first step consists of incrementally building a culture dependent lexical database of emotion
triggers, drawing on the theory of relevance from pragmatics, Maslow's theory of human
needs from psychology and Neef's theory of human needs in economics. We start from a core
of terms and expand them using lexical resources such as WordNet, complemented by NomLex,
with sense numbers disambiguated using the Relevant Domains concept. The mapping among
languages is accomplished using EuroWordNet and the completion and projection to different
cultures is done through language-specific commonsense knowledge bases. Subsequently, we
show the manner in which the constructed database can be used to mine texts for valence
(polarity) and affective meaning. An evaluation is performed on the Semeval Task No. 14:
Affective Text test data and their corresponding translation to Spanish. The results and
improvements are presented together with a discussion of the strong and weak points of the
method and the directions for future work.
Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines
Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.
Deep Learning-Based Knowledge Injection for Metaphor Detection: A Comprehensive Review
The history of metaphor research also marks the evolution of knowledge
infusion research. With the continued advancement of deep learning techniques
in recent years, the natural language processing community has shown great
interest in applying knowledge to achieve successful results in metaphor recognition
tasks. Although there has been a gradual increase in the number of approaches
involving knowledge injection in the field of metaphor recognition, there is a
lack of a complete review article on knowledge injection based approaches.
Therefore, the goal of this paper is to provide a comprehensive review of
research advances in the application of deep learning for knowledge injection
in metaphor recognition tasks. In this paper, we systematically summarize and
generalize the mainstream knowledge and knowledge injection principles, as well
as review the datasets, evaluation metrics, and benchmark models used in
metaphor recognition tasks. Finally, we explore the current issues facing
knowledge injection methods and provide an outlook on future research
directions.
Leveraging Text-to-Scene Generation for Language Elicitation and Documentation
Text-to-scene generation systems take input in the form of a natural language text and output a 3D scene illustrating the meaning of that text. A major benefit of text-to-scene generation is that it allows users to create custom 3D scenes without requiring them to have a background in 3D graphics or knowledge of specialized software packages. This contributes to making text-to-scene useful in scenarios from creative applications to education. The primary goal of this thesis is to explore how we can use text-to-scene generation in a new way: as a tool to facilitate the elicitation and formal documentation of language. In particular, we use text-to-scene generation (a) to assist field linguists studying endangered languages; (b) to provide a cross-linguistic framework for formally modeling spatial language; and (c) to collect language data using crowdsourcing. As a side effect of these goals, we also explore the problem of multilingual text-to-scene generation, that is, systems for generating 3D scenes from languages other than English.
The contributions of this thesis are the following. First, we develop a novel tool suite (the WordsEye Linguistics Tools, or WELT) that uses the WordsEye text-to-scene system to assist field linguists with eliciting and documenting endangered languages. WELT allows linguists to create custom elicitation materials and to document semantics in a formal way. We test WELT with two endangered languages, Nahuatl and Arrernte. Second, we explore the question of how to learn a syntactic parser for WELT. We show that an incremental learning method using a small number of annotated dependency structures can produce reasonably accurate results. We demonstrate that using a parser trained in this way can significantly decrease the time it takes an annotator to label a new sentence with dependency information. Third, we develop a framework that generates 3D scenes from spatial and graphical semantic primitives. We incorporate this system into the WELT tools for creating custom elicitation materials, allowing users to directly manipulate the underlying semantics of a generated scene. Fourth, we introduce a deep semantic representation of spatial relations and use this to create a new resource, SpatialNet, which formally declares the lexical semantics of spatial relations for a language. We demonstrate how SpatialNet can be used to support multilingual text-to-scene generation. Finally, we show how WordsEye and the semantic resources it provides can be used to facilitate elicitation of language using crowdsourcing.
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates
from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically
represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of
traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an
analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus.
JRC.G.6-Security technology assessmen
Idiom treatment experiments in machine translation
Idiomatic expressions pose a particular challenge for today's Machine Translation systems, because they usually cannot be translated literally and must instead be rendered by sense. This dissertation shows how, with the help of a corpus and morphosyntactic rules, such idiomatic expressions can be recognized and ultimately translated correctly. The first chapter introduces the reader to the field of Machine Translation in general and then focuses on the specialized area of Example-based Machine Translation. A substantial part of the dissertation is then devoted to the theory of idiomatic expressions. The practical part describes how the hybrid Example-based Machine Translation system METIS-II was enabled, with the help of morphosyntactic rules, to process certain idiomatic expressions correctly and ultimately to translate them. The following chapter deals with the functioning of the transfer system CAT2 and its handling of idiomatic expressions. The last part of the thesis evaluates three commercial systems, namely SYSTRAN, T1 Langenscheidt, and Power Translator Pro, with respect to their handling of continuous and discontinuous idiomatic expressions. For this purpose, small corpora as well as part of the extensive Europarl corpus and of the Digital Dictionary of the German Language of the 20th Century (Digitales Wörterbuch der deutschen Sprache des 20. Jh.) were processed, first manually and then automatically. The dissertation concludes with the findings of this evaluation.
Sentiment Analysis: State of the Art
We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, the levels of sentiment analysis, and the processes that can be used to measure polarity and classify labels. Brief details about some sentiment analysis resources are also included.
ArAutoSenti: Automatic annotation and new tendencies for sentiment classification of Arabic messages
A corpus-based sentiment analysis approach for messages written in Arabic and its dialects is presented and implemented. The originality of this approach resides in the automatic construction of the annotated sentiment corpus, which relies mainly on a sentiment lexicon that is also constructed automatically. For the classification step, shallow and deep classifiers are used, with features extracted using word embedding models. To validate the constructed corpus, a manual review was carried out, which found that 85.17% of the corpus was correctly annotated. The approach is applied to the under-resourced Algerian dialect and tested on two external test corpora presented in the literature. The results obtained are very encouraging, with an F1-score of up to 88% on the first test corpus and up to 81% on the second. These results represent improvements of 20% and 6%, respectively, over existing work in the research literature.