A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Sentiment analysis has recently emerged as one of the major natural
language processing (NLP) tasks in many applications. As social
media channels (e.g. social networks or forums) have become significant sources
for brands to observe user opinions about their products, this task is
increasingly crucial. However, real data obtained from social
media contains a high volume of short and informal messages
posted by users on those channels. Existing approaches, especially those
using deep learning, have difficulty handling this kind of data.
In this paper, we propose an approach to handle this problem. This
work extends our previous work, in which we proposed combining the
typical deep learning technique of Convolutional Neural Networks with domain
knowledge. The combination is used to obtain additional training data through
augmentation and a more reasonable loss function. In this work, we further
improve our architecture by various substantial enhancements, including
negation-based data augmentation, transfer learning for word embeddings, the
combination of word-level embeddings and character-level embeddings, and a
multitask learning technique for attaching domain knowledge rules to the
learning process. Those enhancements, specifically aimed at handling short and
informal messages, yield a significant improvement in performance in
experiments on real datasets.
Comment: A Preprint of an article accepted for publication by Inderscience in
IJCVR on September 201
Investigating Combining Quantitative And Textual Causal Knowledge In Learning Causal Structure
The study of causes and effects in large systems such as meteorology, biochemistry, finance, and sociology plays a critical role in predicting future developments and possible interventions. In the last decades, several new techniques and algorithms have been developed to discover causal structures in multivariate quantitative datasets. Yet, solely determining causal structure from observations is challenging and often yields ambiguous results. Additional knowledge from other sources is likely to be beneficial.
Recently emerging large-scale language models are showing impressive results in the field of natural language processing (NLP). One task in the field of NLP is to extract causal relations from text. Combining these with causal discovery algorithms could be advantageous.
This bachelor thesis investigates the combination of causal structures from quantitative and qualitative sources. A feasibility study was conducted on two datasets: (1) a biochemistry flow cytometry dataset and (2) a self-collected financial dataset. During this process, a common framework was developed that enables the combination of both sources. Considerations and problems were monitored and improvements suggested. A focus was placed on visualizing the evidence with different Python and R libraries.
In principle, it is possible to combine both domains. However, it was found that training data for causal relation extraction is lacking. Knowledge graphs with an underlying ontology need to be leveraged to account for lexically different terms for the same entity. To improve the results from the qualitative data, it would be advantageous to extract events rather than causal relations. This thesis makes a valuable contribution to the study of integrating quantitative and qualitative causal knowledge by applying various methods to two distinct datasets from different domains. Furthermore, it addresses a research gap, as, to the best of my knowledge, there is limited existing literature in this specific area.
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion from two French politicians over time. We
therefore review state-of-the-art NLP-based ML algorithms to automatically
annotate tweets using a manual initiation step as bootstrap. This paper focuses
on key issues in active learning when building a large annotated data set
from noisy data, where the noise is introduced by human annotators, the abundance
of data, and the label distribution across data and entities. In turn, we show
that Twitter characteristics such as the author's name or hashtags can serve as
a basis not only for improving automatic systems for Opinion Mining (OM) and
Topic Classification but also for reducing noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201