13 research outputs found
Using Implicit Feedback to Improve Question Generation
Question Generation (QG) is a task of Natural Language Processing (NLP) that
aims at automatically generating questions from text. Many applications can
benefit from automatically generated questions, but often it is necessary to
curate those questions, either by selecting or editing them. This curation is
informative in its own right, but it is typically done post-generation, so the
effort it captures is wasted. In addition, most existing systems cannot easily
incorporate this feedback. In this work, we present a system, GEN,
that learns from such (implicit) feedback. Following a pattern-based approach,
it takes as input a small set of sentence/question pairs and creates patterns
which are then applied to new unseen sentences. Each generated question, after
being corrected by the user, is used as a new seed in the next iteration, so
more patterns are created each time. We also take advantage of the corrections
made by the user to score the patterns and therefore rank the generated
questions. Results show that GEN is able to improve by learning from both
levels of implicit feedback when compared to the version with no learning,
considering the top 5, 10, and 20 questions. Improvements go up to 10%,
depending on the metric and strategy used.
Comment: 27 pages, 8 figures
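The idea of scoring patterns from implicit feedback can be sketched in a few lines. Everything below (the class name, its methods, and the Laplace-smoothed acceptance rate) is a hypothetical illustration of the idea, not GEN's actual implementation:

```python
from collections import defaultdict

# Hypothetical sketch: score each question-generation pattern by how often
# the questions it produced survived user curation (implicit feedback).
class PatternScorer:
    def __init__(self):
        self.accepted = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, pattern_id, was_accepted):
        """Register one piece of implicit feedback for a pattern."""
        self.total[pattern_id] += 1
        self.accepted[pattern_id] += int(was_accepted)

    def score(self, pattern_id):
        # Laplace-smoothed acceptance rate; unseen patterns start at 0.5
        return (self.accepted[pattern_id] + 1) / (self.total[pattern_id] + 2)

    def rank(self, candidates):
        """candidates: (question, pattern_id) pairs; best-scored pattern first."""
        return sorted(candidates, key=lambda c: self.score(c[1]), reverse=True)
```

Under this scheme, every user correction simultaneously refines the ranking of all future questions produced by the same pattern.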
Question Generation based on Lexico-Syntactic Patterns Learned from the Web
THE MENTOR automatically generates multiple-choice tests from a given text. This tool aims at supporting the dialogue system of the FalaComigo project, as one of FalaComigo's goals is the interaction with tourists through questions/answers and quizzes about their visit. In a minimally supervised learning process, and by leveraging the redundancy and linguistic variability of the Web, THE MENTOR learns lexico-syntactic patterns from a set of question/answer seeds. Afterward, these patterns are used to match the sentences from which new questions (and answers) can be generated. Finally, several filters are applied in order to discard low-quality items. In this paper we detail the question generation task as performed by THE MENTOR and evaluate its performance.
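A toy illustration of the seed-driven pattern idea: a seed sentence with known answer and topic is abstracted into a pattern with capture slots, which then matches unseen sentences of the same shape. The function names, single-word slots, and hard-coded question template are all hypothetical simplifications, not THE MENTOR's actual machinery:

```python
import re

def learn_pattern(sentence, answer, topic):
    """Abstract a seed sentence into a regex by slotting out answer and topic."""
    pat = re.escape(sentence)
    pat = pat.replace(re.escape(answer), r"(?P<answer>\w+)")
    pat = pat.replace(re.escape(topic), r"(?P<topic>\w+)")
    return re.compile(pat)

def generate(pattern, sentence, template="What is the capital of {topic}?"):
    """Apply a learned pattern to a new sentence, yielding a (question, answer) pair."""
    m = pattern.search(sentence)
    if not m:
        return None
    return template.format(topic=m.group("topic")), m.group("answer")

# Seed pair: answer "Lisbon", topic "Portugal"
pat = learn_pattern("Lisbon is the capital of Portugal", "Lisbon", "Portugal")
# Matching an unseen sentence with the same surface structure
qa = generate(pat, "Paris is the capital of France")
```

In practice the question template would itself be induced from the seed question rather than fixed, and slots would span multi-word phrases.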
Towards a Fully Unsupervised Framework for Intent Induction in Customer Support Dialogues
State of the art models in intent induction require annotated datasets.
However, annotating dialogues is time-consuming, laborious and expensive. In
this work, we propose a completely unsupervised framework for intent induction
within a dialogue. In addition, we show how pre-processing the dialogue corpora
can improve results. Finally, we show how to extract the dialogue flows of
intentions by investigating the most common sequences. Although we test our
work on the MultiWOZ dataset, the fact that this framework requires no prior
knowledge makes it applicable to any possible use case, making it very relevant
to real-world customer support applications across industry.
Comment: 16 pages, 8 figures
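The last step, extracting dialogue flows as the most common intent sequences, can be sketched with plain n-gram counting over per-turn intent labels. The function name and the toy labels below are illustrative, not the paper's code or MultiWOZ annotations:

```python
from collections import Counter

def common_flows(dialogues, n=2):
    """Count length-n intent sequences across dialogues to surface typical flows."""
    flows = Counter()
    for intents in dialogues:
        flows.update(tuple(intents[i:i + n]) for i in range(len(intents) - n + 1))
    return flows.most_common()

# Toy per-turn intent labels (hypothetical)
dialogues = [
    ["greet", "request_hotel", "inform", "book", "bye"],
    ["greet", "request_taxi", "inform", "book", "bye"],
]
flows = common_flows(dialogues)
```

Sequences shared across many dialogues (here, e.g. "inform" followed by "book") surface as the dominant flows of the domain.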
Reviewing Possible Extraction Tools
UID/LIN/03213/2013
Collocations are a central problem for many natural language processing tasks, from machine translation to summarization. With the goal of building a corpus of collocations, enriched with statistical information about them, we survey, in this paper, four tools for extracting collocations. These tools allow us to collect sentences containing collocations, and also to gather statistics on this particular type of co-occurrence, such as Mutual Information and Log-likelihood values.
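As a minimal illustration of the association statistics such tools report, Pointwise Mutual Information for a candidate bigram can be computed from raw counts. The function and toy corpus below are illustrative, not taken from any of the surveyed tools:

```python
import math
from collections import Counter

def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise Mutual Information: log2(p(w1,w2) / (p(w1) * p(w2)))."""
    return math.log2((pair_count / total) / ((w1_count / total) * (w2_count / total)))

# Toy corpus (illustrative only)
tokens = "the strong tea and the strong coffee and the weak tea".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
total = len(tokens) - 1  # number of bigram positions

score = pmi(bigrams[("strong", "tea")], unigrams["strong"], unigrams["tea"], total)
```

A positive PMI indicates the two words co-occur more often than chance would predict, which is the basic signal a collocation extractor ranks by.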
A linguistically motivated taxonomy for Machine Translation error analysis
UID/LIN/03213/2013
SFRH/BD/85737/2012
SFRH/BD/51157/2010
SFRH/BD/51156/2010
A detailed error analysis is a fundamental step in every natural language processing task, as being able to diagnose what went wrong provides cues to decide which research directions to follow. In this paper we focus on error analysis in Machine Translation. We substantially extend previous error taxonomies so that translation errors associated with the specificities of Romance languages can be accommodated. Also, based on the proposed taxonomy, we carry out an extensive analysis of the errors generated by four different systems: two mainstream online translation systems, Google Translate (Statistical) and Systran (Hybrid Machine Translation), and two in-house Machine Translation systems, in three scenarios representing different challenges in the translation from English to European Portuguese. Additionally, we comment on how distinct error types impact translation quality differently.
Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations
Fuzzy Fingerprints have been successfully used as an interpretable text
classification technique, but, like most other techniques, have been largely
surpassed in performance by Large Pre-trained Language Models, such as BERT or
RoBERTa. These models deliver state-of-the-art results in several Natural
Language Processing tasks, namely Emotion Recognition in Conversations (ERC),
but suffer from the lack of interpretability and explainability. In this paper,
we propose to combine the two approaches to perform ERC, as a means to obtain
simpler and more interpretable Large Language Model-based classifiers. We
propose to feed the utterances and their previous conversational turns to a
pre-trained RoBERTa, obtaining contextual embedding utterance representations,
that are then supplied to an adapted Fuzzy Fingerprint classification module.
We validate our approach on the widely used DailyDialog ERC benchmark dataset,
in which we obtain state-of-the-art-level results using a much lighter model.
Comment: FUZZ-IEEE 202
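A stripped-down sketch of the fuzzy-fingerprint idea, using raw token frequencies in place of RoBERTa contextual embeddings: each class is summarized by its top-k features with rank-decaying membership values, and an utterance is assigned to the class whose fingerprint its tokens overlap most. All names, the linear membership function, and the toy data are simplifying assumptions, not the paper's actual model:

```python
from collections import Counter

def fingerprint(texts, k=5):
    """Top-k most frequent tokens of a class, with rank-decaying membership."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    top = [w for w, _ in counts.most_common(k)]
    # linear membership: rank 0 -> 1.0, rank k-1 -> 1/k
    return {w: (k - i) / k for i, w in enumerate(top)}

def similarity(tokens, fp):
    """Sum the memberships of the tokens that appear in the fingerprint."""
    return sum(fp.get(tok, 0.0) for tok in tokens)

def classify(text, fps):
    tokens = text.lower().split()
    return max(fps, key=lambda label: similarity(tokens, fps[label]))

# Toy emotion classes (illustrative data, not DailyDialog)
fps = {
    "joy": fingerprint(["i am so happy today", "what a happy great day"]),
    "anger": fingerprint(["this is so annoying", "i hate this annoying thing"]),
}
```

In the paper's setting, the ranked features come from contextual embedding dimensions rather than surface tokens, but the fingerprint-and-overlap mechanism is what keeps the classifier small and inspectable.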
Evaluation of computational resources for Portuguese
Several tools have been developed for processing the Portuguese language.
However, due to the varied choices underlying the behaviour of these tools
(different pre-processing options, different sets of part-of-speech tags and
dependency labels, etc.), it is difficult to get a picture of their
comparative performance. In this work, we evaluate a set of free, publicly
available tools that perform Part-of-Speech Tagging and Named Entity
Recognition for the Portuguese language. Twelve different models are
considered for the first task and eight for the second. All resources used in
this evaluation (tag mapping tables, reference corpora, etc.) are made
available, allowing the results to be replicated/fine-tuned. We also present a
qualitative study of two dependency parsers. We are not aware of any similar
recent work, i.e., one taking into account the currently available tools,
carried out for the Portuguese language.
FCT: UIDB/50021/2020 and PTDC/LLTLIN/29887/2017