13 research outputs found
Using Implicit Feedback to Improve Question Generation
Question Generation (QG) is a task of Natural Language Processing (NLP) that
aims at automatically generating questions from text. Many applications can
benefit from automatically generated questions, but often it is necessary to
curate those questions, either by selecting or editing them. This curation is
informative in its own right, but it is typically done post-generation, so the
effort it captures is wasted. In addition, most existing systems cannot easily
incorporate this feedback. In this work, we present a system, GEN,
that learns from such (implicit) feedback. Following a pattern-based approach,
it takes as input a small set of sentence/question pairs and creates patterns
which are then applied to new unseen sentences. Each generated question, after
being corrected by the user, is used as a new seed in the next iteration, so
more patterns are created each time. We also take advantage of the corrections
made by the user to score the patterns and therefore rank the generated
questions. Results show that GEN is able to improve by learning from both
levels of implicit feedback when compared to the version with no learning,
considering the top 5, 10, and 20 questions. Improvements go up to 10%,
depending on the metric and strategy used.
Comment: 27 pages, 8 figures
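The idea of scoring patterns from implicit feedback can be sketched in a few lines. Everything below (the class name, its methods, and the Laplace-smoothed acceptance rate) is a hypothetical illustration of the idea, not GEN's actual implementation:

```python
from collections import defaultdict

# Hypothetical sketch: score each question-generation pattern by how often
# the questions it produced survived user curation (implicit feedback).
class PatternScorer:
    def __init__(self):
        self.accepted = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, pattern_id, was_accepted):
        """Register one piece of implicit feedback for a pattern."""
        self.total[pattern_id] += 1
        self.accepted[pattern_id] += int(was_accepted)

    def score(self, pattern_id):
        # Laplace-smoothed acceptance rate; unseen patterns start at 0.5
        return (self.accepted[pattern_id] + 1) / (self.total[pattern_id] + 2)

    def rank(self, candidates):
        """candidates: (question, pattern_id) pairs; best-scored pattern first."""
        return sorted(candidates, key=lambda c: self.score(c[1]), reverse=True)
```

Under this scheme, every user correction simultaneously refines the ranking of all future questions produced by the same pattern.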
Question Generation based on Lexico-Syntactic Patterns Learned from the Web
THE MENTOR automatically generates multiple-choice tests from a given text. This tool aims at supporting the dialogue system of the FalaComigo project, as one of FalaComigo's goals is the interaction with tourists through questions/answers and quizzes about their visit. In a minimally supervised learning process, and by leveraging the redundancy and linguistic variability of the Web, THE MENTOR learns lexico-syntactic patterns from a set of question/answer seeds. Afterward, these patterns are used to match the sentences from which new questions (and answers) can be generated. Finally, several filters are applied in order to discard low-quality items. In this paper we detail the question generation task as performed by THE MENTOR and evaluate its performance.
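A toy illustration of the seed-driven pattern idea: a seed sentence with known answer and topic is abstracted into a pattern with capture slots, which then matches unseen sentences of the same shape. The function names, single-word slots, and hard-coded question template are all hypothetical simplifications, not THE MENTOR's actual machinery:

```python
import re

def learn_pattern(sentence, answer, topic):
    """Abstract a seed sentence into a regex by slotting out answer and topic."""
    pat = re.escape(sentence)
    pat = pat.replace(re.escape(answer), r"(?P<answer>\w+)")
    pat = pat.replace(re.escape(topic), r"(?P<topic>\w+)")
    return re.compile(pat)

def generate(pattern, sentence, template="What is the capital of {topic}?"):
    """Apply a learned pattern to a new sentence, yielding a (question, answer) pair."""
    m = pattern.search(sentence)
    if not m:
        return None
    return template.format(topic=m.group("topic")), m.group("answer")

# Seed pair: answer "Lisbon", topic "Portugal"
pat = learn_pattern("Lisbon is the capital of Portugal", "Lisbon", "Portugal")
# Matching an unseen sentence with the same surface structure
qa = generate(pat, "Paris is the capital of France")
```

In practice the question template would itself be induced from the seed question rather than fixed, and slots would span multi-word phrases.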
Towards a Fully Unsupervised Framework for Intent Induction in Customer Support Dialogues
State of the art models in intent induction require annotated datasets.
However, annotating dialogues is time-consuming, laborious and expensive. In
this work, we propose a completely unsupervised framework for intent induction
within a dialogue. In addition, we show how pre-processing the dialogue corpora
can improve results. Finally, we show how to extract the dialogue flows of
intentions by investigating the most common sequences. Although we test our
work on the MultiWOZ dataset, the fact that this framework requires no prior
knowledge makes it applicable to any possible use case, making it very relevant
to real-world customer support applications across industry.
Comment: 16 pages, 8 figures
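The last step, extracting dialogue flows as the most common intent sequences, can be sketched with plain n-gram counting over per-turn intent labels. The function name and the toy labels below are illustrative, not the paper's code or MultiWOZ annotations:

```python
from collections import Counter

def common_flows(dialogues, n=2):
    """Count length-n intent sequences across dialogues to surface typical flows."""
    flows = Counter()
    for intents in dialogues:
        flows.update(tuple(intents[i:i + n]) for i in range(len(intents) - n + 1))
    return flows.most_common()

# Toy per-turn intent labels (hypothetical)
dialogues = [
    ["greet", "request_hotel", "inform", "book", "bye"],
    ["greet", "request_taxi", "inform", "book", "bye"],
]
flows = common_flows(dialogues)
```

Sequences shared across many dialogues (here, e.g. "inform" followed by "book") surface as the dominant flows of the domain.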
Reviewing Possible Extraction Tools
UID/LIN/03213/2013
Collocations are a central problem for many natural language processing tasks, from machine translation to summarization. With the goal of building a corpus of collocations, enriched with statistical information about them, we survey, in this paper, four tools for extracting collocations. These tools allow us to collect sentences containing collocations, and also to gather statistics on this particular type of co-occurrence, such as Mutual Information and Log-likelihood values.
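As a minimal illustration of the association statistics such tools report, Pointwise Mutual Information for a candidate bigram can be computed from raw counts. The function and toy corpus below are illustrative, not taken from any of the surveyed tools:

```python
import math
from collections import Counter

def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise Mutual Information: log2(p(w1,w2) / (p(w1) * p(w2)))."""
    return math.log2((pair_count / total) / ((w1_count / total) * (w2_count / total)))

# Toy corpus (illustrative only)
tokens = "the strong tea and the strong coffee and the weak tea".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
total = len(tokens) - 1  # number of bigram positions

score = pmi(bigrams[("strong", "tea")], unigrams["strong"], unigrams["tea"], total)
```

A positive PMI indicates the two words co-occur more often than chance would predict, which is the basic signal a collocation extractor ranks by.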
A linguistically motivated taxonomy for Machine Translation error analysis
UID/LIN/03213/2013
SFRH/BD/85737/2012
SFRH/BD/51157/2010
SFRH/BD/51156/2010
A detailed error analysis is a fundamental step in every natural language processing task, as being able to diagnose what went wrong provides cues to decide which research directions to follow. In this paper we focus on error analysis in Machine Translation. We substantially extend previous error taxonomies so that translation errors associated with the specificities of Romance languages can be accommodated. Also, based on the proposed taxonomy, we carry out an extensive analysis of the errors generated by four different systems: two mainstream online translation systems, Google Translate (Statistical) and Systran (Hybrid Machine Translation), and two in-house Machine Translation systems, in three scenarios representing different challenges in the translation from English to European Portuguese. Additionally, we comment on how distinct error types impact translation quality differently.
Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations
Fuzzy Fingerprints have been successfully used as an interpretable text
classification technique, but, like most other techniques, have been largely
surpassed in performance by Large Pre-trained Language Models, such as BERT or
RoBERTa. These models deliver state-of-the-art results in several Natural
Language Processing tasks, namely Emotion Recognition in Conversations (ERC),
but suffer from the lack of interpretability and explainability. In this paper,
we propose to combine the two approaches to perform ERC, as a means to obtain
simpler and more interpretable Large Language Model-based classifiers. We
propose to feed the utterances and their previous conversational turns to a
pre-trained RoBERTa, obtaining contextual embedding utterance representations,
that are then supplied to an adapted Fuzzy Fingerprint classification module.
We validate our approach on the widely used DailyDialog ERC benchmark dataset,
in which we obtain state-of-the-art-level results using a much lighter model.
Comment: FUZZ-IEEE 202
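A stripped-down sketch of the fuzzy-fingerprint idea, using raw token frequencies in place of RoBERTa contextual embeddings: each class is summarized by its top-k features with rank-decaying membership values, and an utterance is assigned to the class whose fingerprint its tokens overlap most. All names, the linear membership function, and the toy data are simplifying assumptions, not the paper's actual model:

```python
from collections import Counter

def fingerprint(texts, k=5):
    """Top-k most frequent tokens of a class, with rank-decaying membership."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    top = [w for w, _ in counts.most_common(k)]
    # linear membership: rank 0 -> 1.0, rank k-1 -> 1/k
    return {w: (k - i) / k for i, w in enumerate(top)}

def similarity(tokens, fp):
    """Sum the memberships of the tokens that appear in the fingerprint."""
    return sum(fp.get(tok, 0.0) for tok in tokens)

def classify(text, fps):
    tokens = text.lower().split()
    return max(fps, key=lambda label: similarity(tokens, fps[label]))

# Toy emotion classes (illustrative data, not DailyDialog)
fps = {
    "joy": fingerprint(["i am so happy today", "what a happy great day"]),
    "anger": fingerprint(["this is so annoying", "i hate this annoying thing"]),
}
```

In the paper's setting, the ranked features come from contextual embedding dimensions rather than surface tokens, but the fingerprint-and-overlap mechanism is what keeps the classifier small and inspectable.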
Evaluation of computational resources for Portuguese
Several tools have been developed for processing the Portuguese language.
However, due to the varied choices underlying the behaviour of these tools
(different pre-processing options, different sets of part-of-speech tags and
dependency labels, etc.), it is difficult to get a picture of their
comparative performance. In this work, we evaluate a set of free, publicly
available tools that perform Part-of-Speech Tagging and Named Entity
Recognition for the Portuguese language. Twelve different models are
considered for the first task and eight for the second. All resources used in
this evaluation (tag mapping tables, reference corpora, etc.) are made
available, allowing the results to be replicated/fine-tuned. We also present a
qualitative study of two dependency parsers. We are not aware of any similar
recent work, i.e., one taking into account the currently available tools,
carried out for the Portuguese language.
FCT: UIDB/50021/2020 and PTDC/LLTLIN/29887/2017