Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics
Recent works that revealed the vulnerability of dialogue state tracking (DST)
models to distributional shifts have made holistic comparisons on robustness
and qualitative analyses increasingly important for understanding their
relative performance. We present our findings from standardized and
comprehensive DST diagnoses, which have previously been sparse and
uncoordinated, using our toolkit, CheckDST, a collection of robustness tests
and failure mode analytics. We discover that different classes of DST models
have clear strengths and weaknesses, where generation models are more promising
for handling language variety while span-based classification models are more
robust to unseen entities. Prompted by this discovery, we also compare
checkpoints from the same model and find that the standard practice of
selecting checkpoints using validation loss/accuracy is prone to overfitting
and that each model class has distinct patterns of failure. Lastly, we demonstrate
how our diagnoses motivate a pre-finetuning procedure with non-dialogue data
that offers comprehensive improvements to generation models by alleviating the
impact of distributional shifts through transfer learning.
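To make the idea of a robustness diagnosis concrete, the sketch below checks whether a toy, hypothetical DST model predicts the same dialogue state for an utterance and a paraphrase of it; the keyword-matching predict_state function and the single test case are invented stand-ins, not part of CheckDST.

```python
# Minimal sketch of a CheckDST-style robustness probe (illustrative only).
# `predict_state` and the paraphrase pair are stand-ins for a real DST model
# and a real perturbation generator, which the paper's toolkit would provide.

def predict_state(utterance: str) -> dict:
    """Hypothetical DST model: maps an utterance to slot-value pairs."""
    state = {}
    if "cheap" in utterance.lower():
        state["hotel-pricerange"] = "cheap"
    if "north" in utterance.lower():
        state["hotel-area"] = "north"
    return state

# Each test case pairs an original utterance with a paraphrase that should
# yield the same dialogue state (a "language variety" perturbation).
test_cases = [
    ("I need a cheap hotel in the north.",
     "Looking for an inexpensive place to stay up north."),
]

def consistency(cases) -> float:
    """Fraction of cases where original and perturbed predictions agree."""
    agree = sum(predict_state(a) == predict_state(b) for a, b in cases)
    return agree / len(cases)

print(f"prediction consistency under paraphrase: {consistency(test_cases):.2f}")
```

A low consistency score on such pairs is exactly the kind of failure mode that separates span-based classification models from generation models in the diagnoses described above.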
Reinforcement Learning from Reformulations in Conversational Question Answering over Knowledge Graphs
The rise of personal assistants has made conversational question answering (ConvQA) a very popular mechanism for user-system interaction. State-of-the-art methods for ConvQA over knowledge graphs (KGs) can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: users would rarely mark answers explicitly as correct or wrong. In this work, we take a step towards a more natural learning paradigm - from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up question could be a positive signal on the previous turn's answer. We present a reinforcement learning model, termed CONQUER, that can learn from a conversational stream of questions and reformulations. CONQUER models the answering process as multiple agents walking in parallel on the KG, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over a state-of-the-art baseline.
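The following is a minimal, generic REINFORCE-style sketch of the kind of update described above: a policy network scores outgoing KG edges given the question context, an edge is sampled, and its log-probability is weighted by a noisy reward tied to whether the user reformulates. The encodings, network shape, and reward assignment are invented for illustration and do not reproduce the authors' implementation.

```python
# Illustrative REINFORCE-style update in the spirit of CONQUER (not the
# authors' code): an agent at a KG entity scores outgoing edges with a
# policy network, samples a walk step, and is rewarded by a noisy signal
# that is negative when the user reformulates and positive otherwise.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class WalkPolicy(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)  # scores (question ctx, edge) pairs

    def forward(self, question_vec: torch.Tensor, edge_vecs: torch.Tensor):
        # question_vec: (dim,), edge_vecs: (num_edges, dim)
        ctx = question_vec.expand(edge_vecs.size(0), -1)
        logits = self.score(torch.cat([ctx, edge_vecs], dim=-1)).squeeze(-1)
        return logits

dim, num_edges = 16, 5
policy = WalkPolicy(dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Toy encodings of the conversational question and candidate KG edges.
question_vec = torch.randn(dim)
edge_vecs = torch.randn(num_edges, dim)

logits = policy(question_vec, edge_vecs)
dist = Categorical(logits=logits)
action = dist.sample()                       # which edge to walk

# Noisy reward: +1 if the follow-up looks like a new question,
# -1 if it looks like a reformulation of the previous turn.
user_reformulated = True
reward = torch.tensor(-1.0 if user_reformulated else 1.0)

loss = -dist.log_prob(action) * reward       # REINFORCE objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
```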
Knowledge base integration in biomedical natural language processing applications
With the progress of natural language processing in the biomedical field, the lack of annotated data due to regulations and expensive labor remains an issue. In this work, we study the potential of knowledge bases for biomedical language processing to compensate for the shortage of annotated data. Accordingly, we experiment with the integration of a rigorous biomedical knowledge base, the Unified Medical Language System, in three different biomedical natural language processing applications: text simplification, automatic evaluation of medical students' chart notes, and conversational agents for medication adherence.
In the first task, we take as a use case simplifying medication instructions to enhance medication adherence among patients. Given the lack of an appropriate parallel corpus, the Unified Medical Language System provided simpler synonyms for an unsupervised system we devised, and we show a positive impact on comprehension through a human subjects study.
As for the second task, we devise an unsupervised system to automatically evaluate chart notes written by medical students. The purpose of the system is to speed up the feedback process and enhance the educational experience. Given the lack of training corpora, utilizing the Unified Medical Language System proved to enhance the accuracy of evaluation after integration into the baseline system.
For the final task, the Unified Medical Language System was used to augment the training data of a conversational agent that educates patients on their medications. As part of the educational procedure, the agent needed to assess the comprehension of the patients by evaluating their answers to predefined questions. Starting with a small seed set of paraphrases of acceptable answers, the Unified Medical Language System was used to artificially augment this seed set via synonymy. Results did not show an increase in the quality of system output after knowledge base integration, as the majority of errors resulted from the mishandling of counts and negations.
We later demonstrate the importance of an entity linking system, which is currently lacking, for optimal integration of biomedical knowledge bases, and we offer a first step towards solving that problem, along with conclusions on a proper training setup and on processes for the automatic collection of an annotated dataset for biomedical word sense disambiguation.
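As a rough illustration of the synonym-driven integration used in the simplification and augmentation tasks, the sketch below substitutes terms using a tiny hand-written dictionary standing in for UMLS concept lookups; the mapping and helper names are purely hypothetical.

```python
# Sketch of knowledge-base-driven synonym substitution, with a tiny dictionary
# standing in for UMLS concept lookups (a real system would query the UMLS
# Metathesaurus; the mapping below is purely illustrative).
import itertools
import re

# Hypothetical concept -> synonyms mapping (simplest synonym listed first).
SYNONYMS = {
    "hypertension": ["high blood pressure", "hypertension"],
    "twice daily": ["two times a day", "twice daily", "b.i.d."],
}

def simplify(text: str) -> str:
    """Replace recognized terms with their simplest listed synonym."""
    for term, options in SYNONYMS.items():
        text = re.sub(term, options[0], text, flags=re.IGNORECASE)
    return text

def augment(seed: str) -> list[str]:
    """Generate paraphrases of a seed answer by swapping in each synonym."""
    variants = [seed]
    for term, options in SYNONYMS.items():
        variants = [re.sub(term, opt, v, flags=re.IGNORECASE)
                    for v, opt in itertools.product(variants, options)]
    return sorted(set(variants))

print(simplify("Take the tablet twice daily for hypertension."))
print(augment("I take it twice daily."))
```

The same lookup serves both directions: choosing the simplest synonym supports simplification, while enumerating all synonyms expands a small seed set of acceptable answers.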
Recommended from our members
Modeling Narrative Discourse
This thesis describes new approaches to the formal modeling of narrative discourse. Although narratives of all kinds are ubiquitous in daily life, contemporary text processing techniques typically do not leverage the aspects that separate narrative from expository discourse. We describe two approaches to the problem. The first approach considers the conversational networks to be found in literary fiction as a key aspect of discourse coherence; by isolating and analyzing these networks, we are able to comment on longstanding literary theories. The second approach proposes a new set of discourse relations that are specific to narrative. By focusing on certain key aspects, such as agentive characters, goals, plans, beliefs, and time, these relations represent a theory-of-mind interpretation of a text. We show that these discourse relations are expressive, formal, robust, and, through the use of a software system, amenable to corpus collection projects with trained annotators. We have procured and released a collection of over 100 encodings, covering a set of fables as well as longer texts including literary fiction and epic poetry. We are able to inferentially find similarities and analogies between encoded stories based on the proposed relations, and an evaluation of this technique shows that human raters prefer such a measure of similarity to a more traditional one based on the semantic distances between story propositions.
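A minimal sketch of the conversational-network idea from the first approach, assuming speaker-attributed dialogue turns are already available: characters become nodes and exchanges become weighted edges. The toy turn list is invented, not corpus data.

```python
# Toy conversational network: undirected edges between characters who
# exchange dialogue, weighted by how often they do.
from collections import Counter

# (speaker, addressee) pairs for quoted turns in a scene (invented example).
turns = [("Elizabeth", "Darcy"), ("Darcy", "Elizabeth"),
         ("Elizabeth", "Jane"), ("Jane", "Elizabeth"),
         ("Darcy", "Elizabeth")]

network = Counter(frozenset(pair) for pair in turns)

for pair, weight in network.most_common():
    print(" <-> ".join(sorted(pair)), f"({weight} exchanges)")
```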
Semantic relations between sentences: from lexical to linguistically inspired semantic features and beyond
This thesis is concerned with the identification of semantic equivalence between pairs of natural language sentences, by studying and computing models to address Natural Language Processing tasks where some form of semantic equivalence is assessed. In such tasks, given two sentences, our models output either a class label, corresponding to the semantic relation between the sentences based on a predefined set of semantic relations, or a continuous score, corresponding to their similarity on a predefined scale. The former setup corresponds to the tasks of Paraphrase Identification and Natural Language Inference, while the latter corresponds to the task of Semantic Textual Similarity.
We present several models for English and Portuguese in which various types of features are considered, for instance features based on distances between alternative representations of each sentence, following lexical and semantic frameworks, or embeddings from pre-trained Bidirectional Encoder Representations from Transformers (BERT) models. For English, a new set of semantic features is proposed, derived from the formal semantic representation of Discourse Representation Structures (DRS). For Portuguese, suitable corpora are scarce and formal semantic representations are unavailable, hence an evaluation of currently available features and corpora is conducted, following the modelling setup employed for English.
Competitive results are achieved on all tasks, for both English and Portuguese, particularly when considering that our models are based on generally available tools and technologies, and that all features and models are suitable for computation on most modern computers, except for those based on embeddings. In particular, for English, our semantic features from DRS improve the performance of other models when integrated into their feature sets, and state-of-the-art results are achieved for Portuguese with models based on fine-tuning embeddings for a specific task.
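To give a flavour of the distance-based features described above, the sketch below computes simple token-overlap features for a sentence pair; the real models also draw on lexical and semantic frameworks, DRS-based features, and BERT embeddings, none of which are shown here.

```python
# Toy sketch of lexical distance features of the kind the thesis combines
# (only token-overlap features are shown; the thesis uses far richer ones).
def tokens(sentence: str) -> set[str]:
    return set(sentence.lower().replace(".", "").split())

def overlap_features(s1: str, s2: str) -> dict[str, float]:
    a, b = tokens(s1), tokens(s2)
    union, inter = a | b, a & b
    return {
        "jaccard": len(inter) / len(union) if union else 0.0,
        "coverage_s1": len(inter) / len(a) if a else 0.0,
        "coverage_s2": len(inter) / len(b) if b else 0.0,
        "length_diff": abs(len(a) - len(b)),
    }

pair = ("A man is playing a guitar.", "Someone plays the guitar.")
print(overlap_features(*pair))
```

Feature vectors like this one would then feed a classifier for Paraphrase Identification or Natural Language Inference, or a regressor for Semantic Textual Similarity.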
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations
The dialogue summarization task involves summarizing long conversations while
preserving the most salient information. Real-life dialogues often involve
naturally occurring variations (e.g., repetitions, hesitations) and existing
dialogue summarization models suffer a performance drop on such
conversations. In this study, we systematically investigate the impact of such
variations on state-of-the-art dialogue summarization models using publicly
available datasets. To simulate real-life variations, we introduce two types of
perturbations: utterance-level perturbations that modify individual utterances
with errors and language variations, and dialogue-level perturbations that add
non-informative exchanges (e.g., repetitions, greetings). We conduct our
analysis along three dimensions of robustness: consistency, saliency, and
faithfulness, which capture different aspects of the summarization model's
performance. We find that both fine-tuned and instruction-tuned models are
affected by input variations, with the latter being more susceptible,
particularly to dialogue-level perturbations. We also validate our findings via
human evaluation. Finally, we investigate whether the robustness of fine-tuned models can be improved by training them with a fraction of perturbed data, and observe that this approach is insufficient to address the robustness challenges of current models; a more thorough investigation is needed to identify better solutions. Overall, our work highlights robustness challenges in dialogue summarization and provides insights for future research.
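The sketch below illustrates the two perturbation categories studied, under the assumption that a dialogue is a list of (speaker, utterance) pairs: an utterance-level edit injects a hesitation filler, and a dialogue-level edit prepends a non-informative greeting exchange. The fillers and greetings are invented examples, not the paper's perturbation sets.

```python
# Illustrative perturbation functions mirroring the two categories studied:
# utterance-level edits inside a turn, and dialogue-level insertion of
# non-informative exchanges.
import random

random.seed(0)

def perturb_utterance(utterance: str) -> str:
    """Inject a hesitation filler at a random position (utterance-level)."""
    words = utterance.split()
    words.insert(random.randrange(len(words) + 1), random.choice(["uh,", "um,"]))
    return " ".join(words)

def perturb_dialogue(dialogue: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Prepend a non-informative greeting exchange (dialogue-level)."""
    greeting = [("A", "Hi there!"), ("B", "Hello, how are you?"),
                ("A", "Good, thanks.")]
    return greeting + dialogue

dialogue = [("A", "Can we move the meeting to Friday?"),
            ("B", "Sure, Friday at 10 works for me.")]

perturbed = perturb_dialogue(
    [(spk, perturb_utterance(text)) for spk, text in dialogue])
for speaker, text in perturbed:
    print(f"{speaker}: {text}")
```

A robustness analysis then compares summaries of the original and perturbed dialogues along the consistency, saliency, and faithfulness dimensions described above.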
Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey
Large Language Models (LLMs) have revolutionized the domain of natural
language processing (NLP) with remarkable capabilities of generating human-like
text responses. However, despite these advancements, several works in the
existing literature have raised serious concerns about the potential misuse of
LLMs such as spreading misinformation, generating fake news, plagiarism in
academia, and contaminating the web. To address these concerns, a consensus
among the research community is to develop algorithmic solutions to detect
AI-generated text. The basic idea is that whenever we can tell if the given
text is either written by a human or an AI, we can utilize this information to
address the above-mentioned concerns. To that end, a plethora of detection
frameworks have been proposed, highlighting the possibilities of AI-generated
text detection. But in parallel to the development of detection frameworks,
researchers have also concentrated on designing strategies to elude detection,
i.e., focusing on the impossibilities of AI-generated text detection. This is a crucial step toward ensuring that detection frameworks are robust and not easily fooled. Despite the huge interest and the
flurry of research in this domain, the community currently lacks a
comprehensive analysis of recent developments. In this survey, we aim to
provide a concise categorization and overview of current work encompassing both
the prospects and the limitations of AI-generated text detection. To enrich the
collective knowledge, we engage in an exhaustive discussion on critical and
challenging open questions related to ongoing research on AI-generated text
detection.
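As a toy illustration of the zero-shot, score-and-threshold detection idea covered by such surveys, the sketch below flags texts whose score falls below a threshold; real detectors use language-model likelihood or curvature statistics (e.g., DetectGPT), and the type-token ratio here is only a stand-in scorer.

```python
# Toy sketch of the statistical (zero-shot) detection recipe: compute a
# score for a text and compare it against a threshold. The type-token
# ratio is a stand-in for the LM-based scores real detectors use.
def type_token_ratio(text: str) -> float:
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def flag_as_ai_generated(text: str, threshold: float = 0.75) -> bool:
    """Flag texts whose stand-in score falls below the threshold."""
    return type_token_ratio(text) < threshold

sample = "The quick brown fox jumps over the lazy dog near the quiet river."
print(flag_as_ai_generated(sample))
```

The evasion strategies discussed in the survey attack exactly this recipe, by paraphrasing or otherwise perturbing generated text until its score crosses back over the threshold.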
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Large language models (LLMs) are a special class of pretrained language
models obtained by scaling model size, pretraining corpus and computation.
LLMs, because of their large size and pretraining on large volumes of text data, exhibit special abilities which allow them to achieve remarkable performance in many natural language processing tasks without any task-specific training. The era of LLMs started with OpenAI's GPT-3 model, and the popularity of LLMs has been increasing rapidly since the introduction of models like ChatGPT and GPT-4. We refer to GPT-3 and its successor OpenAI models, including ChatGPT and GPT-4, as GPT-3 family large language models (GLLMs). With
the ever-rising popularity of GLLMs, especially in the research community,
there is a strong need for a comprehensive survey which summarizes the recent
research progress in multiple dimensions and can guide the research community
with insightful future research directions. We start the survey paper with
foundation concepts like transformers, transfer learning, self-supervised
learning, pretrained language models and large language models. We then present
a brief overview of GLLMs and discuss the performances of GLLMs in various
downstream tasks, specific domains and multiple languages. We also discuss the
data labelling and data augmentation abilities of GLLMs, the robustness of
GLLMs, the effectiveness of GLLMs as evaluators, and finally, conclude with
multiple insightful future research directions. To summarize, this
comprehensive survey paper will serve as a good resource for both academia and industry, helping readers stay up to date with the latest research related to GPT-3 family large language models.
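To illustrate the data-labelling use case mentioned above, the sketch below prompts an LLM to assign a sentiment label to unlabeled examples; call_llm is a hypothetical stub standing in for a GPT-3 family API call, and the prompt and label set are invented.

```python
# Sketch of LLM-based data labelling as discussed in the survey. `call_llm`
# is a hypothetical stub; in practice it would wrap a GPT-3 family API call.
def call_llm(prompt: str) -> str:
    """Stand-in for a GLLM completion call; returns a canned label here."""
    return "positive"

LABELS = ("positive", "negative", "neutral")

def label_example(text: str) -> str:
    prompt = (
        "Classify the sentiment of the following review as one of "
        f"{', '.join(LABELS)}.\n\nReview: {text}\nLabel:"
    )
    answer = call_llm(prompt).strip().lower()
    return answer if answer in LABELS else "neutral"  # fall back on parse failure

unlabeled = ["Great battery life and a sharp screen.",
             "Stopped working after a week."]
print([label_example(t) for t in unlabeled])
```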