239 research outputs found
An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process
Although Minimum Distance Parsing (MDP) offers a theoretically attractive
solution to the problem of extragrammaticality, it is often computationally
infeasible in large scale practical applications. In this paper we present an
alternative approach where the labor is distributed between a more restrictive
partial parser and a repair module. Though two stage approaches have grown in
popularity in recent years because of their efficiency, they have done so at
the cost of requiring hand coded repair heuristics. In contrast, our two stage
approach does not require any hand coded knowledge sources dedicated to repair,
thus making it possible to achieve a similar run time advantage over MDP
without losing the quality of domain independence.
Comment: 9 pages, 1 Postscript figure, uses aclap.sty and psfig.tex, In
Proceedings of EMNLP 199
Towards Multilingual Automatic Dialogue Evaluation
The main limiting factor in the development of robust multilingual dialogue
evaluation metrics is the lack of multilingual data and the limited
availability of open sourced multilingual dialogue systems. In this work, we
propose a workaround for this lack of data by leveraging a strong multilingual
pretrained LLM and augmenting existing English dialogue data using Machine
Translation. We empirically show that the naive approach of finetuning a
pretrained multilingual encoder model with translated data is insufficient to
outperform the strong baseline of finetuning a multilingual model with only
source data. Instead, the best approach consists in the careful curation of
translated data using MT Quality Estimation metrics, excluding low quality
translations that hinder its performance.
Comment: SIGDIAL2
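The curation step described in this abstract can be illustrated with a minimal sketch. It assumes each translated example already carries a quality-estimation score where higher means better. The names `TranslatedExample`, `qe_score`, `filter_by_quality`, and `keep_fraction` are hypothetical and are not taken from the paper, which does not state its scoring model or cutoff here.

```python
from dataclasses import dataclass


@dataclass
class TranslatedExample:
    source: str         # original English dialogue text
    translation: str    # machine-translated text
    qe_score: float     # hypothetical QE score; assumed higher is better


def filter_by_quality(examples, keep_fraction=0.75):
    """Keep the best-scoring share of translated examples.

    keep_fraction is an illustrative parameter, not a value from the paper.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not examples:
        return []
    # Rank from highest to lowest estimated translation quality.
    ranked = sorted(examples, key=lambda ex: ex.qe_score, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    data = [
        TranslatedExample("hi", "hola", 0.9),
        TranslatedExample("bye", "adios", 0.2),
        TranslatedExample("thanks", "gracias", 0.7),
        TranslatedExample("sorry", "perdon", 0.5),
    ]
    for ex in filter_by_quality(data, keep_fraction=0.5):
        print(ex.translation, ex.qe_score)
```

Ranking and keeping a fixed share avoids choosing an absolute score threshold, which would depend on the particular QE metric in use.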
Tagging Of Speech Acts And Dialogue Games In Spanish Call Home
The Clarity project is devoted to automatic detection and classification of discourse structures in casual, non-task-oriented conversation using shallow, corpus-based methods of analysis. For the Clarity project, we have tagged speech acts and dialogue games in the Call Home Spanish corpus. We have done preliminary cross-level experiments on the relationship of word and speech act n-grams to dialogue games. Our results show that the label of a game cannot be predicted from n-grams of the words it contains. We get better than baseline results for predicting the label of a game from the sequence of speech acts it contains, but only when the speech acts are hand tagged, and not when they are automatically detected. Our future research will focus on finding linguistic cues that are more predictive of game labels. The automatic classification of speech acts and games is carried out in a multi-level architecture that integrates classification at multiple discourse levels instead of performing the classifications sequentially.
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
Despite significant research effort in the development of automatic dialogue
evaluation metrics, little thought is given to evaluating dialogues other than
in English. At the same time, ensuring metrics are invariant to semantically
similar responses is also an overlooked topic. In order to achieve the desired
properties of robustness and multilinguality for dialogue evaluation metrics,
we propose a novel framework that takes advantage of the strengths of current
evaluation models with the newly-established paradigm of prompting Large
Language Models (LLMs). Empirical results show our framework achieves
state-of-the-art results in terms of mean Spearman correlation scores across several
benchmarks and ranks first place on both the Robust and Multilingual tasks of
the DSTC11 Track 4 "Automatic Evaluation Metrics for Open-Domain Dialogue
Systems", proving the evaluation capabilities of prompted LLMs.
Comment: DSTC11 best paper for Track
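The headline figure in this abstract, a mean Spearman correlation across benchmarks, can be sketched as follows. This is a standard-library illustration, not the evaluation code used for DSTC11. The function names and the `benchmarks` mapping (benchmark name to a pair of metric scores and human scores) are assumptions made for the example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(benchmarks):
    """Average the per-benchmark Spearman correlations."""
    return mean(spearman(metric, human) for metric, human in benchmarks.values())


if __name__ == "__main__":
    # Toy scores for two hypothetical benchmarks.
    benchmarks = {
        "bench_a": ([0.1, 0.4, 0.8], [1, 2, 3]),
        "bench_b": ([0.9, 0.5, 0.2], [1, 2, 3]),
    }
    print(mean_spearman(benchmarks))
```

Averaging per-benchmark correlations weights every benchmark equally regardless of its size, which is one reasonable reading of "mean Spearman correlation scores across several benchmarks".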