Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems
User engagement is a critical metric for evaluating the quality of
open-domain dialogue systems. Prior work has focused on conversation-level
engagement by using heuristically constructed features such as the number of
turns and the total time of the conversation. In this paper, we investigate the
possibility and efficacy of estimating utterance-level engagement and define a
novel metric, predictive engagement, for automatic evaluation of
open-domain dialogue systems. Our experiments demonstrate that (1) human
annotators have high agreement on assessing utterance-level engagement scores;
(2) conversation-level engagement scores can be predicted from properly
aggregated utterance-level engagement scores. Furthermore, we show that the
utterance-level engagement scores can be learned from data. These scores can
improve automatic evaluation metrics for open-domain dialogue systems, as shown
by correlation with human judgements. This suggests that predictive engagement
can be used as real-time feedback for training better dialogue models.
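As a rough illustration of the aggregation idea described above, the following sketch pools utterance-level engagement scores into a conversation-level score with a simple mean; the score_utterance predictor and the mean pooling are assumptions for illustration, not the paper's exact aggregation.

```python
from statistics import mean
from typing import Callable, List

def conversation_engagement(
    utterances: List[str],
    score_utterance: Callable[[str], float],
) -> float:
    """Pool utterance-level engagement scores into one conversation-level
    score; a plain mean stands in for 'properly aggregated' scores."""
    scores = [score_utterance(u) for u in utterances]
    return mean(scores) if scores else 0.0

# Toy length-based scorer standing in for a learned engagement model.
toy_scorer = lambda u: min(len(u.split()) / 20.0, 1.0)
print(conversation_engagement(["Hi!", "Tell me more about your trip."], toy_scorer))
```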
Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4
The advent and fast development of neural networks have revolutionized the
research on dialogue systems and subsequently have triggered various challenges
regarding their automatic evaluation. Automatic evaluation of open-domain
dialogue systems remains an open challenge and has attracted the attention of
many researchers. Despite consistent efforts to improve automatic metrics'
correlation with human evaluation, there have been very few attempts to assess
their robustness across multiple domains and dimensions. Moreover, existing
metrics focus mainly on English. All of these challenges prompt the development
of automatic evaluation metrics that are reliable across multiple domains,
dimensions, and languages. This track in the 11th Dialogue System Technology
Challenge (DSTC11) is part of the ongoing effort to promote robust and
multilingual automatic evaluation metrics. This article describes the datasets
and baselines provided to participants and discusses the submission and result
details of the two proposed subtasks.
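For context on how such tracks typically report metric quality, the sketch below computes the Pearson and Spearman correlations between an automatic metric's scores and human ratings; the numbers are toy values, not track data.

```python
from scipy.stats import pearsonr, spearmanr

# Toy per-response scores: an automatic metric vs. human quality ratings.
metric_scores = [0.82, 0.41, 0.67, 0.15, 0.90]
human_ratings = [4.5, 2.0, 3.5, 1.0, 5.0]

pearson_r, _ = pearsonr(metric_scores, human_ratings)
spearman_rho, _ = spearmanr(metric_scores, human_ratings)
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```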
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
Commonsense reasoning is omnipresent in human communications and thus is an
important feature for open-domain dialogue systems. However, evaluating
commonsense in dialogue systems is still an open challenge. We take the first
step by focusing on event commonsense that considers events and their
relations, and is crucial in both dialogues and general commonsense reasoning.
We propose ACCENT, an event commonsense evaluation metric empowered by
commonsense knowledge bases (CSKBs). ACCENT first extracts event-relation
tuples from a dialogue, and then evaluates the response by scoring the tuples
in terms of their compatibility with the CSKB. To evaluate ACCENT, we construct
the first public event commonsense evaluation dataset for open-domain
dialogues. Our experiments show that ACCENT is an efficient metric for event
commonsense evaluation, which achieves higher correlations with human judgments
than existing baselines.
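A schematic sketch of the two-step pipeline described above; the tuple extractor and the CSKB compatibility scorer are placeholders (ACCENT itself uses learned components and a concrete commonsense knowledge base), so this only conveys the control flow.

```python
from typing import List, Tuple

EventRelation = Tuple[str, str, str]  # (head event, relation, tail event)

def extract_event_relation_tuples(history: str, response: str) -> List[EventRelation]:
    """Placeholder for the tuple-extraction step; ACCENT uses a learned
    extractor, while here a fixed illustrative tuple is returned."""
    return [("PersonX misses the bus", "xReact", "frustrated")]

def cskb_compatibility(tpl: EventRelation) -> float:
    """Placeholder compatibility score against a commonsense knowledge
    base (CSKB); a real scorer would query or embed the CSKB."""
    return 0.8  # assumed value for illustration

def accent_style_score(history: str, response: str) -> float:
    tuples = extract_event_relation_tuples(history, response)
    if not tuples:
        return 0.0
    return sum(cskb_compatibility(t) for t in tuples) / len(tuples)

print(accent_style_score("A: I missed the bus again.", "B: You must feel frustrated!"))
```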
Modeling Dialogues with Hashcode Representations: A Nonparametric Approach
We propose a novel dialogue modeling framework, the first nonparametric, kernel-function-based approach to dialogue modeling, which learns hashcodes as text representations; unlike traditional deep learning models, it handles relatively small datasets well while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-selection criterion that favors representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, most prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural-network-based dialogue systems, both in computational efficiency, reducing training time from days or weeks to hours, and in response quality, being chosen as the best model by human evaluators an order of magnitude more often than its competitors.
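As a loose illustration of hashcode-style text representations, the sketch below binarizes random projections of bag-of-words vectors (an LSH-flavored stand-in); the paper's actual construction is kernelized and nonparametric, so treat this only as an intuition aid.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def hashcodes(texts, n_bits=16, seed=0):
    """Map texts to binary hashcodes by thresholding random projections of
    their bag-of-words vectors (a simple LSH-style stand-in, not the
    paper's kernel-based construction)."""
    X = CountVectorizer().fit_transform(texts).toarray().astype(float)
    rng = np.random.default_rng(seed)
    projections = rng.standard_normal((X.shape[1], n_bits))
    return (X @ projections > 0).astype(int)

codes = hashcodes(["how are you today", "i am doing fine thanks"])
print(codes)  # one n_bits-length binary code per utterance
```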
PARSINLU: A Suite of Language Understanding Challenges for Persian
Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on Persian, a widely spoken language for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliably assessing progress on different NLU tasks and domains. We introduce PARSINLU, the first benchmark for the Persian language that includes a range of language understanding tasks: reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotation by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope PARSINLU fosters further research and advances in Persian language understanding.