Charles University

Biblio at Institute of Formal and Applied Linguistics


506 research outputs found


LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

Author: Dušek Ondřej
Kumar Nalin
Publication date: 01/01/2024

Linguistic entrainment, or alignment, represents a phenomenon where linguistic patterns employed by conversational participants converge toward one another. While entrainment has been shown to produce a more natural user experience, most dialogue systems do not have any provisions for it. In this work, we introduce methods for achieving dialogue entrainment in a GPT-2-based end-to-end task-oriented dialogue system through the use of shared vocabulary. We experiment with training instance weighting, entrainment-specific loss, and additional conditioning to generate responses that align with the user. We demonstrate that all three approaches produce significantly better entrainment than the base, non-entrainment-optimized model, as confirmed by both automated and manual evaluation metrics.

Presentation of the ELITR Project (Představení projektu ELITR)

Author: Bojar Ondřej
Macháček Dominik
Publication date: 01/01/2024

I presented the result of the EU project ELITR: a live speech translation system from 99 to 43 languages.

Ask the experts: sourcing a high-quality nutrition counseling dataset through Human-AI collaboration

Author: Sargsyan Rafael
Kumar Vivek
Reforgiato Recupero Diego
Riboni Daniele
Li Karen
Dušek Ondřej
Balloccu Simone
Reiter Ehud
Publication date: 01/01/2024

Large Language Models (LLMs) are being employed by end-users for various tasks, including sensitive ones such as health counseling, disregarding potential safety concerns. It is thus necessary to understand how adequately LLMs perform in such domains. We conduct a case study on ChatGPT in nutrition counseling, a popular use-case where the model supports a user with their dietary struggles. We crowdsource real-world diet-related struggles, then work with nutrition experts to generate supportive text using ChatGPT. Finally, experts evaluate the safety and text quality of ChatGPT’s output. The result is the HAI-Coaching dataset, containing ~2.4K crowdsourced dietary struggles and ~97K corresponding ChatGPT-generated and expert-annotated supportive texts. We analyse ChatGPT’s performance, discovering potentially harmful behaviours, especially for sensitive topics like mental health. Finally, we use HAI-Coaching to test open LLMs on various downstream tasks, showing that even the latest models struggle to achieve good performance. HAI-Coaching is available at https://github.com/uccollab/hai-coaching/

Are Large Language Models Actually Good at Text Style Transfer?

Author: Mukherjee Sourabrata
Dušek Ondřej
Ojha Atul
Publication date: 01/01/2024

We analyze the performance of large language models (LLMs) on Text Style Transfer (TST), specifically focusing on sentiment transfer and text detoxification across three languages: English, Hindi, and Bengali. Text Style Transfer involves modifying the linguistic style of a text while preserving its core content. We evaluate the capabilities of pre-trained LLMs using zero-shot and few-shot prompting as well as parameter-efficient finetuning on publicly available datasets. Our evaluation using automatic metrics, GPT-4 and human evaluations reveals that while some prompted LLMs perform well in English, their performance on other languages (Hindi, Bengali) remains average. However, finetuning significantly improves results compared to zero-shot and few-shot prompting, making them comparable to previous state-of-the-art. This underscores the necessity of dedicated datasets and specialized models for effective TST.

UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 1

Author: Pecina Pavel
Pospíšil Adam
Krubiński Mateusz
Zemánek Petr
Sellat Hashem
Publication date: 01/01/2024

The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg. The data provided in this repository corresponds to the validation split of the dialectal Arabic to English shared task hosted at the 21st edition of the International Conference on Spoken Language Translation, i.e., IWSLT 2024.

FINDINGS OF THE IWSLT 2024 EVALUATION CAMPAIGN

This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 17 teams whose submissions are documented in 27 system papers. The growing interest in spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Author: Reddy Siva
Lu Xing
Kasner Zdeněk
Publication date: 01/01/2024

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX, a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass not only the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgillnlp.github.io/weblinx
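
The pruning idea in the abstract above (rank page elements, keep only the most relevant ones) can be illustrated with a small Python sketch. This is not the WebLINX model: the abstract describes a learned, retrieval-inspired ranker, whereas `overlap_score` below is a hypothetical word-overlap stand-in, and `prune_elements`, the element ids, and the cutoff `k` are assumptions made purely for illustration.

```python
from typing import List, Tuple

# An element is (hypothetical element id, visible text).
Element = Tuple[str, str]


def overlap_score(query: str, element_text: str) -> float:
    """Stand-in relevance score: fraction of query words found in the element text."""
    query_words = set(query.lower().split())
    element_words = set(element_text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & element_words) / len(query_words)


def prune_elements(query: str, elements: List[Element], k: int = 2) -> List[Element]:
    """Rank elements by relevance to the query and keep only the top k."""
    ranked = sorted(elements, key=lambda el: overlap_score(query, el[1]), reverse=True)
    return ranked[:k]


elements = [
    ("a#home", "home"),
    ("a#pricing", "pricing page"),
    ("button#login", "log in"),
    ("a#docs", "read the docs"),
]
kept = prune_elements("open the pricing page", elements, k=2)
kept_ids = [element_id for element_id, _ in kept]
```

Only the elements in `kept` would then be passed on to the language model, which is what keeps the input short enough to process.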

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

Author: Dušek Ondřej
Lango Mateusz
Publication date: 01/01/2024

Hallucination of text ungrounded in the input is a well-known problem in neural data-to-text generation. Many methods have been proposed to mitigate it, but they typically require altering model architecture or collecting additional data, and thus cannot be easily applied to an existing model. In this paper, we explore a new way to mitigate hallucinations by combining the probabilistic output of a generator language model (LM) with the output of a special “text critic” classifier, which guides the generation by assessing the match between the input data and the text generated so far. Our method does not need any changes to the underlying LM’s architecture or training procedure and can thus be combined with any model and decoding operating on word probabilities. The critic does not need any additional training data, using the base LM’s training data and synthetic negative examples. Our experimental results show that our method improves over the baseline on the WebNLG and OpenDialKG benchmarks.
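
As a rough illustration of combining a generator's word probabilities with a critic's judgement, the Python sketch below rescores the top candidate tokens by adding a weighted log of the critic's score to the LM log-probability. The exact combination rule used in the paper is not given in the abstract, so this formula is an assumption; `pick_next_token`, `toy_critic`, `weight`, `top_k`, and the example probabilities are all hypothetical.

```python
import math
from typing import Callable, Dict, List

Critic = Callable[[str, List[str]], float]


def toy_critic(data: str, text_tokens: List[str]) -> float:
    """Hypothetical critic: low score if a capitalized token does not occur in the data."""
    for tok in text_tokens:
        if tok[:1].isupper() and tok not in data:
            return 0.1
    return 0.9


def pick_next_token(lm_probs: Dict[str, float], critic: Critic, data: str,
                    prefix: List[str], weight: float = 1.0, top_k: int = 5) -> str:
    """Choose the next token by LM log-probability plus weighted critic log-score."""
    # Only the top_k most probable LM candidates are passed to the critic.
    candidates = sorted(lm_probs, key=lm_probs.get, reverse=True)[:top_k]

    def combined(tok: str) -> float:
        critic_score = critic(data, prefix + [tok])
        return (math.log(max(lm_probs[tok], 1e-12))
                + weight * math.log(max(critic_score, 1e-12)))

    return max(candidates, key=combined)


data = "Prague | capital_of | Czechia"
prefix = ["Prague", "is", "the", "capital", "of"]
lm_probs = {"Slovakia": 0.5, "Czechia": 0.4, "the": 0.1}  # made-up LM output

guided = pick_next_token(lm_probs, toy_critic, data, prefix)
unguided = pick_next_token(lm_probs, toy_critic, data, prefix, weight=0.0)
```

With `weight=0.0` the choice falls back to the LM's most probable token, so the generator itself is untouched and the critic only reorders its candidates.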

Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems

Author: Dušek Ondřej
Warczyński Jędrzej
Lango Mateusz
Publication date: 01/01/2024

We introduce a simple approach that uses a large language model (LLM) to automatically implement a fully interpretable rule-based data-to-text system in pure Python. Experimental evaluation on the WebNLG dataset showed that such a constructed system produces text of better quality (according to the BLEU and BLEURT metrics) than the same LLM prompted to directly produce outputs, and produces fewer hallucinations than a BART language model fine-tuned on the same data. Furthermore, at runtime, the approach generates text in a fraction of the processing time required by neural approaches, using only a single CPU.
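
To give a sense of what a rule-based data-to-text system in pure Python can look like, here is a minimal hand-written sketch over WebNLG-style (subject, predicate, object) triples. In the paper the rules are written by an LLM; the rules, the predicate names, the fallback template, and the `verbalize` function below are hypothetical examples and not the generated system.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def clean(entity: str) -> str:
    """Turn an identifier such as 'Czech_Republic' into readable text."""
    return entity.replace("_", " ")


# One explicit, inspectable rule per predicate (hypothetical examples).
RULES: Dict[str, Callable[[str, str], str]] = {
    "capital": lambda subj, obj: f"The capital of {subj} is {obj}.",
    "birthPlace": lambda subj, obj: f"{subj} was born in {obj}.",
}


def verbalize(triples: List[Triple]) -> str:
    """Render each triple with its rule, or with a generic fallback template."""
    sentences = []
    for subj, pred, obj in triples:
        rule = RULES.get(pred)
        if rule is None:
            sentences.append(f"The {pred} of {clean(subj)} is {clean(obj)}.")
        else:
            sentences.append(rule(clean(subj), clean(obj)))
    return " ".join(sentences)


text = verbalize([
    ("Czech_Republic", "capital", "Prague"),
    ("Franz_Kafka", "birthPlace", "Prague"),
])
```

Every output sentence can be traced to a single rule, and a system of this kind needs no model inference at runtime.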

Getting Structure in Dialogue with Large Language Models

Author: Dušek Ondřej
Publication date: 01/01/2024

An introduction to how LLMs work and the problems they have, as well as an overview of recent experiments using LLMs to model and evaluate dialogue.

58 full texts

506 metadata records

Updated in last 30 days.

Biblio at Institute of Formal and Applied Linguistics is based in Czechia
