Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval
When provided with sufficient explanatory context, smaller Language Models
have been shown to exhibit strong reasoning ability on challenging short-answer
question-answering tasks where the questions are unseen in training. We
evaluate two methods for further improvement in this setting. Both methods
focus on combining rationales generated by a larger Language Model with longer
contexts created from a multi-hop dense retrieval system. The first method
involves training a Rationale Ranking model to score both
generated rationales and retrieved contexts with respect to relevance and
truthfulness. We then use the scores to derive combined contexts from both
knowledge sources using a number of combinatory strategies. For the second
method we train a smaller Reasoning model using
retrieval-augmented training datasets such that it becomes proficient at
utilising relevant information from longer text sequences that may be only
partially evidential and frequently contain many irrelevant sentences.
Generally we find that both methods are effective, but the second is more
straightforward to apply and produces the strongest results in
the unseen setting on which we focus. Our single best Reasoning model using
only 440 million parameters materially improves upon strong comparable prior
baselines for unseen evaluation datasets (StrategyQA 58.9 → 61.7
acc., CommonsenseQA 63.6 → 72.7 acc., ARC-DA 31.6 →
52.1 F1, IIRC 25.5 → 27.3 F1) and a version utilising our prior
knowledge of each type of question in selecting a context combination strategy
does even better. Our proposed models also generally outperform direct prompts
against much larger models (BLOOM 175B and StableVicuna 13B) in both few-shot
chain-of-thought and few-shot answer-only settings.
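The core of the first method is scoring each knowledge source and then merging them under a combination strategy. A minimal sketch of that idea follows; the scoring function, threshold, and fallback rule are illustrative placeholders, not the authors' actual Rationale Ranking model or strategies.

```python
def combine_contexts(rationale, retrieved, score_fn, threshold=0.5):
    """Build a combined context from a generated rationale and a
    retrieved context, keeping only the components the scorer deems
    sufficiently relevant/truthful."""
    r_score = score_fn(rationale)
    c_score = score_fn(retrieved)
    parts = []
    if r_score >= threshold:
        parts.append(rationale)
    if c_score >= threshold:
        parts.append(retrieved)
    # Fallback strategy: if neither passes, keep the higher-scoring one.
    if not parts:
        parts.append(rationale if r_score >= c_score else retrieved)
    return " ".join(parts)

# Toy scorer (purely illustrative): pretends longer texts are more evidential.
toy_score = lambda text: min(len(text) / 100.0, 1.0)

combined = combine_contexts(
    "Rationale: rainbows require sunlight and water droplets.",
    "Retrieved: a rainbow is caused by refraction of light in droplets.",
    toy_score,
)
print(combined)
```

In the paper's setting the scorer would be a trained ranking model and several such combination strategies are compared; this sketch only shows the shape of the decision.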
The Entity-Deduction Arena: A playground for probing the conversational reasoning and planning capabilities of LLMs
Large language models (LLMs) are effective at answering questions that are
clearly asked. However, when faced with ambiguous queries they can act
unpredictably and produce incorrect outputs. This underscores the need for the
development of intelligent agents capable of asking clarification questions to
resolve ambiguities effectively. This capability requires complex
understanding, state tracking, reasoning and planning over multiple
conversational turns. However, directly measuring this can be challenging. In
this paper, we offer a surrogate problem which assesses an LLM's capability to
deduce an entity unknown to itself, but revealed to a judge, by asking the
judge a series of queries. This entity-deducing game can serve as an evaluation
framework to probe the conversational reasoning and planning capabilities of
language models. We systematically evaluate various LLMs and discover
significant differences in their performance on this task. We find that strong
LLMs like GPT-4 outperform human players by a large margin. We further employ
Behavior Cloning (BC) to examine whether a weaker model is capable of imitating
a stronger model and generalizing to data or domains, using only the
demonstrations from a stronger model. We finally propose to use Reinforcement
Learning to enhance reasoning and planning capacity of Vicuna models through
episodes of game playing, which leads to significant performance improvements. We
hope that this problem offers insights into how autonomous agents could be
trained to behave more intelligently in ambiguous circumstances.
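The entity-deducing game described above has the shape of twenty questions: a guesser queries a judge who knows the hidden entity, narrowing the candidate set each turn. A scripted stand-in for the two LLM players can be sketched as follows (the attribute-based guesser and the toy knowledge base are illustrative, not the arena's actual protocol):

```python
def play(secret, knowledge, max_turns=10):
    """knowledge maps entity -> set of attributes. The guesser asks the
    judge about one attribute per turn and keeps only the candidates
    consistent with the answer, then guesses."""
    candidates = set(knowledge)
    secret_attrs = knowledge[secret]
    all_attrs = sorted({a for attrs in knowledge.values() for a in attrs})
    turns = 0
    for attr in all_attrs:
        if len(candidates) == 1 or turns >= max_turns:
            break
        answer = "yes" if attr in secret_attrs else "no"  # judge's reply
        candidates = {e for e in candidates
                      if (attr in knowledge[e]) == (answer == "yes")}
        turns += 1
    return sorted(candidates)[0], turns

# Toy knowledge base standing in for the judge's world knowledge.
knowledge = {
    "penguin": {"bird", "swims"},
    "eagle":   {"bird", "flies"},
    "salmon":  {"swims"},
}
guess, turns = play("penguin", knowledge)
print(guess, turns)  # 'penguin' after two questions eliminate the others
```

The evaluation in the paper replaces both scripted roles with LLMs and measures how efficiently the guesser converges; this sketch only illustrates the turn structure being probed.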
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
The VNHSGE (VietNamese High School Graduation Examination) dataset, developed
exclusively for evaluating large language models (LLMs), is introduced in this
article. The dataset, which covers nine subjects, was generated from the
Vietnamese National High School Graduation Examination and comparable tests.
It includes 300 literary essays and over 19,000 multiple-choice
questions on a range of topics. The dataset assesses LLMs in
multitasking situations such as question answering, text generation, reading
comprehension, visual question answering, and more by including both textual
data and accompanying images. We evaluated ChatGPT and BingChat on the
VNHSGE dataset and compared their performance with that of Vietnamese
students. The results show that ChatGPT and
BingChat both perform at a human level in a number of areas, including
literature, English, history, geography, and civics education. They still have
room to improve, though, especially in mathematics, physics,
chemistry, and biology. The VNHSGE dataset seeks to provide an adequate
benchmark for assessing the abilities of LLMs with its wide-ranging coverage
and variety of activities. We intend to promote future developments in the
creation of LLMs by making this dataset available to the scientific community,
especially in addressing LLMs' limitations in disciplines involving mathematics and
the natural sciences.
Pseudo-contractions as Gentle Repairs
Updating a knowledge base to remove an unwanted consequence is a challenging task. Some of the original sentences must be either deleted or weakened in such a way that the sentence to be removed is no longer entailed by the resulting set. On the other hand, it is desirable that the existing knowledge be preserved as much as possible, minimising the loss of information. Several approaches to this problem can be found in the literature. In particular, when the knowledge is represented by an ontology, two different families of frameworks have been developed over the past decades with numerous ideas in common but with little interaction between the communities: applications of AGM-like Belief Change and justification-based Ontology Repair. In this paper, we investigate the relationship between pseudo-contraction operations and gentle repairs. Both aim to avoid the complete deletion of sentences when replacing them with weaker versions is enough to prevent the entailment of the unwanted formula. We show the correspondence between concepts on both sides and investigate under which conditions they are equivalent. Furthermore, we propose a unified notation for the two approaches, which might contribute to the integration of the two areas.
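The key idea above, weakening a sentence instead of deleting it so the unwanted consequence no longer follows, can be illustrated with a propositional toy example. The brute-force entailment check and the particular weakening below are illustrative only; the paper works with formal pseudo-contraction and gentle-repair operators over ontologies, not this construction.

```python
from itertools import product

VARS = ["p", "q", "r"]

def entails(kb, formula):
    """KB |= formula iff formula holds in every model of KB
    (brute force over all truth assignments to VARS)."""
    for values in product([False, True], repeat=len(VARS)):
        v = dict(zip(VARS, values))
        if all(f(v) for f in kb) and not formula(v):
            return False
    return True

p = lambda v: v["p"]
q = lambda v: v["q"]
p_implies_q = lambda v: (not v["p"]) or v["q"]
# Weakened replacement: (p and r) -> q, strictly weaker than p -> q.
pr_implies_q = lambda v: (not (v["p"] and v["r"])) or v["q"]

kb = [p, p_implies_q]
print(entails(kb, q))                 # the unwanted consequence q follows

# Gentle repair: weaken p -> q rather than deleting it outright.
gently_repaired = [p, pr_implies_q]
print(entails(gently_repaired, q))    # q is no longer entailed
print(entails(gently_repaired, p))    # the rest of the knowledge survives
```

The point of both frameworks is exactly this trade-off: the repaired base loses the entailment of the unwanted formula while retaining strictly more content than full deletion of an axiom would.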
LIPIcs, Volume 261, ICALP 2023, Complete Volume