
20 research outputs found

Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification

Author: Gandhi Vineet
Jain Kanishk
Karthik Shyamgopal
Publication date: 30/10/2023

We investigate the problem of reducing mistake severity for fine-grained classification. Fine-grained classification can be challenging, mainly due to the requirement of domain expertise for accurate annotation. However, humans are particularly adept at performing coarse classification as it requires relatively low levels of expertise. To this end, we present a novel approach for Post-Hoc Correction called Hierarchical Ensembles (HiE) that utilizes label hierarchy to improve the performance of fine-grained classification at test-time using the coarse-grained predictions. By only requiring the parents of leaf nodes, our method significantly reduces average mistake severity while improving top-1 accuracy on the iNaturalist-19 and tieredImageNet-H datasets, achieving a new state-of-the-art on both benchmarks. We also investigate the efficacy of our approach in the semi-supervised setting. Our approach brings notable gains in top-1 accuracy while significantly decreasing the severity of mistakes as training data decreases for the fine-grained classes. The simplicity and post-hoc nature of HiE render it practical to be used with any off-the-shelf trained model to improve its predictions further. Comment: 8 pages, 2 figures, 3 tables, Accepted at NeurIPS 202
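The abstract does not state the exact rule HiE uses to combine coarse and fine predictions. The following is a minimal sketch of one plausible post-hoc correction, assuming each fine-class probability is multiplied by the coarse probability of its parent and the result is renormalized. The function name `amend_with_coarse`, the toy hierarchy, and all probability values are hypothetical illustrations, not the authors' implementation.

```python
def amend_with_coarse(fine_probs, coarse_probs, parent_of):
    """Reweight fine-class probabilities by their parent's coarse probability.

    fine_probs:   probabilities over leaf (fine-grained) classes
    coarse_probs: probabilities over parent (coarse) classes
    parent_of:    parent_of[i] is the coarse class index of leaf class i
    """
    weighted = [p * coarse_probs[parent_of[i]] for i, p in enumerate(fine_probs)]
    total = sum(weighted)
    if total == 0:
        # Degenerate case: fall back to the unmodified fine predictions.
        return list(fine_probs)
    return [w / total for w in weighted]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical toy hierarchy: leaves 0 and 1 share parent 0, leaves 2 and 3 share parent 1.
parent_of = [0, 0, 1, 1]
fine_probs = [0.10, 0.25, 0.35, 0.30]   # fine classifier slightly prefers leaf 2
coarse_probs = [0.8, 0.2]               # coarse classifier strongly prefers parent 0

amended = amend_with_coarse(fine_probs, coarse_probs, parent_of)
top1_before = argmax(fine_probs)
top1_after = argmax(amended)
```

In this toy case the amended top-1 prediction moves to a leaf under the parent that the coarse classifier favors, which is the kind of correction that would lower mistake severity when the coarse classifier is the more reliable of the two.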

arXiv.org e-Print Archive

Strategic Reasoning with Language Models

Author: Gandhi Kanishk
Goodman Noah D.
Sadigh Dorsa
Publication date: 30/05/2023

Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations. Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining. Large Language Models (LLMs), with their ability to comprehend and generate complex, context-rich language, could prove powerful as tools for strategic gameplay. This paper introduces an approach that uses pretrained LLMs with few-shot chain-of-thought examples to enable strategic reasoning for AI agents. Our approach uses systematically generated demonstrations of reasoning about states, values, and beliefs to prompt the model. Using extensive variations of simple matrix games, we show that strategies derived from systematically generated prompts generalize almost perfectly to new game structures, alternate objectives, and hidden information. Additionally, we demonstrate that our approach can lead to human-like negotiation strategies in realistic scenarios without any extra training or fine-tuning. Our results highlight the ability of LLMs, guided by systematic reasoning demonstrations, to adapt and excel in diverse strategic scenarios.

arXiv.org e-Print Archive

Understanding Social Reasoning in Language Models with Language Models

Author: Fränken Jan-Philipp
Gandhi Kanishk
Gerstenberg Tobias
Goodman Noah D.
Publication date: 21/06/2023

As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite the recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) the presence of inconsistent results from previous evaluations, and (2) concerns surrounding the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performances with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though less reliably, while other LLMs struggle.

arXiv.org e-Print Archive

Stream of Search (SoS): Learning to Search in Language

Author: Cheng Winson
Gandhi Kanishk
Goodman Noah D.
Grand Gabriel
Lee Denise
Liu Muxin
Sharma Archit
Publication date: 01/04/2024

Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.
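To make the idea of a flattened search trace concrete, here is a minimal sketch of a depth-first Countdown solver that records every attempted step and every backtrack as text. The trace wording, the function name `countdown_search`, and the `max_steps` cap are assumptions made for illustration; the paper's actual search language and heuristic solvers are not described in this abstract.

```python
from itertools import combinations


def countdown_search(numbers, target, trace, max_steps=10000):
    """Depth-first search over arithmetic combinations of `numbers`.

    Appends one line per explored step (including dead ends) to `trace`
    and returns True as soon as `target` appears among the numbers.
    """
    if target in numbers:
        trace.append(f"goal: {target} reached")
        return True
    if len(numbers) < 2 or len(trace) >= max_steps:
        return False
    for i, j in combinations(range(len(numbers)), 2):
        a, b = max(numbers[i], numbers[j]), min(numbers[i], numbers[j])
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        candidates = [("+", a + b), ("*", a * b)]
        if a - b > 0:
            candidates.append(("-", a - b))   # keep intermediate results positive
        if b != 0 and a % b == 0:
            candidates.append(("/", a // b))  # allow only exact division
        for op, value in candidates:
            trace.append(f"try: {a} {op} {b} = {value}, left: {sorted(rest + [value])}")
            if countdown_search(rest + [value], target, trace, max_steps):
                return True
            trace.append(f"backtrack from {value}")
    return False


# Hypothetical example: reach 22 from the inputs 3, 5, 7.
trace = []
solved = countdown_search([3, 5, 7], 22, trace)
stream = " | ".join(trace)  # the whole search, mistakes included, as one string
```

The resulting `stream` contains the failed branches as well as the successful one, in contrast to a training target that keeps only the final solution path.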

arXiv.org e-Print Archive

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Author: Arumugam Dilip
Fränken Jan-Philipp
Gandhi Kanishk
Gerstenberg Tobias
Goodman Noah D.
Kwok Sam
Moore Jared
Tamkin Alex
Ye Peixuan
Publication date: 03/12/2023

We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions. To validate our proposal, we run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players. We find that the AI assistant accurately aligns its behavior to match standard policies from the economic literature (e.g., selfish, altruistic). However, the assistant's learned policies lack robustness and exhibit limited generalization in an out-of-distribution setting when confronted with a currency (e.g., grams of medicine) that was not included in the assistant's training distribution. Additionally, we find that when there is inconsistency in the relationship between language use and an unknown policy (e.g., an altruistic policy combined with rude language), the assistant's learning of the policy is slowed. Overall, our preliminary results suggest that developing simulation frameworks in which AI assistants need to infer preferences from diverse users can provide a valuable approach for studying practical alignment questions. Comment: SoLaR NeurIPS 2023 Workshop (https://solar-neurips.github.io/)

arXiv.org e-Print Archive

Instance-Level Semantic Maps for Vision Language Navigation

Author: Agarwal Anmol
Gandhi Vineet
Hafez Abdul
Jain Kanishk
Krishna K. Madhava
Mathur Aditya
Monis Aaron
Murthy Krishna
Nanwani Laksh
Prabhakar Raghav
Publication date: 23/05/2023

Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment which helps them to navigate on-demand when given a linguistic instruction. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recently introduced VL Maps (Huang et al., 2023) take a step towards this goal by creating a semantic spatial map representation of the environment without any labelled data. However, their representations are limited for practical applicability as they do not distinguish between different instances of the same object. In this work, we address this limitation by integrating instance-level information into spatial map representation using a community detection algorithm and by utilizing word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation. The resulting map representation improves the navigation performance by two-fold (233%) on realistic language commands with instance-specific descriptions compared to VL Maps. We validate the practicality and effectiveness of our approach through extensive qualitative and quantitative experiments.

arXiv.org e-Print Archive


Evaluating infants’ reasoning about agents using the Baby Intuitions Benchmark (BIB)

Author: Dillon Moira Rose
Gandhi Kanishk
Lake Brenden
Stojnic Gala
Publication venue: eScholarship, University of California
Publication date: 01/01/2021

Young infants reason about the goals, preferences, and actions of others. State-of-the-art computational models, however, still fail in such reasoning. The Baby Intuitions Benchmark (BIB) was designed to test agency reasoning in AI using an infant behavioral paradigm. While BIB’s presentation of simple animations makes it particularly suitable for testing AI, such vignettes have yet to be validated with infants. In this pilot, 11-month-old infants watched two sets of animations from BIB, one on agents’ consistent preferences and the other on agents’ efficient actions. Infants looked longer towards violations in agents’ behavior in both the preference task (N = 24, β = 3.24, p = .040) and the efficiency task (N = 24, β = 4.50, p = .016). These preliminary results suggest that infants’ agency reasoning is abstract enough to be elicited by simple animations and validate BIB as a test of agency reasoning for humans and AIs.

eScholarship - University of California

Inaccessible-Goal Task

Author: Brenden M. Lake
Gala Stojnic
Kanishk Gandhi
Moira R. Dillon
Shannon Yasuda
Publication venue: Center for Open Science
Publication date: 09/02/2023

OSF Preprints

Multi-Agent Task

Author: Brenden M. Lake
Gala Stojnic
Kanishk Gandhi
Moira R. Dillon
Shannon Yasuda
Publication venue: Center for Open Science
Publication date: 09/02/2023

OSF Preprints