lilGym: Natural Language Visual Reasoning with Reinforcement Learning
We present lilGym, a new benchmark for language-conditioned reinforcement
learning in visual environments. lilGym is based on 2,661 highly-compositional
human-written natural language statements grounded in an interactive visual
environment. We introduce a new approach for exact reward computation in every
possible world state by annotating all statements with executable Python
programs. Each statement is paired with multiple start states and reward
functions to form thousands of distinct Markov Decision Processes of varying
difficulty. We experiment with lilGym with different models and learning
regimes. Our results and analysis show that while existing methods are able to
achieve non-trivial performance, lilGym forms a challenging open problem.
lilGym is available at https://lil.nlp.cornell.edu/lilgym/. Comment: ACL 2023 Long Paper
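The abstract describes annotating each statement with an executable Python program so the reward can be computed exactly in any world state. Below is a minimal sketch of that idea; the `Item` state representation, the example statement program, and the ±1/0 reward scheme are illustrative assumptions, not the benchmark's actual API.

```python
# Hedged sketch of exact reward computation in the style lilGym describes:
# a statement's executable annotation evaluates to True/False in any state.
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    color: str
    shape: str

# Assumed state representation: the set of items currently placed.
State = frozenset

# Executable annotation for the statement
# "there is exactly one yellow square" (illustrative example).
def statement_program(state: State) -> bool:
    return sum(1 for it in state
               if it.color == "yellow" and it.shape == "square") == 1

def reward(state: State, stop: bool) -> float:
    # Assumed reward scheme: +1 if the agent stops in a state satisfying
    # the statement, -1 if it stops in one that does not, 0 otherwise.
    if not stop:
        return 0.0
    return 1.0 if statement_program(state) else -1.0

s = frozenset({Item("yellow", "square"), Item("blue", "circle")})
print(reward(s, stop=True))   # 1.0
print(reward(s, stop=False))  # 0.0
```

Because the annotation is an ordinary Python predicate, the same statement can be paired with many different start states, each yielding a distinct MDP, as the abstract describes.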
Chain of Thought Prompt Tuning in Vision Language Models
Language-Image Pre-training has demonstrated promising results on zero-shot
and few-shot downstream tasks by prompting visual models with natural language
prompts. However, most recent studies only use a single prompt for tuning,
neglecting the inherent step-by-step cognitive reasoning process that humans
conduct in complex task settings, for example, when processing images from
unfamiliar domains. Chain of Thought is a simple and effective approximation to
the human reasoning process and has been proven useful for natural language
processing (NLP) tasks. Based on this cognitive intuition, we believe that
conducting effective reasoning is also an important problem in visual tasks,
and a chain of thought could be a solution to this problem. In this work, we
propose a novel chain of thought prompt tuning for vision-language modeling.
Extensive experiments show that our method not only generalizes better in image
classification tasks, transfers better beyond a single dataset, and shows
stronger domain generalization, but also performs much better in image-text
retrieval and visual question answering, which require more reasoning
capabilities. We are the first to successfully adapt chain-of-thought prompting
that combines visual and textual embeddings. We will release our code.
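The core idea, replacing a single learned prompt with a chain of prompts where each step conditions on the previous step's fused feature, can be sketched as follows. The fusion weights, step count, and tanh fusion below are toy assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of chain-of-thought prompt tuning: a chain of learnable
# prompt vectors, each step fusing the image feature with the previous
# step's output. All shapes and weights here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
D, STEPS = 16, 3

# Learnable prompt vectors, one per reasoning step (random init here).
prompts = [rng.normal(size=D) for _ in range(STEPS)]
# Hypothetical per-step fusion weights (would be trained in practice).
W = [rng.normal(size=(D, 2 * D)) * 0.1 for _ in range(STEPS)]

def chain_of_thought_features(img_feat):
    """Run the prompt chain: each step fuses the current feature with
    that step's learned prompt, mimicking step-by-step reasoning."""
    h = img_feat / np.linalg.norm(img_feat)
    for p, w in zip(prompts, W):
        step_in = np.concatenate([h, p])   # condition on previous step
        h = np.tanh(w @ step_in)           # fused feature for this step
        h = h / np.linalg.norm(h)
    return h

img = rng.normal(size=D)
feat = chain_of_thought_features(img)
print(feat.shape)  # (16,)
```

The final chained feature would then be matched against text embeddings (e.g., via cosine similarity) exactly as a single-prompt method would use its one tuned prompt.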
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
How does language inform our downstream thinking? In particular, how do
humans make meaning from language -- and how can we leverage a theory of
linguistic meaning to build machines that think in more human-like ways? In
this paper, we propose \textit{rational meaning construction}, a computational
framework for language-informed thinking that combines neural models of
language with probabilistic models for rational inference. We frame linguistic
meaning as a context-sensitive mapping from natural language into a
\textit{probabilistic language of thought} (PLoT) -- a general-purpose symbolic
substrate for probabilistic, generative world modeling. Our architecture
integrates two powerful computational tools that have not previously come
together: we model thinking with \textit{probabilistic programs}, an expressive
representation for flexible commonsense reasoning; and we model meaning
construction with \textit{large language models} (LLMs), which support
broad-coverage translation from natural language utterances to code expressions
in a probabilistic programming language. We illustrate our framework in action
through examples covering four core domains from cognitive science:
probabilistic reasoning, logical and relational reasoning, visual and physical
reasoning, and social reasoning about agents and their plans. In each, we show
that LLMs can generate context-sensitive translations that capture
pragmatically-appropriate linguistic meanings, while Bayesian inference with
the generated programs supports coherent and robust commonsense reasoning. We
extend our framework to integrate cognitively-motivated symbolic modules to
provide a unified commonsense thinking interface from language. Finally, we
explore how language can drive the construction of world models themselves
- …
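The pipeline the abstract describes, language translated into an executable condition on a probabilistic world model, followed by Bayesian inference over that model, can be sketched minimally as below. The toy world model, the lookup-table stand-in for the LLM translation step, and rejection sampling as the inference method are all assumptions for illustration, not the paper's system.

```python
# Hedged sketch of rational meaning construction: translate an utterance
# into an executable condition, then do Bayesian inference (here, simple
# rejection sampling) over a toy generative world model.
import random

random.seed(0)

def world_model():
    # Toy generative model: two agents each hold 0-5 apples, uniformly.
    return {"alice": random.randint(0, 5), "bob": random.randint(0, 5)}

# Stub standing in for an LLM translating language into probabilistic-
# program code (the paper's LLMs generate such code; this table is a toy).
def translate(utterance):
    table = {
        "Alice has more apples than Bob": lambda w: w["alice"] > w["bob"],
    }
    return table[utterance]

def infer(query, condition, n=10_000):
    """Rejection sampling: estimate P(query | condition) under the model."""
    hits = total = 0
    for _ in range(n):
        w = world_model()
        if condition(w):
            total += 1
            hits += query(w)
    return hits / total

cond = translate("Alice has more apples than Bob")
# P(Alice has at least 3 apples | Alice has more than Bob)
p = infer(lambda w: w["alice"] >= 3, cond)
print(round(p, 2))
```

The division of labor mirrors the framework: the neural model handles the context-sensitive mapping from language to code, while a generic probabilistic inference engine, not the language model, carries the reasoning.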